NA behavior in prop.test #748
Comments
I don't think there is a single best way to handle missing data. In this particular example, missing is the dominant category:
tally(~ anysub, data = HELPrct)
## anysub
##   no  yes <NA>
##   56  190  207
So it is good that we don't simply proceed to run the test. This function doesn't have an I'll have to look to see if it is easy to give a more informative error message. One of the tricky things with
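For anyone following along without mosaic loaded, here is a minimal base R sketch of the same check. The vector below is toy data made up for illustration (it is not HELPrct), and it relies only on `table()` and its `useNA` argument.

```r
# Toy data (made up for illustration, not HELPrct):
# a two-level factor with several missing values.
anysub <- factor(c("yes", "no", NA, "yes", NA, NA), levels = c("no", "yes"))

# By default table() leaves NA out, which hides how much is missing.
table(anysub)

# useNA = "ifany" adds an <NA> cell whenever missing values are present.
counts <- table(anysub, useNA = "ifany")
counts

# Share of observations that are missing.
prop_missing <- mean(is.na(anysub))
prop_missing
```

Looking at the `<NA>` count before testing makes it obvious when missingness is large enough that dropping it would change the question being asked.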
Sure. I was just trying to make a reprex here, my real data has only one This is my first time teaching really intro R labs in a long while, and I'm trying to avoid talking too much about data wrangling. Most of our datasets are pretty clean, but they do tend to have some I'm just wondering what people generally do about
My point is that we shouldn't automate throwing away missing values -- certainly not silently. Since an analysis can involve multiple R functions, each using a different set of variables, I think dealing with NAs is best done outside of any particular function, as a step before summarizing and modeling begins. Do you have a proposal for a function that behaves differently from
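One way to read "deal with NAs as a separate step" is sketched below in base R. The data frame `d` and its values are hypothetical stand-ins for a real dataset; the point is only that rows are dropped explicitly, for a named set of variables, and with a message so the step is never silent.

```r
# Toy data frame (hypothetical values standing in for a real dataset).
d <- data.frame(
  anysub = factor(c("yes", "no", NA, "yes", "no", NA, "yes", "yes")),
  sex    = factor(c("male", "female", "male", "female",
                    "male", "female", "male", "female"))
)

# Keep only rows that are complete on the variables used in this analysis,
# and report how many rows were dropped.
vars <- c("anysub", "sex")
keep <- complete.cases(d[, vars])
message(sum(!keep), " of ", nrow(d), " rows dropped because of NA in: ",
        paste(vars, collapse = ", "))
d_complete <- droplevels(d[keep, ])

# The cleaned data can then go to whichever summary or test comes next.
tab <- table(d_complete$sex, d_complete$anysub)
tab
```

Because the filtering is tied to `vars`, a later analysis that uses different variables can repeat the step with its own list instead of inheriting this one's deletions.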
I think @nicholasjhorton knows this, but I'm not sure if you do, @rpruim -- this semester I'm teaching two R labs, one in "formula" syntax ( I think
even at this stage, I am struggling with what to do in my formula labs. I ended up showing two approaches:
and
My understanding is that having a global option about Maybe what I'm requesting is for Wrapping back to
and get an unhelpful error message. Hopefully, then I'd realize it was about NAs. Then I would do data processing, and run it again,
? (I guess I could do a one-liner, Anyway, that's just a bunch of thoughts, but maybe useful.
Current working version:
prop.test(anysub~sex, data = HELPrct)
## Error: anysub has 3 levels (including NA). Only 2 are allowed.
Now thinking about implementing an
@AmeliaMN : How does this look? Note: na.rm can be a vector of dimensions from which to drop NAs, or TRUE (all dimensions), or FALSE (none). Remaining NAs are treated as a category, and a warning is emitted identifying the variable(s) in question.
library(mosaic)
prop.test(anysub ~ link, data = HELPrct)
#> Error: anysub has 3 levels (including NA). Only 2 are allowed.
prop.test(anysub ~ link, data = HELPrct, na.rm = TRUE)
#>
#> 2-sample test for equality of proportions with continuity correction
#>
#> data: tally(anysub ~ link)
#> X-squared = 9.2749, df = 1, p-value = 0.002323
#> alternative hypothesis: two.sided
#> 95 percent confidence interval:
#> -0.29428286 -0.05895097
#> sample estimates:
#> prop 1 prop 2
#> 0.1567164 0.3333333
prop.test(link ~ anysub, data = HELPrct)
#> Error: link has 3 levels (including NA). Only 2 are allowed.
prop.test(link ~ anysub, data = HELPrct, na.rm = 1)
#> Warning: NA is being treated as a category for anysub
#>
#> 3-sample test for equality of proportions without continuity
#> correction
#>
#> data: tally(link ~ anysub)
#> X-squared = 19.25, df = 2, p-value = 6.607e-05
#> alternative hypothesis: two.sided
#> sample estimates:
#> prop 1 prop 2 prop 3
#> 0.3750000 0.6174863 0.6979167
prop.test(link ~ anysub, data = HELPrct, na.rm = TRUE)
#>
#> 2-sample test for equality of proportions with continuity correction
#>
#> data: tally(link ~ anysub)
#> X-squared = 9.2749, df = 1, p-value = 0.002323
#> alternative hypothesis: two.sided
#> 95 percent confidence interval:
#> -0.3991840 -0.0857887
#> sample estimates:
#> prop 1 prop 2
#> 0.3750000 0.6174863
Created on 2020-02-29 by the reprex package (v0.3.0)
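To make the na.rm semantics described above concrete, here is a rough base R sketch. It is not mosaic's implementation: `tally_na()` is a hypothetical helper, the data are made up, and the only behavior it tries to mirror is the description in this thread (TRUE drops NAs from all dimensions, FALSE from none, a numeric vector from the listed dimensions, with a warning for any NA that stays in as a category).

```r
# Hypothetical helper, not mosaic's code: cross-tabulate x (dimension 1)
# by y (dimension 2), dropping NAs only from the dimensions named in na.rm.
tally_na <- function(x, y, na.rm = FALSE) {
  # Normalise na.rm: TRUE means every dimension, FALSE means none,
  # otherwise it is taken as a vector of dimension indices.
  dims <- if (isTRUE(na.rm)) {
    1:2
  } else if (isFALSE(na.rm)) {
    integer(0)
  } else {
    as.integer(na.rm)
  }
  drop <- rep(FALSE, length(x))
  if (1 %in% dims) drop <- drop | is.na(x)
  if (2 %in% dims) drop <- drop | is.na(y)
  x <- x[!drop]
  y <- y[!drop]
  # Any NA that survives is kept as its own category, with a warning.
  if (anyNA(x)) warning("NA is being treated as a category for x")
  if (anyNA(y)) warning("NA is being treated as a category for y")
  table(x, y, useNA = "ifany")
}

# Toy vectors (made up for illustration).
x <- factor(c("a", "b", NA, "a", "b", "a"))
y <- factor(c("u", NA, "v", "v", "u", NA))

tally_na(x, y, na.rm = TRUE)  # NAs dropped from both dimensions
tally_na(x, y, na.rm = 1)     # NAs dropped from x only; warns about y
```

With `na.rm = 1` the table keeps an `<NA>` column for `y`, which parallels the 3-sample test in the reprex above, where NA in anysub was kept as a category.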
I hope to send this to CRAN early next week.
This looks amazing! Thank you for addressing this.
Just confirming that this went to CRAN and closing the issue.
I was having trouble interpreting the error message
but this old issue made it clear that it occurs when one or more of your variables has more than two categories. It turns out that my data has more than two categories because it is a factor with two levels... plus NA. Here's a reprex:
What is the recommended way to deal with this in mosaic?