Wondering about multiple Dependent variables #746
Comments
I'll have to give some thought to this. In principle, it should be possible; we just need to process the formula differently, loop over the LHS variables, and decorate the output so it is clear what's what. It might be easier to implement in |
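The approach described above (split the left-hand side, loop over its variables, label each block of rows) could be sketched roughly as follows. This is only an illustration in base R, not the mosaicCore implementation: `multi_stats()` is a hypothetical helper, and it assumes a two-sided formula whose left-hand side is a sum of column names.

```r
# Hypothetical sketch, not mosaicCore's code: summarise several
# left-hand-side variables by the grouping variables on the right.
multi_stats <- function(formula, data, fun = mean) {
  stopifnot(length(formula) == 3)        # require a two-sided formula
  responses <- all.vars(formula[[2]])    # variable names on the LHS
  groups <- all.vars(formula[[3]])       # variable names on the RHS
  pieces <- lapply(responses, function(v) {
    # one summary table per response variable
    out <- aggregate(data[[v]], by = data[groups], FUN = fun)
    names(out)[ncol(out)] <- "value"
    # decorate the rows so it is clear which response they belong to
    data.frame(response = v, out, stringsAsFactors = FALSE)
  })
  do.call(rbind, pieces)
}

res <- multi_stats(Sepal.Length + Sepal.Width ~ Species, data = iris)
print(res)
```

The result has one row per combination of response variable and group, with the response name stored as a value in its own column rather than encoded in the column names.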
Proof of concept:
df_stats(Sepal.Length + Sepal.Width ~ Species, data = iris)
## _target_ Species min Q1 median Q3 max mean sd n missing
## 1 Sepal.Length setosa 4.3 4.800 5.0 5.200 5.8 5.006 0.3524897 50 0
## 2 Sepal.Length versicolor 4.9 5.600 5.9 6.300 7.0 5.936 0.5161711 50 0
## 3 Sepal.Length virginica 4.9 6.225 6.5 6.900 7.9 6.588 0.6358796 50 0
## 4 Sepal.Width setosa 2.3 3.200 3.4 3.675 4.4 3.428 0.3790644 50 0
## 5 Sepal.Width versicolor 2.0 2.525 2.8 3.000 3.4 2.770 0.3137983 50 0
## 6 Sepal.Width virginica 2.2 2.800 3.0 3.175 3.8 2.974 0.3224966 50 0 |
To do list
Some options for the last item:
|
Regarding doing this for
I'm inclined to do this for |
@nicholasjhorton : Any thoughts about naming? We want to avoid using a name that might be among the names of the variables in the data set. Using underscore makes things harder to use downstream, however. Perhaps we could use |
When processing multiple response expressions,
Some examples:
## df_stats(Sepal.Width ~ Species, data = iris, mean, sd, long_names = FALSE)
## ## response Species mean sd
## ## 1 Sepal.Width setosa 3.428 0.3790644
## ## 2 Sepal.Width versicolor 2.770 0.3137983
## ## 3 Sepal.Width virginica 2.974 0.3224966
df_stats(Sepal.Width ~ Species, data = iris, mean, sd)
## response Species mean_Sepal.Width sd_Sepal.Width
## 1 Sepal.Width setosa 3.428 0.3790644
## 2 Sepal.Width versicolor 2.770 0.3137983
## 3 Sepal.Width virginica 2.974 0.3224966
df_stats(Sepal.Width ~ Species, data = iris, mean, sd, long_names = FALSE)
## response Species mean sd
## 1 Sepal.Width setosa 3.428 0.3790644
## 2 Sepal.Width versicolor 2.770 0.3137983
## 3 Sepal.Width virginica 2.974 0.3224966
df_stats(Sepal.Length + Sepal.Width ~ Species, data = iris, mean, sd)
## response Species mean sd
## 1 Sepal.Length setosa 5.006 0.3524897
## 2 Sepal.Length versicolor 5.936 0.5161711
## 3 Sepal.Length virginica 6.588 0.6358796
## 4 Sepal.Width setosa 3.428 0.3790644
## 5 Sepal.Width versicolor 2.770 0.3137983
## 6 Sepal.Width virginica 2.974 0.3224966
# long_names = TRUE is ignored in this situation
df_stats(Sepal.Length + Sepal.Width ~ Species, data = iris, mean, sd, long_names = TRUE)
## response Species mean sd
## 1 Sepal.Length setosa 5.006 0.3524897
## 2 Sepal.Length versicolor 5.936 0.5161711
## 3 Sepal.Length virginica 6.588 0.6358796
## 4 Sepal.Width setosa 3.428 0.3790644
## 5 Sepal.Width versicolor 2.770 0.3137983
## 6 Sepal.Width virginica 2.974 0.3224966 |
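The naming behaviour visible in the examples above can be summarised in a few lines. This is a sketch inferred only from the printed output, not the package's actual logic; `stat_names()` is a hypothetical helper.

```r
# Hypothetical sketch of the column naming shown in the examples:
# long names (stat_response) are used only when there is exactly one
# response variable; with several responses long_names is ignored.
stat_names <- function(stats, responses, long_names = TRUE) {
  if (long_names && length(responses) == 1) {
    paste(stats, responses, sep = "_")
  } else {
    stats
  }
}

one_long  <- stat_names(c("mean", "sd"), "Sepal.Width")
one_short <- stat_names(c("mean", "sd"), "Sepal.Width", long_names = FALSE)
two_long  <- stat_names(c("mean", "sd"), c("Sepal.Length", "Sepal.Width"))
print(list(one_long, one_short, two_long))
```

With several responses a long name such as mean_Sepal.Width would be wrong for the Sepal.Length rows, which is presumably why the short names are used in that case.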
Updated to do list
|
Additional item: Need to consider what |
Here's a POC for the change:
df_stats(~ Sepal.Length + Sepal.Width, data = iris)
## response min Q1 median Q3 max mean sd n missing
## 1 Sepal.Length 4.3 5.1 5.8 6.4 7.9 5.843333 0.8280661 150 0
## 2 Sepal.Width 2.0 2.8 3.0 3.3 4.4 3.057333 0.4358663 150 0
df_stats(~ Sepal.Length + Sepal.Width | Species, data = iris)
## response Species min Q1 median Q3 max mean sd n missing
## 1 Sepal.Length setosa 4.3 4.800 5.0 5.200 5.8 5.006 0.3524897 50 0
## 2 Sepal.Length versicolor 4.9 5.600 5.9 6.300 7.0 5.936 0.5161711 50 0
## 3 Sepal.Length virginica 4.9 6.225 6.5 6.900 7.9 6.588 0.6358796 50 0
## 4 Sepal.Width setosa 2.3 3.200 3.4 3.675 4.4 3.428 0.3790644 50 0
## 5 Sepal.Width versicolor 2.0 2.525 2.8 3.000 3.4 2.770 0.3137983 50 0
## 6 Sepal.Width virginica 2.2 2.800 3.0 3.175 3.8 2.974 0.3224966 50 0 |
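For readers wondering how a one-sided formula such as `~ Sepal.Length + Sepal.Width | Species` can be taken apart, here is a minimal base R sketch. `parse_one_sided()` is a hypothetical helper written for illustration; how mosaicCore actually parses the formula is not shown in this thread.

```r
# Hypothetical sketch: split a one-sided formula of the form
#   ~ y1 + y2 | g
# into response variables and (optional) grouping variables.
parse_one_sided <- function(formula) {
  stopifnot(length(formula) == 2)   # one-sided formulas have no LHS
  body <- formula[[2]]
  if (is.call(body) && identical(body[[1]], as.name("|"))) {
    # `|` binds more loosely than `+`, so the whole sum sits to its left
    list(responses = all.vars(body[[2]]), groups = all.vars(body[[3]]))
  } else {
    list(responses = all.vars(body), groups = character(0))
  }
}

with_groups <- parse_one_sided(~ Sepal.Length + Sepal.Width | Species)
no_groups   <- parse_one_sided(~ Sepal.Length + Sepal.Width)
print(with_groups)
```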
Would
have the equivalent:
And, would
have the equivalent:
(thinking of equivalency with |
Yes. Basically
df_stats(Sepal.Length + Sepal.Width ~ Species, data = iris)
## response Species min Q1 median Q3 max mean sd n missing
## 1 Sepal.Length setosa 4.3 4.800 5.0 5.200 5.8 5.006 0.3524897 50 0
## 2 Sepal.Length versicolor 4.9 5.600 5.9 6.300 7.0 5.936 0.5161711 50 0
## 3 Sepal.Length virginica 4.9 6.225 6.5 6.900 7.9 6.588 0.6358796 50 0
## 4 Sepal.Width setosa 2.3 3.200 3.4 3.675 4.4 3.428 0.3790644 50 0
## 5 Sepal.Width versicolor 2.0 2.525 2.8 3.000 3.4 2.770 0.3137983 50 0
## 6 Sepal.Width virginica 2.2 2.800 3.0 3.175 3.8 2.974 0.3224966 50 0
I'll need to do a bit more testing to make sure I didn't break anything, but this seems to be working as I intended. |
@MichaelJMahometa, if you want to try it out:
devtools::install_github("ProjectMOSAIC/mosaicCore", ref = "beta") |
I'd recommend against starting a name with an underscore since, as you know, it requires back-ticks in many settings. Also, I'm against having the names of the output columns (as opposed to their values) differ depending on the names of variables in the input data frame. I don't think there's any real need, since "response" will be duplicated in the output only if the user creates such a name in the ... of the call to
Why "response" and not "variable" or "name" or "variable_name"?
Do you want to allow a formula like |
naming the response variable column
I'm not sure what the best name is. I just modified the "backup name" to be response_var_.
long vs short names for summaries
Sounds like your vote is for
expanding .
I thought about handling
Currently
One wrinkle if we allow |
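Since it occurred to both of us, I decided to try implementing support for . ~ rhs. This can be abused with less than desirable results, but I guess there are legitimate use cases.
Example:
df_stats(. ~ Species, data = iris, mean, sd)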
## response Species mean sd
## 1 Sepal.Length setosa 5.006 0.3524897
## 2 Sepal.Length versicolor 5.936 0.5161711
## 3 Sepal.Length virginica 6.588 0.6358796
## 4 Sepal.Width setosa 3.428 0.3790644
## 5 Sepal.Width versicolor 2.770 0.3137983
## 6 Sepal.Width virginica 2.974 0.3224966
## 7 Petal.Length setosa 1.462 0.1736640
## 8 Petal.Length versicolor 4.260 0.4699110
## 9 Petal.Length virginica 5.552 0.5518947
## 10 Petal.Width setosa 0.246 0.1053856
## 11 Petal.Width versicolor 1.326 0.1977527
## 12 Petal.Width virginica 2.026 0.2746501 |
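One way a `.` on the left-hand side could be expanded is sketched below. This is an assumption-laden illustration, not the code on the beta branch: `expand_dot()` is a hypothetical helper, and restricting the expansion to numeric columns that do not appear on the right-hand side is a guess that happens to match the iris output above.

```r
# Hypothetical sketch: expand `.` on the LHS to every numeric column
# of `data` that is not already used on the RHS (an assumption).
expand_dot <- function(formula, data) {
  stopifnot(length(formula) == 3)
  lhs <- formula[[2]]
  if (!identical(lhs, as.name("."))) {
    return(all.vars(lhs))             # ordinary LHS: nothing to expand
  }
  rhs_vars <- all.vars(formula[[3]])
  candidates <- setdiff(names(data), rhs_vars)
  is_num <- vapply(data[candidates], is.numeric, logical(1))
  candidates[is_num]
}

vars <- expand_dot(. ~ Species, iris)
print(vars)
```

On a data frame with many columns this expansion produces a correspondingly long table, which is one way the feature can be abused.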
I really like the . addition: nicely done!
On Sep 28, 2019, at 9:45 PM, Randall Pruim wrote:
Since it occurred to both of us, I decided to try implementing support for . ~ rhs. This can be abused with less than desirable results, but I guess there legitimate use cases.
|
I like this proposal.
On Sep 28, 2019, at 8:56 PM, Randall Pruim wrote:
I just modified the "backup name" to be response_var_.
|
Looks like this got left on a development branch and didn't get merged into master. I guess I should fix that ;-) |
Looks like I need to fix some tests that are written assuming the old behavior. |
Tests adjusted (in mosaicCore) to match new behavior. |
First, I love mosaic -- I've been transitioning to it for HS students using R. I also use it with an undergraduate regression course. In the past I've used something like describe() from psych to get a quick look at the descriptives for multiple variables:
But I'd really like to keep to mosaic as much as possible (and to the tidyverse, with piping, if possible). Is it possible to get favstats() to produce a multiple-variable table (a summary for multiple variables at once)? Something like:
Any direction or advice is appreciated,
Michael