A very rough start to a Getting Started Vignette #1145

Open
wants to merge 5 commits into
base: main
137 changes: 137 additions & 0 deletions vignettes/purrr.Rmd
@@ -0,0 +1,137 @@
---
title: "purrr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{purrr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(purrr)
```

[Big picture]

The purrr package makes it easy to apply a function to every element of a list or data frame without writing a `for` loop.


### The purrr function families

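- `map()`: applies a function to each element and returns one output per element
- `reduce()`: combines all elements into a single output
- predicate functionals (such as `keep()` and `detect()`): use a function that returns `TRUE` or `FALSE` to select elements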

### `map()` family

`map()` applies the same function to each element of an object. It works on lists, data frames (column by column), and atomic vectors. The first argument, `.x`, is the object; the second argument, `.f`, is the function you want to apply. Here is a simple example of how `map()` is used.

```{r}
x <- list(1,2,3)

map(.x = x, .f = sqrt)

```

However, the example above isn't that useful, because the data could just as easily have been a vector. `map` becomes more valuable with a more complex object, such as a data frame, and a function that doesn't work inside a regular `mutate()`. We can create a custom function, then apply it to a column in mtcars.

```{r}

# Simple example here. But haven't found one to copy from the books.

```
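
As one possible illustration of the idea above, here is a minimal sketch. The helper `rescale01()` is a hypothetical name introduced only for this example, and the sketch applies it to two columns of `mtcars` rather than one. Because a data frame is a list of columns, `map()` visits each column in turn.

```{r}
library(purrr)

# Hypothetical helper (not part of purrr): rescale a numeric vector to [0, 1]
rescale01 <- function(v) {
  rng <- range(v, na.rm = TRUE)
  (v - rng[1]) / (rng[2] - rng[1])
}

# map() calls rescale01() once per selected column and returns a named list
scaled <- map(mtcars[c("mpg", "wt")], rescale01)
str(scaled)
```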

Often it's easier to define the function inside the `map` call. In that case, you can create an anonymous function using `~`.

In this more useful example, the base R function `split` is used to create a list of data frames. `map` is then used to fit a regression model with the `lm` function for each group. Note that `~` appears twice: the first `~` creates the anonymous function, and the second is part of the model formula passed to `lm`.

```{r}

by_cyl <- split(mtcars, mtcars$cyl)

by_cyl %>%
  map(~ lm(mpg ~ wt, data = .x)) %>%
  map(coef) %>%
  map_dbl(2)

```
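
To make the two roles of `~` easier to see, the sketch below fits the same models with a named function instead of the `~` shorthand. The name `fit_one` is hypothetical and exists only for this comparison. Both versions should give the same slopes.

```{r}
library(purrr)

by_cyl <- split(mtcars, mtcars$cyl)

# Hypothetical named function: the only `~` left is the model formula
fit_one <- function(df) lm(mpg ~ wt, data = df)

models_named <- map(by_cyl, fit_one)
slopes_named <- map_dbl(map(models_named, coef), 2)

# Shorthand version: the outer `~` builds the anonymous function
models_short <- map(by_cyl, ~ lm(mpg ~ wt, data = .x))
slopes_short <- map_dbl(map(models_short, coef), 2)

all.equal(slopes_named, slopes_short)
```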

`map` iterates over a single input and always returns a list. To iterate over multiple inputs, use variants such as `map2` and `pmap`. To return something other than a list, use suffixed variants such as `map_chr` and `map_dbl`.
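
A short sketch of these variants follows. The input values and variable names are made up for illustration.

```{r}
library(purrr)

# map2_dbl(): iterate over two inputs in parallel, return a double vector
sums <- map2_dbl(1:3, 4:6, ~ .x + .y)
sums

# pmap_dbl(): iterate over any number of inputs supplied as a list;
# list names are matched to the function's argument names
args <- list(x = 1:3, y = 4:6, z = 7:9)
totals <- pmap_dbl(args, function(x, y, z) x + y + z)
totals

# map_chr(): return a character vector instead of a list
labels <- map_chr(c(a = 1, b = 2), ~ paste0("value ", .x))
labels
```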

`map_vec` is a special use case ...
```{r}

# map_vec example here

```
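
As a tentative sketch of `map_vec`, assuming purrr 1.0.0 or later: it returns a vector whose type is taken from the results, so it can produce types such as dates that have no dedicated suffix. The start date here is arbitrary.

```{r}
library(purrr)

# Each call returns a length-1 Date; map_vec() combines them into a Date vector
dates <- map_vec(1:3, ~ as.Date("2024-01-01") + .x)
dates
class(dates)
```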

Special Note: Progress bar ... seriously, how do we emphasize this, it's going to change my life.

When you use `purrr` functions on large datasets or with slow functions, it can be hard to tell whether your code is working, because it takes a while to run. Use the `.progress` argument to add a progress bar to your mapping functions. We recommend naming the progress bar by passing a short string.
```{r}

# simple progress bar example.

```
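
A minimal sketch of a named progress bar, assuming purrr 1.0.0 or later. The `Sys.sleep()` call only stands in for slow work, and the name "Squaring" is arbitrary. The bar may not appear for short runs or in non-interactive sessions.

```{r}
library(purrr)

# Passing a string to .progress turns the bar on and gives it a name
squares <- map(
  1:10,
  function(i) {
    Sys.sleep(0.01) # stand-in for a slow computation
    i^2
  },
  .progress = "Squaring"
)
```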

Progress bars can have a lot more functionality, which you should read about here...

### `reduce()`

`reduce()` combines the elements of a vector, `.x`, into a single result by repeatedly applying the two-argument function `.f`. Like `map`, the simplest use case doesn't really demonstrate why it's valuable.
```{r}

reduce(1:4, `+`)

reduce(1:4, union)

```
As we start looking at more complex use cases, the `accumulate` variant can help show what is happening. `accumulate` works the same way as `reduce`, but it also returns the intermediate results. If we call `accumulate` on the examples above, it's easier to see how the numbers are being combined sequentially.
```{r}

accumulate(1:4, `+`)

accumulate(1:4, union)

```
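
To spell out the sequence for the `+` case, the sketch below writes each step by hand and compares it with `accumulate()` and `reduce()`. The `step` variable names are made up for illustration.

```{r}
library(purrr)

# Each step feeds the previous result back in as the first argument
step1 <- 1L + 2L
step2 <- step1 + 3L
step3 <- step2 + 4L

# accumulate() returns the first element followed by every intermediate result
all(accumulate(1:4, `+`) == c(1L, step1, step2, step3))

# reduce() returns only the final result
reduce(1:4, `+`) == step3
```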

As with `map`, `reduce` can save us from having to write a `for` loop. The example below finds the values that occur in every element of a list, first with a loop and then with `reduce`.

```{r}

# Use map to generate sample data
l <- map(1:4, ~ sample(1:10, 15, replace = TRUE))

# For loop to find values that occur in every element
out <- l[[1]]
for (i in seq(2, length(l))) {
  out <- intersect(out, l[[i]])
}
out

# Same functionality with reduce
reduce(l, intersect)

```

### Predicate functionals

Is this all we want to show? Is there another example that would be good?

```{r}

# y must be an explicit factor: since R 4.0, data.frame() keeps strings as character
df <- data.frame(x = 1:3, y = factor(c("a", "b", "c")))
detect(df, is.factor)
detect_index(df, is.factor)

str(keep(df, is.factor))
str(discard(df, is.factor))

```
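
As one possible additional example, the sketch below uses the same functions on a plain list with an anonymous predicate. The values and the threshold are made up for illustration.

```{r}
library(purrr)

x <- list(3, 8, 12, 5)

# keep() returns the elements where the predicate is TRUE
big <- keep(x, ~ .x > 4)

# discard() returns the elements where the predicate is FALSE
small <- discard(x, ~ .x > 4)

# detect() returns the first matching element, detect_index() its position
first_big <- detect(x, ~ .x > 4)
first_big_at <- detect_index(x, ~ .x > 4)

str(big)
str(small)
first_big
first_big_at
```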

