-
-
Notifications
You must be signed in to change notification settings - Fork 227
/
17-workflow.Rmd
329 lines (240 loc) · 21.4 KB
/
17-workflow.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
# Workflow
In this chapter, we introduce some tips on working with individual R Markdown documents as well as running your R Markdown projects. You may also check out [Chapter 30](https://r4ds.had.co.nz/r-markdown-workflow.html) of the book _R for Data Science_ [@wickham2016], which briefly introduces some tips on using analysis notebooks (including R Markdown documents). Nicholas Tierney also discusses the workflow in the book [_R Markdown for Scientists_.](https://rmd4sci.njtierney.com/workflow.html)
## Use RStudio keyboard shortcuts {#rstudio-shortcuts}
The R Markdown format can be used with any editor of your choice, as long as R, the **rmarkdown** package, and Pandoc are installed. However, RStudio\index{RStudio!keyboard shortcuts} is deeply integrated with R Markdown, so you can work with R Markdown smoothly.
Like any IDE (Integrated Development Environment), RStudio has keyboard shortcuts. A full list can be found under the menu `Tools -> Keyboard Shortcuts Help`. Some of the most useful shortcuts related to R Markdown are summarized in Table \@ref(tab:shortcuts).
```{r, include = FALSE}
ks_win <- function(letters, ctrl = TRUE, alt = TRUE, shift = FALSE, enter = FALSE) {
paste0(
if (ctrl) "Ctrl+",
if (alt) "Alt+",
if (shift) "Shift+",
if (enter) "Enter+",
letters
)
}
ks_mac <- function(letters, cmd = TRUE, opt = TRUE, shift = FALSE, enter = FALSE) {
paste0(
if (cmd) "Command+",
if (opt) "Option+",
if (shift) "Shift+",
if (enter) "Enter+",
letters
)
}
```
```{r shortcuts, echo = FALSE}
keyboard_table <- tibble::tribble(
~ "Task" , ~ "Windows & Linux" , ~ "macOS",
"Insert R chunk" , ks_win("I") , ks_mac("I"),
"Preview HTML" , ks_win("K", alt = FALSE, shift = TRUE) , ks_mac("K", opt = FALSE, shift = TRUE),
"Knitr document (knitr)" , ks_win("K", alt = FALSE, shift = TRUE) , ks_mac("K", opt = FALSE, shift = TRUE),
"Compile Notebook" , ks_win("K", alt = FALSE, shift = TRUE) , ks_mac("K", opt = FALSE, shift = TRUE),
"Compile PDF" , ks_win("K", alt = FALSE, shift = TRUE) , ks_mac("K", opt = FALSE, shift = TRUE),
"Run all chunks above" , ks_win("P") , ks_mac("P"),
"Run current chunk" , ks_win("C") , ks_mac("C"),
"Run current chunk" , ks_win("Enter", TRUE, FALSE, TRUE) , ks_mac("Enter", TRUE, FALSE, TRUE),
"Run next chunk" , ks_win("N") , ks_mac("N"),
"Run all chunks" , ks_win("R") , ks_mac("R"),
"Go to next chunk/title" , ks_win("PgDown", alt = FALSE) , ks_mac("PgDown", opt = FALSE),
"Go to previous chunk/title", ks_win("PgUp", alt = FALSE) , ks_mac("PgUp", opt = FALSE),
"Show/hide document outline", ks_win("O", TRUE, FALSE, TRUE) , ks_mac("O", TRUE, FALSE, TRUE),
"Build book, website, ..." , ks_win("B", TRUE, FALSE, TRUE) , ks_mac("B", TRUE, FALSE, TRUE)
)
knitr::kable(keyboard_table, caption = "RStudio keyboard shortcuts related to R Markdown.", booktabs = TRUE)
```
Additionally, you can press `F7` to spell-check your document. You can also restart the R session by `Ctrl + Alt + F10` (or `Command + Option + F10` on macOS). Restarting regularly is helpful for reproducibility, because results are more likely to be reproducible if they are computed from a new R session. This can also be done through the drop-down menu Restart R and Run All Chunks behind the Run button on the toolbar.
## Spell-check R Markdown {#spell-check}
If you use the RStudio IDE\index{RStudio!spellcheck}, you can press the `F7` key or click the menu `Edit -> Check Spelling` to spell-check an Rmd document. Real-time spell checking has become available in RStudio v1.3, so you no longer need to manually trigger spell checking with this version or a later version of RStudio.
If you do not use RStudio, the **spelling** package\index{R package!spelling} [@R-spelling] has a function `spell_check_files()`, which can check the spelling of common document formats, including R Markdown. When spell checking Rmd documents, it will skip code chunks and only check ordinary text.
## Render R Markdown with `rmarkdown::render()` {#rmarkdown-render}
If you do not use RStudio or any other IDE, you need to know this fact: R Markdown documents are rendered through the function `rmarkdown::render()`\index{rmarkdown!render()}. This means you can programmatically render an R Markdown document in any R script. For example, you could render a series of reports in a `for`-loop for each state of a country:
```{r, eval=FALSE, tidy=FALSE}
for (state in state.name) {
rmarkdown::render(
'input.Rmd', output_file = paste0(state, '.html')
)
}
```
The output filename will be different for each state. You can also make use of the `state` variable in the document `input.Rmd`, e.g.,
````md
---
title: "A report for `r knitr::inline_expr('state')`"
output: html_document
---
The area of `r knitr::inline_expr('state')` is `r knitr::inline_expr('state.area[state.name == state]')`
square miles.
````
You may read the help page `?rmarkdown::render` to know other possible arguments. Here we just want to mention two of them, i.e., the `clean` and `envir` arguments.
The former (`clean`) is particularly helpful for debugging when anything goes wrong with the Pandoc conversion. If you call `rmarkdown::render(..., clean = FALSE)`, all intermediate files will be preserved, including the intermediate `.md` file knitted from the `.Rmd` file. If Pandoc signals an error, you may start debugging from this `.md` file.
The latter (`envir`) offers a way to render a document with the guarantee of an empty new environment when you call `rmarkdown::render(..., envir = new.env())`, so objects created in your code chunks will stay inside this environment, without polluting your current global environment. On the other hand, if you prefer rendering the Rmd document in a new R session so that objects in your current R session will not pollute your Rmd document, you may call `rmarkdown::render` in `xfun::Rscript_call()`, e.g.,
```{r, eval=FALSE, tidy=FALSE}
xfun::Rscript_call(
rmarkdown::render,
list(input = 'my-file.Rmd', output_format = 'pdf_document')
)
```
This method is similar to clicking the `Knit` button in RStudio\index{RStudio!Knit button}, which also renders the Rmd document in a new R session. In case you need to render an Rmd document inside another Rmd document, we strongly recommend that you use this method instead of directly calling `rmarkdown::render()` in a code chunk, because `rmarkdown::render()` creates and relies on a lot of side effects internally, which may affect rendering other Rmd documents in the same R session.
The second argument of `xfun::Rscript_call()` takes a list of arguments to be passed to `rmarkdown::render`(). In fact, `xfun::Rscript_call` is a general-purpose function to call any R function (with arguments optionally) in a new R session. Please see its help page if you are interested.
## Parameterized reports {#parameterized-reports}
In Section \@ref(rmarkdown-render), we mentioned one way to render a series of reports in a `for`-loop. In fact, `rmarkdown::render()`\index{rmarkdown!render()} has an argument named `params` specifically designed for this task. You can parameterize\index{parameter} your report through this argument. When you specify parameters for a report, you can use the variable `params` in your report. For example, if you call:
```{r, eval=FALSE, tidy=FALSE}
for (state in state.name) {
rmarkdown::render('input.Rmd', params = list(state = state))
}
```
then in `input.Rmd`, the object `params` will be a list that contains the `state` variable:
````md
---
title: "A report for `r knitr::inline_expr('params$state')`"
output: html_document
---
The area of `r knitr::inline_expr('params$state')` is
`r knitr::inline_expr('state.area[state.name == params$state]')`
square miles.
````
Another way to specify parameters for a report is to use the YAML field `params`, e.g.,
```yaml
---
title: Parameterized reports
output: html_document
params:
state: Nebraska
year: 2019
midwest: true
---
```
Note that you can include as many parameters in the `params` YAML field\index{YAML!params} or the `params` argument of `rmarkdown::render()`. If both the YAML field and the argument are present, the parameter values in the argument will override the corresponding parameters in YAML. For example, when we call `rmarkdown::render(..., params = list(state = 'Iowa', year = 2018)` on the previous example that has the `params` field, `params$state` will become `Iowa` (instead of `Nebraska`) and `params$year` will become `2018` (instead of `2019`) in the R Markdown document.
When rendering the same R Markdown document to a series of reports, you need to adjust the `output_file` argument of `rmarkdown::render()`, to make sure each report has its unique filename. Otherwise, you will accidentally override certain report files. For example, you can write a function to generate a report for each state and each year:
```{r, eval=FALSE, tidy=FALSE}
render_one <- function(state, year) {
# assuming the output format of input.Rmd is PDF
rmarkdown::render(
'input.Rmd',
output_file = paste0(state, '-', year, '.pdf'),
params = list(state = state, year = year),
envir = parent.frame()
)
}
```
Then you can use nested `for`-loops to generate all reports:
```{r, eval=FALSE}
for (state in state.name) {
for (year in 2000:2020) {
render_one(state, year)
}
}
```
At the end, you will get a series of report files like `Alabama-2000.pdf`, `Alabama-2001.pdf`, ..., `Wyoming-2019.pdf`, and `Wyoming-2020.pdf`.
For parameterized reports, you can also input parameters interactively through a graphical user interface (GUI) created from Shiny. This requires you to provide a `params` field in YAML, and **rmarkdown** will automatically create the GUI using the appropriate input widgets for each parameter (e.g., a checkbox will be provided for a Boolean parameter).
To start the GUI, you can call `rmarkdown::render()` with `params = 'ask'` if you do not use RStudio:
```{r, eval=FALSE}
rmarkdown::render('input.Rmd', params = 'ask')
```
If you use RStudio, you can click the menu `Knit with Parameters`\index{RStudio!Knit with Parameters} behind the `Knit` button. Figure \@ref(fig:params-shiny) shows an example GUI for parameters.
```{r, params-shiny, echo=FALSE, fig.cap='Knit an R Markdown document with parameters that you can input from a GUI.'}
knitr::include_graphics('images/params-shiny.png', dpi = NA)
```
For more information on parameterized reports, you may read [Chapter 15](https://bookdown.org/yihui/rmarkdown/parameterized-reports.html) of the _R Markdown Definitive Guide_ [@rmarkdown2018].
## Customize the `Knit` button (\*) {#custom-knit}
When you click the `Knit` button in RStudio\index{RStudio!Knit button}, it will call the `rmarkdown::render()` function in a new R session and output a file of the same base name as the input file in the same directory. For example, knitting `example.Rmd` with the output format `html_document` will create an output file `example.html`.
There may be situations in which we want to customize how the document is rendered. For example, perhaps we would like the rendered document to contain the current date, or would like to output the compiled report into a different directory. Although we can achieve these goals by calling `rmarkdown::render()` (see Section \@ref(rmarkdown-render)) with the appropriate `output_file` argument, it can be inconvenient to have to rely on a custom call to `rmarkdown::render()` to compile your report.
It is possible to control the behavior of the `Knit` button by providing the `knit` field within the YAML frontmatter of your document. The field takes a function with the main argument `input` (the path to the input Rmd document) and other arguments that are currently ignored. You can either write the source code of the function directly in the `knit` field, or put the function elsewhere (e.g., in an R package) and call the function in the `knit` field. If you routinely need the custom `knit` function, we would recommend that you put it in a package, instead of repeating its source code in every single R Markdown document.
If you store the code directly within YAML, you must wrap the entire function in parentheses. If the source code has multiple lines, you have to indent the first line by exactly two spaces and all other lines by at least two spaces. For example, if we want the output filename to include the date on which it is rendered, we could use the following YAML code\index{YAML!knit} in the Rmd document header:
```yaml
---
knit: |
(function(input, ...) {
rmarkdown::render(
input,
output_file = paste0(
xfun::sans_ext(input), '-', Sys.Date(), '.html'
),
envir = globalenv()
)
})
---
```
For example, if we knit `example.Rmd` on 2019-07-29, the output filename will be `example-2019-07-29.html`.
While the above approach looks simple and straightforward enough, embedding a function directly in your YAML may make it difficult for you to maintain it, unless the function is only to be used once with a single R Markdown document. In general, we would recommend using an R package to maintain such a function, e.g., you may create a function `knit_with_date()` in a package:
```{r, eval=FALSE, tidy=FALSE}
#' Custom Knit function for RStudio
#'
#' @export
knit_with_date <- function(input, ...) {
rmarkdown::render(
input,
output_file = paste0(
xfun::sans_ext(input), '-', Sys.Date(), '.'
),
envir = globalenv()
)
}
```
If you add the above code to a package named **myPackage**, you will be able to refer to your custom `knit` function using the following YAML setting:
```yaml
---
knit: myPackage::knit_with_date
---
```
You may refer to the help page `?rmarkdown::render` to find out more ideas on how you could customize your `knit` function behind the `Knit` button in RStudio.
## Collaborate on Rmd documents through Google Drive with trackdown {#google-drive}
The **trackdown** package\index{R package!trackdown} [@R-trackdown] offers a simple solution for collaborative writing and editing of R Markdown (or Sweave) documents. Based on the **googledrive** package\index{R package!googledrive} [@R-googledrive], **trackdown** allows to upload the local `.Rmd` (or `.Rnw`) file as a plain-text file to Google Drive. By taking advantage of the easily readable Markdown (or LaTeX) syntax and the well-known online interface offered by Google Docs, collaborators can easily contribute to the writing and editing process. After integrating all authors’ contributions, the final document can be downloaded and rendered locally.
You can install **trackdown** from CRAN, or you may try the development version on GitHub (https://github.com/claudiozandonella/trackdown):
```{r, echo=TRUE, eval=FALSE}
# install from CRAN
install.packages("trackdown")
# install the development version
remotes::install_github("claudiozandonella/trackdown",
build_vignettes = TRUE)
```
The package documentation is available at https://claudiozandonella.github.io/trackdown/.
### The trackdown Workflow
During the collaborative writing/editing of an `.Rmd` (or `.Rnw`) document, it is important to employ different workflows for computer code and narrative text:
- **Code** - Collaborative code writing is done most efficiently by following a traditional **Git**-based workflow using an online repository (e.g., GitHub or GitLab).
- **Narrative Text** - Collaborative writing of narrative text is done most efficiently using **Google Docs** which provides a familiar and simple online interface that allows multiple users to simultaneously write/edit the same document.
Thus, the workflow’s main idea is simple: Upload the `.Rmd` (or `.Rnw`) document to Google Drive to collaboratively write/edit the narrative text in Google Docs; download the document locally to continue working on the code while harnessing the power of Git for version control and collaboration. This iterative process of uploading to and downloading from Google Drive continues until the desired results are obtained (See Figure \@ref(fig:trakdown1)). The workflow can be summarized as:
> Collaborative **code** writing using **Git** & collaborative writing of **narrative text** using **Google Docs**
For a detailed example of the workflow see https://claudiozandonella.github.io/trackdown/articles/trackdown-workflow.html.
```{r trakdown1, echo = FALSE, fig.cap = "trackdown workflow, collaborative code writing is done locally using Git whereas collaborative writing of the narrative text is done online using Google Docs", out.width='100%'}
knitr::include_graphics("images/trackdown1.png", dpi = NA)
```
#### Functions and Special Features{-}
**trackdown** offers different functions to manage the workflow:
- `upload_file()` uploads a file for the first time to Google Drive.
- `update_file()` updates the content of an existing file in Google Drive with the contents of a local file.
- `download_file()` downloads the edited version of a file from Google Drive and updates the local version.
- `render_file()` downloads a file from Google Drive and renders it locally.
Moreover, **trackdown** offers additional features to facilitate the collaborative writing and editing of documents in Google Docs. In particular, it is possible to:
- **Hide Code:** Code in the header of the document (YAML header or LaTeX preamble) and code chunks are removed from the document when uploading to Google Drive and are automatically restored during download. This prevents collaborators from inadvertently making changes to the code which might corrupt the file and allows them to focus on the narrative text.
- **Upload Output:** The actual output document (i.e., the rendered file) can be uploaded to Google Drive in conjunction with the `.Rmd` (or `.Rnw`) document. This helps collaborators to evaluate the overall layout including figures and tables and allows them to add comments to suggest and discuss changes.
- **Use Google Drive shared drives:** The documents can be uploaded to either a personal Google Drive or to a shared drive to facilitate collaboration.
#### Advantages of Google Docs {-}
Google Docs offers users a familiar, intuitive, and free web-based interface that allows multiple users to simultaneously write/edit the same document. In Google Docs it is possible to:
- track changes (incl. accepting/rejecting suggestions)
- add comments to suggest and discuss changes
- check spelling and grammar errors (potentially integrating third-party services like Grammarly)
Moreover, Google Docs allows anyone to contribute to the writing/editing of the document. No programming experience is required, users can just focus on writing/editing the narrative text.
Note that not all collaborators have to have a Google account (although this is recommended to utilize all Google Docs features). Only the person who manages the **trackdown** workflow needs to have a Google account to upload files to Google Drive. Other collaborators can be invited to contribute to the document using a shared link.
## Organize an R Markdown project into a research website with **workflowr** {#workflowr}
The **workflowr** package\index{R package!workflowr} [@R-workflowr; @workflowr2019] can help you organize a (data analysis) project with a project template\index{template!project} and the version control tool GIT. Every time you make a change to the project, you can log the change, and **workflowr** can build a website corresponding to that particular version of your project. This means that you will be able to view the full history of your analysis results. Although this package uses GIT as the backend for version control, you do not really have to be familiar with GIT. The package provides R functions that do the GIT operations under the hood, and you only need to call these R functions. Furthermore, **workflowr** automates best practices for reproducible code. Each time an R Markdown document is rendered, **workflowr** automatically sets a seed with `set.seed()`, records the session information with `sessionInfo()`, and scans for absolute file paths, etc. Please see the [package documentation](https://workflowr.github.io/workflowr/) for how to get started and more information.
The main author of **workflowr**, John Blischak, has also put together a non-exhaustive list of R packages and guides related to the workflow of R projects, which can be found in this GitHub repo: https://github.com/jdblischak/r-project-workflows.
## Send emails based on R Markdown {#blastula-email}
With the **blastula** package\index{R package!blastula}\index{email} [@R-blastula], you can render an Rmd document to the email body and send the email. To render an Rmd document to an email, the document needs to use the output format `blastula::blastula_email`, e.g.,
````md
---
title: Weekly Report
output: blastula::blastula_email
---
Dear Boss,
Below is an analysis of the `iris` data:
```{r}`r ''`
summary(iris)
plot(iris[, -5])
```
Please let me know if it is not boring enough.
Sincerely,
John
````
This Rmd document should be rendered via the function `blastula::render_email()`, and the output can be passed to `blastula::smtp_send()`, which will send out the email. Note that `smtp_send()` needs an email server as well as your credentials.
If you use RStudio Connect, you can find more examples at https://solutions.rstudio.com/r/blastula/, including automated, conditional, and parameterized emails.