forked from libjohn/rfun_flipped
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathoutline.Rmd
239 lines (194 loc) · 10.3 KB
/
outline.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
---
title: Outline for modules
author: John Little
date: "`r Sys.Date()`"
output: github_document
---
<!-- READ THIS -->
<!-- file is generated from outline.Rmd. Please edit that file: i.e. edit outline.Rmd -->
<!-- READ THIS ^^^ Yeah, you!! Eyes up here ^^^ -->
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Introduction to R - Refactoring the Workshop
Refactor existing workshop for smaller video segments and flipped presentation
1. Download software & run locally
- [R](https://cran.r-project.org/) & [RStudio](https://rstudio.com/products/rstudio/download/) are not the same thing
- [RStudio keyboard Shortcuts](https://support.rstudio.com/hc/en-us/articles/200711853-Keyboard-Shortcuts)
- Cloud Options
- https://rstudio.cloud
- https://vm-manage.oit.duke.edu/containers
- R v Python
- R is a _data first_ (i.e. analysis) programming language
- Python is a general programming language
- Are there libraries/packages relevant to your work?
- Is there a supportive community?
- [Rfun](https://rfun.library.duke.edu/)
- [R Community](https://community.rstudio.com/)
- [R Ladies](https://rladies.org/) | Locally, [R Ladies, RTP](https://www.meetup.com/rladies-rtp/)
- [TidyTuesday](https://github.com/rfordatascience/tidytuesday) weekly practice sessions (+ David Robinson's [recorded live feed of TT on YouTube](https://www.youtube.com/watch?v=NY0-IFet5AM&list=PL19ev-r1GBwkuyiwnxoHTRC8TTqP8OEi8) )
- ML leans towards Python, [or does it](https://www.tidymodels.org/learn/)?
- Programmer/Coder v Analyst
- [how to choose a programming language](https://medium.com/better-programming/how-to-choose-a-programming-language-for-a-project-7c7a3e5a4de6)
- [how to choose a religion](https://www.wikihow.com/Find-the-Right-Religion-for-You)
1. An RStudio project & reproducibility
- **You are your most frequent collaborator** separated by time
- **A simple test**: Identify specific computational steps from a six-month old project?
- **A simple goal**: Reproduce your computation on a different computer
- Initial **Reproducibility in a nutshell**
- Do everything with a script
- Avoid point & click
- Use relative paths
- Write your code to run on any similar environment
- Read more: [Initial steps toward reproducible research.](https://kbroman.org/steps2rr/) Karl Broman
- **RStudio Projects enable _Reproducibility_**
1. **Relative files paths**
- `read_csv("data_raw/raw_data.csv")`
- ProTip: `..` to move up one level in the directory structure
- Avoid absolute paths
- avoid `setwd()`
- e.g. `setwd("d:/rfiles/myrproject")`
4. **_Restart R and run all chunks_**
- avoid: `rm(list=ls())`
6. R Markdown & **literate coding**
- A script integrates code and natural language
- Explain and describe your analysis within your workflow
- Render reports in multiple formats
- Notebooks, slide decks, web pages, dashboards, e-books, journal articles
7. **File structure** matters
- [_Practice of Reproducible Research_](https://www.practicereproducibleresearch.org/) by Kitzes, Turek, Deniz
```
EXAMPLE File Structure...
project_name (folder)
|-- project_name.Rproj
|-- README.md
|-- license.txt
| data_raw
| |-- raw_data.csv
| |-- README.txt
| data_clean
| code_source
| |--data_cleaning.Rmd
| |--analysis.Rmd
| images
| reports_results
```
3. Get Data & Code Repository
- Access your own data file (e.g. CSV)
- [Download](https://github.com/libjohn/intro2r-code) & Expand a GitHub repository
- Click on *.Rproj
4. Tour of the RStudio environment
- Create a blank project
- Console | Files / Packages / Help | Environment | Script Editor
- R Markdown
- Switch to your other project (from Section 2)
- [Keyboard Shortcuts](https://support.rstudio.com/hc/en-us/articles/200711853-Keyboard-Shortcuts)
1. Tidyverse & other library packages
- Packages extend the functionality of base R into your domain
Practice | Frequency | Command
-------- | --------- | -------
Install | once | `install.packages("tidyverse")`
Load | each time | `library(tidyverse)`
- [**Tidyverse**](https://tidyverse.org):
1. an opinionated collection of packages with consistent web-based documentation and a supportive community
2. a Meta-package that loads 8 helper packages and installs many consistent utilities
Name | Purpose
--- | ---
readr | importing CSV data
dplyr | transforming data
ggplot2 | visualizing
tibble | rectangular grid / data frame
tidyr | pivot
forcats | categorical data / factors
stringr | string data / manipulate natural language
purrr | iteration
- **Other package repositories**
- [CRAN](https://cran.r-project.org/web/packages/)
- [Tidyverse Packages](https://www.tidyverse.org/packages/)
- [BioConductor](https://www.bioconductor.org/packages/release/BiocViews.html#___Software)
- [ROpenSci](https://ropensci.org/)
- [MetaCran](https://www.r-pkg.org/)
1. Assignment `<-` & pipe `%>%`
1. R, RStudio IDE, R Markdown
- Base R, in the console
- A big calculator
- RStudio & Tidyverse
- RMarkdown Notebook: reproducible literate coding = Prose + Coding + Reports + RStudio projects + version control (git/GitHub)
- [R Markdown](https://rmarkdown.rstudio.com/lesson-1.html)
- Code Chunks: Separate prose from code
- [Literate Programming](https://en.wikipedia.org/wiki/Literate_programming) (e.g. Jupyter notebooks, R notebooks)
- [R Markdown Cheatsheet](https://rmarkdown.library.duke.edu/slides/index.html#5)
- Integrate a natural language explanation of your analysis along with your code snippets. An approach used within computational sciences to create a functional record of *reproducible* research. You are your most frequent collaborator, six months from now, or six months ago.
- Create a new, blank R Notebook
- R **Notebook** v. R Markdown **Document[s]**
- Many types of R Reports
- **slides**, documents, web pages, **e-books**, journal articles, web sites, **dashboards**, interactive **HTML widgets**, etc.
- Save the file (`*.Rmd`)
- Preview the file
- (Share with others, so they don't have to recreate your compute environment.)
1. [`dplyr`](https://dplyr.tidyverse.org/) package
- "A grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges"
- `mutate()` adds new variables
- `select()` pick variables / columns
- `filter()` subset data by row
- `summarise()` reduces multiple values into a summary
- `count()` a special case of `summarize()` to tally occurrences
- `arrange()` sort rows
- [RStudio Keyboard Shortcuts](https://support.rstudio.com/hc/en-us/articles/200711853-Keyboard-Shortcuts)
1. [`tidyr`](https://tidyr.tidyverse.org/) package
- Make messy data into **tidy data**
- Every variable is a column
- Every row is an observation
- Every cell is a single value
- i.e. **pivots**
1. dplyr revisited
- People who like `pivot_longer()` also like `dplyr::left_joint()`
1. Exploratory Data Analysis (EDA): `ggplot2()` & `skimr`
- `skimr::skim()` from library[(skimr)](https://docs.ropensci.org/skimr/)
- ggplot2(): a brief overview of visualization
1. [`ggplot2()`](https://ggplot2.tidyverse.org/): an introduction to the grammar of graphics, & interactive plots via `plotly`
1. R Markdown
---
## Future modules
20. Large Data
25. Regression
30. Dashboards
35. Slides (Xaringan)
40. Mapping and Geocomputation / Spatial Analysis
45. Version Control: git and GitHub
## Quick Start - Demonstration
1. Make a data folder
1. Drag fav.csv into the data folder
1. Make existing folder and RStudio project
1. Open an R Markdown Notebook
1. `library(tidyverses)` plus other libraries
1. IMPORT data
- See Also [_RStudio data import wizard_](https://support.rstudio.com/hc/en-us/articles/218611977-Importing-Data-with-RStudio)
- ATTACH data
1. EDA: Visualize `ggplot(data = starwars, aes(hair_color)) + geom_bar()`
1. EDA: `skimr::skim(starwars)`
1. EDA: summary(fav_rating)
1. `left_join(starwars, fivethirtyeight)`
1. Transform data: five dplyr verbs ...
- `filter`, `select`, `arrange`, `mutate`
- `count` / `group_by` & `summarize`
1. Interactive visualization: `ggplotly`
1. linear regression / models (quick syntax introduction)
1. Reports: notebooks, slides, dashboards, word document, PDF, book, etc.
## Resources
- [RStudio Primers](https://rstudio.cloud/learn/primers)
- [DataQuest.io](https://www.dataquest.io/data-science-courses-directory/)
- [Tidy Tuesday](https://github.com/rfordatascience/tidytuesday) practice
- [Rfun](https://rfun.library.duke.edu/) - **R we having fun yet‽**
- [ReMaster the Tidyverse](https://github.com/rstudio-education/remaster-the-tidyverse)
- Update of [Master the Tidyverse](https://github.com/rstudio/master-the-tidyverse)
- [_R for Data Science_](https://r4ds.had.co.nz/) by Wickham and Grolemund
- [Introduction to Data Science](https://introds.org/) by Çetinkaya-Rundel
- [_Hands-On programming_](https://rstudio-education.github.io/hopr/) by Grolemund
- [_ggplot2: Elegant Graphics for Data Analysis_](https://ggplot2-book.org/) by Wickham
- [_Interactive web-based data visualization with R, plotly, and shiny_](https://plotly-r.com/) by Sievert
- [_Data Visualization A practical introduction_](https://socviz.co/makeplot.html#makeplot) by Healy
- [_Text Mining with R_](https://www.tidytextmining.com/) by Silge & Robinson
- [Shiny](https://shiny.rstudio.com/)
- [_Statistical Inference via Data Science: A ModernDive into R and the Tidyverse_](https://moderndive.com/) by Ismay and Kim
- [Tidymodels](https://www.tidymodels.org/) _packages for modeling and machine learning_