layout:true

Data Analysis with R

Creative-Commons-License

--

class: center,middle

img-center-50

Data Analysis with R


Facilitator: Richard Dunks

Data Analysis with R by Richard Dunks and Julia Marden is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License


class:center,middle

Welcome


exclude:true

???

  • Facilitators will cover the following skills: muting themselves, stopping their video, typing in chat box, raising their hand, sharing their screen
  • Mute and Unmute your microphone
  • Start and Stop your video
  • Post a message in the Chat window with your name and computer operating system (Windows or MacOS)
  • Click the Participants window and Raise your hand

A Few Ground Rules

???

  • Facilitators establish the intention we have for the culture of the classroom

--

  • Step up, step back --

  • One mic --

  • Be curious and ask questions in the chat box --

  • Assume noble regard and positive intent --

  • Respect multiple perspectives --

  • Be present (put away phone, email, social media, etc.)


Introducing Yourself

--

Share with your neighbor

--

  • Who you are --

  • Where you work --

  • What you're hoping to learn today --

  • What you've done with code (any code)


What to Expect Today

--

  • Introduction to R --

  • Using R in Data Analysis --

  • Getting Familiar: R Syntax + R Studio --

  • 311 Data Analysis --

  • Presentations!


Key Skills

--

  • R syntax and commands --

  • RStudio --

  • Load data --

  • Explore data --

  • Wrangle data --

  • Visualize data

???

  • Students will review progress and give feedback on key takeaways

name:housekeeping

Housekeeping

--

  • We’ll have one 15-minute break in the morning --

  • We’ll have an hour for lunch --

  • We’ll have a 15-minute break in the afternoon --

  • Class will start promptly after breaks --

  • Feel free to use the bathroom if you need during class --

  • Please take any phone conversations into the hall to not disrupt the class


What is Analysis?

--

“Analysis is simply the pursuit of understanding, usually through detailed inspection or comparison”

???

  • Orient students to key concept in analysis
  • Use R to uncover meaning in data

The Analytical Process

img-center-80

???

  • Establish frame for the analytics process to be followed in class
  • Familiarize students with terminology (esp "data wrangling/data cleaning")
  • Demystify the process
  • Empower students to do analysis

Exercise: Old Faithful

img-center-70 .caption[Image Credit: Astroval1, CC BY-SA 4.0 via Wikimedia Commons]

???

  • Facilitator provides context for the exercise by describing Old Faithful
  • Students will download script with prepared code snippets to run
  • Students will learn the steps of running summary statistics in R

Identify the Question

--

  • What's the minimum amount of time I should plan to spend at Old Faithful? --

  • Is there a relationship between the amount of time I wait and the length of time it erupts? --

???

  • Students will understand the problem we're seeking to solve in class
  • Students will learn by example the value of problem setting.
  • This will be done by writing out explicit problem statement for 311 Noise, possibly vision 0 db after we have exercise.

Exercise: Old Faithful

img-center-85

???

  • Students will open and load a simple dataset.
  • They will inspect the data in the viewer and confirm it loaded properly.
  • This will be done by live demo of code
  • Students will be writing code themselves
  • Introduce basic commands and tab completion
  • Describe comments and their purpose
  • Emphasize cooperation between participants

img-center-100

???

  • Introduce students to Console, Environment, and Help
  • Students will be familiar with the key features of the console for the exercises to come
  • This will be done by live demo and verbal discussion
  • Ctrl+L clear console

What is Syntax?

--

img-center-80 .caption[Image Credit: AnonMoos, Public Domain via Wikipedia]

???

  • Students will get vocabulary for accomplishing tasks in code
  • This will be done with an overview discussion

R Syntax

# basic command

command(dataset)
View(faithful)

???

  • Facilitator guides students through basic syntax in R for simple tasks
  • Instructor reinforces syntax idea and relation to regular sentence structure to convey meaning where appropriate --
# select a column

command(dataset$column)
mean(faithful$waiting)

--

# get help

?help
?faithful
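
The three patterns above can be combined into one short session. This is a minimal sketch using the built-in `faithful` dataset; the variable names `shortest_wait` and `average_wait` are illustrative only.

```r
# faithful ships with R, so nothing needs to be downloaded
head(faithful)                            # first six rows: eruptions and waiting

# command(dataset$column): summarize a single column
shortest_wait <- min(faithful$waiting)    # shortest wait between eruptions, in minutes
average_wait  <- mean(faithful$waiting)   # average wait, in minutes

print(shortest_wait)
print(round(average_wait, 1))

# ?topic opens the help page; uncomment to try it in RStudio
# ?mean
```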

Your Turn 1

--

  • Look through the code we just wrote --

  • Make a change to one thing on the chart --

  • If necessary, check out the help documentation --

  • Be ready to describe what you did


img-right-40

What is R?

--

  • Statistical programming language --

  • Open-source --

  • Made for and by people who work with data --

  • Used for data analysis --

  • For the history of R, see this video

???

  • Familiarize students with basics of R and set context
  • "Created for and by the people" - Julia Marden

R vs. Excel

???

  • Facilitator compares R directly to Excel for context (assuming most participants are well-acquainted with Excel) --

  • R is a programming language while Excel is an application --

  • R can work with much larger datasets than Excel --

  • R can perform more complex operations than Excel --

  • R commands can be easily saved, re-run, and automated --

  • R doesn't have the icons, animations, and wizards of Excel


name:nola

New Orleans Distributes Smoke Alarms

img-center-40 .caption[Image Credit: Michael Barnett CC BY-SA 2.5, via Wikimedia Commons]

???

  • Students will be inspired to use their knowledge in practical applications

Targeted Outreach Saves Lives

img-center-90 .caption[Image Credit: City of New Orleans, via nola.gov]

???

  • Students will be inspired to use their knowledge in practical applications

Targeted Outreach Saves Lives

img-center-90 .caption[Image Credit: City of New Orleans, via nola.gov]

???

  • Students will be inspired to use their knowledge in practical applications

And Here's the R Code for It

img-center-80 Click here for the code


class:center,middle

Wrap-Up


class:center, middle

15 Min Break

img-center-100 Source: https://xkcd.com/378/


5 Data Analytics Tasks

--

  1. Sorting --

  2. Filtering --

  3. Aggregating (PivotTable) --

  4. Transforming --

  5. Visualizing


1. Sorting

--

  • Reorganize rows in a dataset based on the values in a column --

  • Can sort on multiple columns


Sorting in R

--

  • Use order() --

  • Specify the column you want to sort by
    (in our case eruptions or waiting) --

df[order(df$column_to_sort_by),]

--

Your Turn 2

  • Sort the Old Faithful data to find the shortest waiting time
  • Sort the Old Faithful data to find the longest waiting time

???

  • Why the comma?
  • The syntax is df[row specifier, column specifier].
  • If a specifier is absent, R returns all.
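
The row and column specifiers can be seen at work in a short sketch on the `faithful` data. The variable names (`by_wait`, `by_wait_desc`, `by_both`) are illustrative, not part of the workshop script.

```r
# order() returns row positions; putting them before the comma reorders the rows.
# Leaving the column specifier empty keeps every column.
by_wait <- faithful[order(faithful$waiting), ]
head(by_wait, 3)        # shortest waits first (ascending is the default)

# Ask for decreasing order to put the longest waits first
by_wait_desc <- faithful[order(faithful$waiting, decreasing = TRUE), ]
head(by_wait_desc, 3)

# Sort on two columns: waiting first, then eruptions to break ties
by_both <- faithful[order(faithful$waiting, faithful$eruptions), ]
```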

2. Filtering

--

  • Only show rows that contain some value --

  • Can filter by multiple values --

  • Can filter by values in multiple columns


Filtering in R

--

  • Provide some logical test (<, >, ==, etc.) --

  • The format is --

df[df$column_to_filter_by <logical test>,]

--

Your Turn 3

  • Filter the Old Faithful data for all eruptions longer than 4 minutes
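
As a hedged illustration of the filter format, the sketch below applies a different test than the exercise asks for. The thresholds and variable names are arbitrary choices for demonstration.

```r
# Keep only the rows where the wait was under 50 minutes
short_waits <- faithful[faithful$waiting < 50, ]

# Combine tests with & (and) or | (or) to filter on more than one column
long_and_slow <- faithful[faithful$eruptions > 4.5 & faithful$waiting > 85, ]

nrow(short_waits)       # how many rows passed the first test
nrow(long_and_slow)     # how many rows passed both tests
```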

3. Aggregating Data

--

  • Trends only become clear in aggregate --

  • Often where you discover the "so what" --

  • Aggregating data meaningfully can be tricky --

  • We'll be showing how to do this with R later
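
As a preview, here is one possible way to aggregate in base R. The `complaints` data frame and its values are made up for illustration and do not come from the 311 dataset.

```r
# Hypothetical complaint records, invented for this example
complaints <- data.frame(
  borough       = c("BRONX", "BROOKLYN", "BROOKLYN", "QUEENS", "QUEENS", "QUEENS"),
  days_to_close = c(4, 2, 6, 1, 3, 5)
)

# Count the rows in each borough (similar to a PivotTable count)
counts <- table(complaints$borough)
print(counts)

# Average days_to_close within each borough
avg_days <- aggregate(days_to_close ~ borough, data = complaints, FUN = mean)
print(avg_days)
```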


4. Transforming Data

--

  • Sometimes available categories don't make sense --

  • Values may not be in the format you need (or have mistakes) --

  • You always want to have a clean copy of the data to go back to --

  • Best to keep track of what you've done --

  • We'll be showing how to do this with R later
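
As a preview, the sketch below transforms a copy of the `faithful` data so the original stays clean. The 3-minute cutoff and the new column names are arbitrary assumptions for illustration.

```r
# Work on a copy so there is always a clean version to go back to
faithful_clean <- faithful

# Create a category that does not exist in the raw data
# (the 3-minute cutoff is an arbitrary choice)
faithful_clean$eruption_type <- ifelse(faithful_clean$eruptions < 3, "short", "long")

# Change the format of a value: minutes to hours
faithful_clean$waiting_hours <- faithful_clean$waiting / 60

str(faithful_clean)     # confirm the two new columns were added
```

Keeping these steps in a script is one way to keep track of what was done to the data.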


5. Visualizing Data

--

  • Quickly communicate information --

  • Tell a clearer story --

  • A picture is worth a thousand words --

  • We've already seen this with the Old Faithful data

hist(faithful$waiting)
hist(faithful$eruptions)

plot(faithful, main="Eruptions of Old Faithful", xlab="Eruption Time in Minutes", ylab="Waiting Time to Next Eruption in Min")
abline(lm(faithful$waiting~faithful$eruptions), col="red")

5 Data Analytics Tasks

  1. Sorting
  2. Filtering
  3. Aggregating (PivotTable)
  4. Transforming
  5. Visualizing

Derelict Vehicles

img-center-90

.center[Derelict Vehicles Across NYC]


The Analytical Process

img-center-80


Identify the Question

--

  • How many people complain about derelict vehicles? --

  • Do people complain more at a particular time of day? --

  • Do people complain more in a particular neighborhood or borough? --

img-center-45

???

  • Students will understand the problem we're seeking to solve in class
  • Students will learn by example the value of problem setting.
  • This will be done by writing out explicit problem statement for 311 Noise, possibly vision 0 db after we have exercise.

Exercise: 311 Service Requests

--

???

  • Students will run the same commands from the Faithful exercise on the 311 data
  • Students will hit the roadblocks
  • Can't run summary statistics
  • Exercise will be run through script showing comments (not on slide)
  • Script will mirror the Faithful with intention of not working

R Data Types

img-center-80

???

  • Students will understand a few of the different data types in R
  • They will use the str and summary command
  • This will be done with a live demo of code
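
A minimal sketch of a few common data types follows. All values are hypothetical and chosen only to show how `class()` and `summary()` respond to each type.

```r
# Hypothetical values, invented for this example
count     <- c(3, 10, 7)                                             # numeric
borough   <- c("BRONX", "QUEENS", "BRONX")                           # character
borough_f <- factor(borough)                                         # factor (categories)
created   <- as.Date(c("2019-01-15", "2019-01-16", "2019-01-17"))    # Date

class(count)
class(borough)
class(borough_f)
class(created)

# Numbers stored as text cannot be averaged until they are converted:
# mean() on a character vector returns NA with a warning
count_as_text <- c("3", "10", "7")
converted_mean <- mean(as.numeric(count_as_text))
print(converted_mean)

summary(borough_f)      # for a factor, summary() counts each category
```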

R Data Structures

--

img-center-100

--

  • You often need to restructure your data to make it usable

???

  • Students will review work done in simple data load
  • They will learn key elements of data structures based on Faithful data
  • This will be done with live demo and discussion
  • They will use the str and summary command

class:center, middle

Wrap-Up

???

  • Facilitator reviews the learning in the morning with participants
  • Facilitator answers any questions
  • If there is time, facilitator has participants switch and review someone else's code, then has them reflect on what they learned looking at someone else's code

class:center, middle

Lunch

img-center-60 Source: https://xkcd.com/1319/


class:center, middle

Welcome Back!


Data Wrangling (i.e. Cleaning)

--

  • Get data into right type or structure --

  • Create subsets --

  • Add packages to work with the data we have

???

  • start of section discussing manipulating data
  • picking up pieces from exercise where script failed
  • start of exercise 3

Packages

--

  • Add-ons: extra functions, data viz, special features --

  • Can help you load data, work with timestamps, create charts --

  • If you need to do something, there's probably a package for it

--

  • To use: install.packages()

???

  • Students will understand the purpose and value of packages
  • This will be done with a discussion
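
Installing happens once per computer, while loading happens in every session. The sketch below shows one possible pattern; `install_if_missing()` is a hypothetical helper written for this example, not a built-in R function.

```r
# Hypothetical helper: install a package only when it is not already available
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)     # downloads from CRAN, so it needs an internet connection
  }
  # Report whether the package can now be loaded
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call never triggers a download.
# Swap in "dplyr" or "lubridate" to install those instead.
install_if_missing("stats")

# library() loads the installed package for the current session
library(stats)
```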

Exercise: 311 Service Requests

img-center-80

???

  • An example question of the 311 dataset
  • students will be walked through the exercise with a script
  • Prompts in the script with a more specific question
  • incidents per borough -> distribution of complaints

Your Turn 4

--

  • Switch out derelict vehicles for another complaint type --

  • Look at a different borough, ZIP, or community board --

  • Look at day of the week instead of hour --

  • Challenge yourself --

  • We'll be around to help


class:center,middle

It All Begins With a Question

???

  • Students will understand better the purpose of using code for analysis
  • Remind them we all have hypotheses that need to be acknowledged

Questions

--

  • How many? --

  • Where? --

  • When? --

What are some of your questions of this data?

???

  • Prompts for starting your investigation of the data
  • Students will have a way to start exploring data
  • Discussion leading into guided exercise

Your Turn 5

--

  • Working in pairs or alone, start working on a question that interests you --

  • Start with a new script and give it a name --

  • Use the skills we've covered --

  • Challenge yourself to do something new --

  • Don't be afraid of not knowing --

  • Use the documentation --

  • Help each other out --

  • We'll be around to help


class:center, middle

15 Min Break

img-center-80 Source: https://xkcd.com/1831/


Debugging

--

  • Everyone gets errors all the time --

  • It's just a matter of how complex they are --

  • And fixing them --

  • Syntax errors -> using the wrong instructions --

  • Semantic errors -> doing the wrong things --

  • When in doubt, take a breath, try breaking things apart into smaller pieces, review the documentation, and search for help
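
The two kinds of errors can be contrasted in a small sketch. The broken line is kept as text and checked with `parse()` so the example itself still runs; the variable names are illustrative.

```r
# Syntax error: the closing parenthesis is missing, so R cannot read the line.
# parse() tries to read the text as code; try() captures the resulting error.
broken <- try(parse(text = "mean(faithful$waiting"), silent = TRUE)
inherits(broken, "try-error")     # TRUE when the text is not valid R

# Semantic error: the code runs without complaint but answers the wrong question.
# The goal is the average wait, yet this line averages the eruption length.
wrong_answer <- mean(faithful$eruptions)
right_answer <- mean(faithful$waiting)

print(wrong_answer)
print(right_answer)
```

R reports syntax errors right away, while semantic errors only show up when the result is checked against expectations.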

???

  • Students will be introduced to key concepts in identifying and resolving errors
  • This will be done with a lecture/discussion leading into an exercise
  • Class exercise finding errors in code -> slide with code snippets in Markdown with errors
  • deal with issue of correctness

Exercise

  • Debug your neighbor's R Script and verify results

???

  • Students will examine another student's code, run the code, and fix any errors
  • Students will have a better understanding of how to think in code
  • Goal is to get students talking to each other about their code
  • have documentation at end of slides

exclude:true class:center,middle


class:middle,center

Code Review

???

  • Students will review select code examples
  • Goal is to model a collaborative process for data analysis
  • Time buffer for end of class

class:center,middle

Wrap Up


Key Skills

--

  • R syntax and commands --

  • RStudio --

  • Load data --

  • Explore data --

  • Wrangle data --

  • Visualize data --

  • Anything else?

???

  • Students will review progress and give feedback on key takeaways

Using This in the Real World

???

  • Facilitators reinforce key learning points with participants for integrating into their workflow --

  • R is a powerful tool for cleaning, analyzing, and visualizing data --

  • Integrating it into your workflow takes practice and a commitment to not giving up (Google is your friend) --

  • RStudio makes it easy to get started --

  • You should be able to download R and RStudio on your work computer (Use the zip/tarball option)


name:resources

Key Links

--


Resources for Learning R

--


Other Useful Resources

--

  • Tidyverse - R packages for Data Science --

  • Stat Methods - Great documentation for doing data analysis in R --

  • UCLA Stats - Many examples of statistical analysis with comparisons between R, Stata, SPSS, etc. --

  • Stack Overflow - One of the best Q&A sites for technology

???


Contact Information


class:center, middle

THANK YOU!


Exploring Data

View()
# show dataset as spreadsheet in Viewer
str()
# identify data type and structure
nrow()
# identify the number of rows
ncol()
# identify the number of columns
colnames()
# list the name of every column
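
A minimal sketch of these commands applied to the built-in `faithful` dataset:

```r
str(faithful)         # data type of each column, plus the dimensions
nrow(faithful)        # number of rows
ncol(faithful)        # number of columns
colnames(faithful)    # name of every column

# View() opens the spreadsheet-style viewer, which needs an interactive session
if (interactive()) View(faithful)
```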

Manipulate Data

sort()
# sort the values in a column
data.frame()
# structure data into a data frame (a table of rows and columns)
subset()
# extract data from a dataframe
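
A minimal sketch of the three commands. The `visits` data frame and its values are hypothetical, invented for this example.

```r
# sort() works on a single vector (one column) and returns the sorted values
sorted_waits <- sort(faithful$waiting)
head(sorted_waits)

# data.frame() builds a table from columns of equal length
visits <- data.frame(
  day            = c("Mon", "Tue", "Wed"),
  eruptions_seen = c(2, 3, 1)
)

# subset() keeps the rows that pass a test and, optionally, only some columns
busy_days <- subset(visits, eruptions_seen >= 2, select = day)
print(busy_days)
```

Note that `sort()` returns only the sorted values; to reorder whole rows of a data frame, `order()` inside square brackets is the usual base R approach.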

Calculating Summary Statistics

min()
# identify minimum value
max()
# identify maximum value
median()
# calculate median value
mean()
# calculate mean value

Visualizing Data

hist()
# make a histogram showing the distribution of numeric data
plot()
# plot two numeric variables along an x-y axis
abline()
# add a trendline to a plot
table()
# make a table with factor data
prop.table()
# convert a table of counts into proportions (multiply by 100 for percentages)
barplot()
# make a chart with factor data
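
The factor-data commands can be chained as in the sketch below. The `borough` values are hypothetical, invented for this example.

```r
# Hypothetical complaint locations stored as a factor
borough <- factor(c("BRONX", "QUEENS", "QUEENS", "BROOKLYN", "QUEENS"))

counts <- table(borough)          # number of complaints in each borough
shares <- prop.table(counts)      # proportions that sum to 1

print(counts)
print(round(shares * 100))        # the same information as percentages

# One bar per borough, with bar height equal to the count
barplot(counts, main = "Complaints by Borough", ylab = "Number of complaints")
```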

dplyr

install.packages("dplyr")
require(dplyr)
tbl_df()
# convert a data frame to a tibble (deprecated since dplyr 1.0.0; as_tibble() is the replacement)
filter()
select()
# create a subset; filter for rows, select for columns
mutate()
# add a column
arrange()
# sort rows by the values in one or more columns
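
A minimal sketch of the dplyr verbs chained with the pipe (`%>%`), assuming dplyr is already installed. It uses the built-in `faithful` data; the 4-minute threshold and the `waiting_hours` column are illustrative choices.

```r
library(dplyr)

long_eruptions <- faithful %>%
  filter(eruptions > 4) %>%                   # keep rows that pass the test
  select(eruptions, waiting) %>%              # keep only these columns
  mutate(waiting_hours = waiting / 60) %>%    # add a new column
  arrange(desc(waiting))                      # sort rows, longest wait first

head(long_eruptions)
```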

lubridate

install.packages("lubridate")
require(lubridate)
mdy_hms()
# parse text in month-day-year hour:minute:second order into a date-time value
# other commands: mdy_hm, mdy, dmy, etc.
hour()
# extract hour from timestamp
# other commands: day, minute, second, etc.
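
A minimal sketch of parsing and extracting, assuming lubridate is already installed. The timestamps are hypothetical values written in month/day/year order.

```r
library(lubridate)

# Hypothetical timestamps stored as text
created <- c("01/15/2019 14:30:00", "01/16/2019 09:05:00")

# Convert the text into date-time values (UTC unless a time zone is given)
created_time <- mdy_hms(created)
class(created_time)

# Pull out individual pieces of each timestamp
created_hour <- hour(created_time)
print(created_hour)
wday(created_time, label = TRUE)    # day of the week as a label
```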

ggplot2

ggplot()
# plot a dataframe
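geom_bar()
# make a bar chart where each bar's height is the number of rows in that category
# alternative is geom_col() when the bar heights are already a column in the data
# used for factor data 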
ggtitle()
# add a title to a plot
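
A minimal sketch combining the three commands, assuming ggplot2 is already installed. The `complaints` data frame is hypothetical, invented for this example.

```r
library(ggplot2)

# Hypothetical complaint records, one row per complaint
complaints <- data.frame(
  borough = c("BRONX", "QUEENS", "QUEENS", "BROOKLYN", "QUEENS")
)

# geom_bar() counts the rows in each borough and draws one bar per borough
p <- ggplot(complaints, aes(x = borough)) +
  geom_bar() +
  ggtitle("Complaints by Borough")

print(p)
```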