GitHub - MUSA-620-Spring-2017/Course-Materials: Syllabus: MUSA 620

# UPenn: MUSA 620 - Data Wrangling and Data Visualization

SCHEDULING

Class: Wednesdays from 9am to 12pm in the Levin Building, room 111.

Office hours: Mondays 6pm-8pm and Tuesdays 1pm-3pm. Email galkamaxd at gmail to schedule a time.

OBJECTIVE

The purpose of this course is to familiarize students with the “pipeline” approach to data science: gathering data, storing it, analyzing it, and visualizing it so that non-technical decision makers can make sense of it. The course is divided accordingly into four sections.

Data collection: Students will learn how to gather data by way of web scraping, APIs, and other unstructured sources.
Databases: This part of the course teaches students how to store this data for efficient retrieval and analysis.
Analytics: Students will learn a range of machine-driven techniques for analyzing structured and unstructured data.
Data visualization: The last part of the course teaches students how to present the results of their analysis visually using R and the web application framework Shiny.

FORMAT

The course will be conducted in weekly sessions devoted to lectures, demonstrations and discussions.

ASSIGNMENTS

There is one required final project at the end of the semester. Homework will be assigned before the close of each class and will be due at the end of the following week’s class. Four of the homework assignments will be explicitly required. The remainder are optional, but will count toward the participation component of your final grade.

For the final project, students will replicate the pipeline approach on a dataset (or datasets) of their choosing. The final deliverable will be a web-based data visualization and accompanying description including a summary of the results and the methods used in each step of the process (collection, storage, analysis and visualization).

Final Project Description

GRADING

The grading breakdown is as follows: 50% for homework, 40% for the final project, and 10% for participation.

SOFTWARE

This course relies on the R statistical programming language, used in conjunction with Shiny and other associated packages.

SCHEDULE

| Class # | Date | Topic | Notes |
|---|---|---|---|
| Week 1 | Jan 18 | Introduction / Data visualization concepts | Slides |
| Week 2 | Jan 25 | Working with Census data | Slides |
| Week 3 | Feb 1 | Web scraping with R | Slides |
| Week 4 | Feb 8 | Unstructured data: Twitter API | Slides |
| Week 5 | Feb 15 | Large datasets: NYC Taxi trip data with Google BigQuery | Slides |
| Week 6 | Feb 22 | Spatial databases: PostGIS | Slides |
| Week 7 | Mar 1 | Data frames and data manipulation with R: dplyr | Slides |
| | | Spring Break | |
| Week 8 | Mar 15 | Natural language processing | Slides |
| Week 9 | Mar 22 | Data visualization with R: ggplot2 | Slides |
| Week 10 | Mar 29 | Interactive maps with R Leaflet | Slides |
| Week 11 | Apr 5 | Shiny 1 | |
| Week 12 | Apr 12 | Shiny 2 | |
| Week 13 | Apr 19 | Shiny 3 | |
| Week 14 | Apr 26 | In-class work on final projects | |
