**Previous Versions: Spring 2020**
**All sessions are remote through Zoom (see Canvas for link)**
- All content will be on GitHub in this repo, including the schedule and tech setup instructions
- All assignments will be posted on and submitted through Canvas
- Class communication and announcements will be primarily through Slack
This is a project-based course designed to provide training and experience in solving real-world problems using machine learning, with a focus on problems from public policy and social good.
Through lectures, discussions, readings, and project assignments, students will learn about and experience building end-to-end machine learning systems, starting from project definition and scoping, to modeling, to field validation and turning their analysis into action. Through the course, students will develop skills in problem formulation, working with messy data, communicating about machine learning with non-technical stakeholders, model interpretability, understanding and mitigating algorithmic bias & disparities, and evaluating the impact of deployed models.
Prerequisites: Students are expected to know Python (for data analysis) and to have prior graduate coursework in machine learning. The course assumes that background and focuses on teaching how to use ML to solve real-world problems. Experience with SQL, the *nix command line, git(hub), and working on remote machines is helpful and highly recommended.

Rayid Ghani | Kit Rodolfa |
---|---|
GHC 8023, Office Hours (Zoom): Tue 12-1pm ET, Fri 3-4pm ET | GHC 8018, Office Hours (Zoom): Mon 1:30-2:30 ET, Thu 11-12 ET |

Amartya Basu | Aaron Dunmore |
---|---|
Office Hours (Zoom): Tue 1-2pm ET, Sat 1-2pm ET | Office Hours (Zoom): Wed 1-2pm ET, Fri 1-2pm ET |

Component | Weight |
---|---|
Data Loading Assignment | 5.0% |
Proposal | 10.0% |
Peer Reviews | 2.5% |
Weekly Project Assignments | 15.0% |
Midterm Presentation | 7.5% |
Final Presentation | 10.0% |
Final Report and Code | 20.0% |
Quizzes | 10.0% |
Class Attendance and Participation | 15.0% |
Weekly Feedback Forms | 5.0% |

See the syllabus for more detail, including information about group projects, grading, and helpful optional readings.

Week | Dates | Tuesday | Wednesday | Thursday | Assignments | Project Focus |
---|---|---|---|---|---|---|
1 | Tu: Sep 1, Th: Sep 3 | Intro/Overview + Project Overviews | Basic Tech Setup: make sure students can connect to the server through ssh, have access to GitHub, and can access the db from both psql and DBeaver | Scoping, Problem Definition, Balancing Goals (equity, efficiency, effectiveness) | 1. Survey (Monday); 2. Project preferences + signature (Wednesday) | Get familiar with the class and its goals, and understand project choices |
2 | Tu: Sep 8, Th: Sep 10 | Case Studies + Discussion | Git + SQL | Acquiring Data, Privacy, Record Linkage | | Understand Project, Data Audit and Exploration |
3 | Tu: Sep 15, Th: Sep 17 | Data Exploration + 30 min project team meeting/coordination | Project Work | Analytical Formulation and Baselines | ACS Data ETL (Friday) | Data Stories and Finalize Project Scope |
4 | Tu: Sep 22, Th: Sep 24 | Building ML Pipelines | Project Work | Project Work | Project Proposal (Friday) | Initial ML Pipeline Setup, Analytical Formulation and Baselines |
5 | Tu: Sep 29, Th: Oct 1 | Feature Engineering / Imputation | Remote Tech Workflows | Project Work | Proposal Reviews (Monday) | Iteration 1 - Build End to End Code Pipeline (Focus on end-to-end shell) |
6 | Tu: Oct 6, Th: Oct 8 | Performance Metrics / Evaluation Part 1: Model Selection and Validation | Group Check-Ins | Temporal Validation Deep Dive (with class projects as examples) | Skeleton ML Pipeline Code (Friday) | |
7 | Tu: Oct 13, Th: Oct 15 | Performance Metrics / Evaluation Part 2 (audition) | Group Check-Ins | Project Work | | Iteration 2 - End to End Code Pipeline (Focus on feature development) |
8 | Tu: Oct 20, Th: Oct 22 | Recap of Topics and, if time permits, Overfitting, Leakage, Issues in Deployment | Group Check-Ins | Project Work | Modeling Plan and Feature List (Monday) | |
9 | Tu: Oct 27, Th: Oct 29 | No Class (Watch Recordings of Midterm Project Progress Presentations and Submit Questions and Feedback) | Group Check-Ins | ML Pipelines and sklearn Deep Dive | V0 Results, Train Test Splits, Model Selection Metric(s) (Monday) | Iteration 3 - End to End Code Pipeline (Focus on models and evaluation) |
10 | Tu: Nov 3, Th: Nov 5 | Model Interpretability Part 1: global + postmodeling | Group Check-Ins | Eberly Course Feedback Session and Project Work | Fixed V0 Results, Models and Hyperparameters (Monday) | |
11 | Tu: Nov 10, Th: Nov 12 | Model Interpretability Part 2: local | Group Check-Ins | Project Work | Weekly Update Assignment (Monday) | Iteration 4 - End to End Code Pipeline (Focus on interpreting the models) |
12 | Tu: Nov 17, Th: Nov 19 | Bias and Fairness Part 1 | Group Check-Ins | Project Work | Weekly Update Assignment (Monday) | |
13 | Tu: Nov 24, Th: Thanksgiving | Bias and Fairness Part 2 | HOLIDAY | HOLIDAY | Weekly Update Assignment (Monday) | Final model choice and understanding its performance and impact on disparities |
14 | Tu: Dec 1, Th: Dec 3 | Causality and Field Validation | Group Check-Ins | Project Work | Weekly Update Assignment (Monday) | Project Report and Presentations, Field Trial Design |
15 | Tu: Dec 8, Th: Dec 10 | Final Presentations | Final Presentations | Presentations | | |
Final Report Due | Final Report, Code, Repo, Documentation | | | | | |