10718, 94889: Data Analysis / Machine Learning for Public Policy Lab

Previous Versions: Spring 2020

Fall 2020: Tues & Thurs, 3:20-4:40pm, Project Sections: Wednesday 5:10-6:30pm or 6:40-8pm

All sessions are remote through Zoom (see Canvas for link)

Important

  • All content will be on GitHub in this repo, including the schedule and tech setup instructions
  • All assignments will be posted on and submitted through Canvas
  • Class communication and announcements will be primarily through Slack

This is a project-based course designed to provide training and experience in solving real-world problems using machine learning, with a focus on problems from public policy and social good.

Through lectures, discussions, readings, and project assignments, students will learn about and experience building end-to-end machine learning systems, starting from project definition and scoping, to modeling, to field validation and turning their analysis into action. Through the course, students will develop skills in problem formulation, working with messy data, communicating about machine learning with non-technical stakeholders, model interpretability, understanding and mitigating algorithmic bias & disparities, and evaluating the impact of deployed models.

Prerequisites: Students are expected to know Python (for data analysis) and to have taken graduate coursework in machine learning. This course assumes that background and focuses on teaching how to use ML to solve real-world problems. Experience with SQL, the *nix command line, git (GitHub), and working on remote machines is helpful and highly recommended.

DRAFT SYLLABUS

People

Instructors

  • Rayid Ghani: GHC 8023. Office Hours (Zoom): Tue 12-1pm ET, Fri 3-4pm ET
  • Kit Rodolfa: GHC 8018. Office Hours (Zoom): Mon 1:30-2:30 ET, Thu 11-12 ET

Teaching Assistants

  • Amartya Basu. Office Hours (Zoom): Tue 1-2pm ET, Sat 1-2pm ET
  • Aaron Dunmore. Office Hours (Zoom): Wed 1-2pm ET, Fri 1-2pm ET

Grading

  • Data Loading Assignment: 5.0%
  • Proposal: 10.0%
  • Peer Reviews: 2.5%
  • Weekly Project Assignments: 15.0%
  • Midterm Presentation: 7.5%
  • Final Presentation: 10.0%
  • Final Report and Code: 20.0%
  • Quizzes: 10.0%
  • Class Attendance and Participation: 15.0%
  • Weekly Feedback Forms: 5.0%
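The weights above sum to 100%. As a rough illustration of how they combine, the sketch below computes a weighted average. It assumes every component is scored on a 0-100 scale and that the final grade is a plain linear combination; neither assumption is stated in this README, and the `final_grade` helper is hypothetical, not part of the course materials.

```python
# Weights copied from the Grading list (percent of the final grade).
WEIGHTS = {
    "Data Loading Assignment": 5.0,
    "Proposal": 10.0,
    "Peer Reviews": 2.5,
    "Weekly Project Assignments": 15.0,
    "Midterm Presentation": 7.5,
    "Final Presentation": 10.0,
    "Final Report and Code": 20.0,
    "Quizzes": 10.0,
    "Class Attendance and Participation": 15.0,
    "Weekly Feedback Forms": 5.0,
}


def final_grade(scores):
    """Weighted average of component scores.

    Assumption (not from the syllabus): each score is on a 0-100 scale
    and components combine linearly by their listed weights.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / total_weight
```

For example, `final_grade({name: 80 for name in WEIGHTS})` gives the same score back, since every component carries the same mark.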

Schedule

See the syllabus for more detail, including information about group projects, grading, and helpful optional readings.

| Week | Dates | Tuesday | Wednesday | Thursday | Assignments | Project Focus |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Tu: Sep 1; Th: Sep 3 | Intro/Overview + Project Overviews | Basic Tech Setup: make sure students can connect to the server through ssh, have access to GitHub, and can access the database from both psql and DBeaver | Scoping, Problem Definition, Balancing Goals (equity, efficiency, effectiveness) | 1. Survey (Monday); 2. Project preferences + signature (Wednesday) | Get familiar with the class and its goals, and understand project choices |
| 2 | Tu: Sep 8; Th: Sep 10 | Case Studies + Discussion | Git + SQL | Acquiring Data, Privacy, Record Linkage | | Understand Project, Data Audit and Exploration |
| 3 | Tu: Sep 15; Th: Sep 17 | Data Exploration + 30 min project team meeting/coordination | Project Work | Analytical Formulation and Baselines | ACS Data ETL (Friday) | Data Stories and Finalize Project Scope |
| 4 | Tu: Sep 22; Th: Sep 24 | Building ML Pipelines | Project Work | Project Work | Project Proposal (Friday) | Initial ML Pipeline Setup; Analytical Formulation and Baselines |
| 5 | Tu: Sep 29; Th: Oct 1 | Feature Engineering / Imputation | Remote Tech Workflows | Project Work | Proposal Reviews (Monday) | Iteration 1 - Build End to End Code Pipeline (focus on end-to-end shell) |
| 6 | Tu: Oct 6; Th: Oct 8 | Performance Metrics / Evaluation Part 1: Model Selection and Validation | Group Check-Ins | Temporal Validation Deep Dive (with class projects as examples) | Skeleton ML Pipeline Code (Friday) | |
| 7 | Tu: Oct 13; Th: Oct 15 | Performance Metrics / Evaluation Part 2 (audition) | Group Check-Ins | Project Work | | Iteration 2 - End to End Code Pipeline (focus on feature development) |
| 8 | Tu: Oct 20; Th: Oct 22 | Recap of Topics and, if time, Overfitting, Leakage, Issues in Deployment | Group Check-Ins | Project Work | Modeling Plan and Feature List (Monday) | |
| 9 | Tu: Oct 27; Th: Oct 29 | No Class (watch recordings of Midterm Project Progress Presentations and submit questions and feedback) | Group Check-Ins | ML Pipelines and sklearn Deep Dive | V0 Results, Train Test Splits, Model Selection Metric(s) (Monday) | Iteration 3 - End to End Code Pipeline (focus on models and evaluation) |
| 10 | Tu: Nov 3; Th: Nov 5 | Model Interpretability Part 1: global + postmodeling | Group Check-Ins | Eberly Course Feedback Session and Project Work | Fixed V0 Results, Models and Hyperparameters (Monday) | |
| 11 | Tu: Nov 10; Th: Nov 12 | Model Interpretability Part 2: local | Group Check-Ins | Project Work | Weekly Update Assignment (Monday) | Iteration 4 - End to End Code Pipeline (focus on interpreting the models) |
| 12 | Tu: Nov 17; Th: Nov 19 | Bias and Fairness Part I | Group Check-Ins | Project Work | Weekly Update Assignment (Monday) | |
| 13 | Tu: Nov 24; Th: Thanksgiving | Bias and Fairness Part II | HOLIDAY | HOLIDAY | Weekly Update Assignment (Monday) | Final model choice and understanding its performance and impact on disparities |
| 14 | Tu: Dec 1; Th: Dec 3 | Causality and Field Validation | Group Check-Ins | Project Work | Weekly Update Assignment (Monday) | Project Report and Presentations; Field Trial Design |
| 15 | Tu: Dec 8; Th: Dec 10 | Final Presentations | | Final Presentations | Presentations | |
| | | | | | Final Report Due | Final Report, Code, Repo, Documentation |

About

Repo for ML for Public Policy Lab course at CMU
