This project has two parts:
- Clustering
- Regression
The main goal of the clustering part was to create a districting plan for Pennsylvania (PA) whose districts are reasonably compact while keeping population well distributed across them. For that, we chose weighted K-means clustering, in which weights computed from population are used to give the clusters a more balanced population distribution.
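As a rough illustration of the idea, here is a minimal sketch of weighted K-means in plain Python. It assumes each geographic unit is reduced to a 2-D point with a population count, and that population enters only as the weight in the centroid update; the project's actual weighting scheme may differ. The function name `weighted_kmeans` and the toy coordinates and populations are hypothetical, not project data.

```python
import random


def weighted_kmeans(points, weights, k, iters=50, seed=0):
    """Lloyd-style K-means where centroids are weighted means of their members.

    points  : list of (x, y) tuples (hypothetical unit centroids)
    weights : list of non-negative weights (here: population counts)
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
            )
            for x, y in points
        ]
        # Update step: each centroid moves to the weighted mean of its members.
        new_centroids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            total = sum(weights[i] for i in members)
            if total == 0:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
                continue
            cx = sum(weights[i] * points[i][0] for i in members) / total
            cy = sum(weights[i] * points[i][1] for i in members) / total
            new_centroids.append((cx, cy))
        if new_centroids == centroids:  # no movement means convergence
            break
        centroids = new_centroids
    return labels, centroids


# Toy data (assumed): two well-separated groups of units with populations.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
population = [100, 200, 300, 400, 100, 100]
labels, centroids = weighted_kmeans(points, population, k=2)

# Total population per cluster, useful for checking how balanced the plan is.
cluster_population = [
    sum(p for p, lab in zip(population, labels) if lab == c) for c in range(2)
]
print(labels, cluster_population)
```

Weighting the centroid update pulls each cluster centre toward populous units. On its own it does not force equal district populations, so a check such as `cluster_population` is still needed to evaluate a plan.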
For the regression part, we take the 2012 VTD (voting tabulation district) data grouped by county and predict the proportion of each county that voted Democratic in the 2012 presidential election (67 counties in total). For that, we use the following regression techniques:
- Stepwise Regression
- Best Subset Selection Regression
- Multiple Linear Regression
- XGBoost
- Ridge Regression
- LASSO Regression
- Random Forest
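Comparing several regression techniques on only 67 observations usually relies on cross-validation. The sketch below is one possible way to do that, shown for two of the listed methods only: multiple linear regression and ridge regression, using a closed-form fit with NumPy. The data are synthetic stand-ins (67 rows, 4 made-up predictors), and the helper names `fit_ridge` and `cv_rmse` are hypothetical; none of this is the project's actual data or code.

```python
import numpy as np


def fit_ridge(X, y, lam):
    """Closed-form ridge fit; lam=0 reduces to ordinary least squares."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # centering keeps the intercept unpenalized
    coef = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef


def cv_rmse(X, y, lam, n_folds=5, seed=0):
    """K-fold cross-validated RMSE for a given ridge penalty."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    sq_errors = []
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False  # hold out the current fold
        coef, intercept = fit_ridge(X[train], y[train], lam)
        pred = X[test_idx] @ coef + intercept
        sq_errors.append((pred - y[test_idx]) ** 2)
    return float(np.sqrt(np.concatenate(sq_errors).mean()))


# Synthetic stand-in for county-level data: 67 counties, 4 assumed predictors.
rng = np.random.default_rng(42)
X = rng.normal(size=(67, 4))
true_coef = np.array([0.08, -0.05, 0.0, 0.02])
y = 0.45 + X @ true_coef + rng.normal(scale=0.03, size=67)  # simulated vote share

# Compare OLS (lam=0) with two ridge penalties on held-out error.
results = {lam: cv_rmse(X, y, lam) for lam in (0.0, 1.0, 10.0)}
for lam, rmse in results.items():
    print(f"lambda={lam:<5} CV RMSE={rmse:.4f}")
```

The same `cv_rmse` loop could wrap any of the other listed models, so that all techniques are scored on identical folds.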