- Repo for all MIDS projects.
- One of the first ML-based stock prediction models geared toward retail investors. Each month it recommends the 5 stocks most likely to outperform the S&P 500 over the next calendar year.
- Hal Varian Award Showcase Finalist
- Results/Model: 55-60% Accuracy: XGBoost / +19.9% annualized return vs. +8.9% for the S&P 500 (2004-2022)
- Tools: Python, SQL / Amazon AWS (EC2, S3) / XGBoost, Random Forest, K-Means Clustering
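
For readers unfamiliar with annualized figures, here is a minimal sketch of how an annualized (compound) return is conventionally computed from a start and end portfolio value. The function name `annualized_return` and the sample numbers are hypothetical; the source does not state how its own figures were calculated.

```python
def annualized_return(start_value, end_value, years):
    """Compound annual growth rate between two portfolio values.

    Illustrative helper only; not taken from the project code.
    """
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Hypothetical numbers: a portfolio growing from 100 to 121 over 2 years.
example_rate = annualized_return(100.0, 121.0, 2)
print(f"annualized return: {example_rate:.1%}")
```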
- Predict flight delays to reduce explicit (financial) and implicit (time) costs for both airlines and consumers
- Results/Model: 84% Recall: Random Forest (% of delayed flights that are correctly classified as delayed)
- Tools: Python / Databricks / Random Forest, Gradient Boosted Trees, Logistic Regression
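
The recall definition above (share of actually delayed flights that the model flags as delayed) can be sketched in plain Python. The function name `recall` and the label lists are made up for illustration and are not the project's data or code.

```python
def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted positive."""
    true_pos = sum(
        1 for t, p in zip(y_true, y_pred) if t == positive and p == positive
    )
    actual_pos = sum(1 for t in y_true if t == positive)
    if actual_pos == 0:
        raise ValueError("recall is undefined when there are no positive labels")
    return true_pos / actual_pos


# Hypothetical labels: 1 = delayed, 0 = on time.
actual = [1, 1, 1, 1, 0, 0, 0, 1]
predicted = [1, 1, 0, 1, 0, 1, 0, 1]

delay_recall = recall(actual, predicted)
print(f"recall: {delay_recall:.0%}")
```

A false alarm (an on-time flight predicted as delayed) does not lower recall; only missed delays do.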
- Accurately classify slang-heavy social media posts (using NLP techniques) in order to attract higher ad spending
- Hypothesis: classification performs better on de-slanged social media posts than on posts containing slang/acronyms
- Results/Model: 86% F1 Score: BERT
- Tools: Python / GCP / Deep Learning (Transformers: BERT & T5, Recurrent Neural Networks), Naive Bayes
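
As a reminder of what the F1 score measures, here is a minimal sketch of the binary case: the harmonic mean of precision and recall. The function name `f1_score_binary` and the labels are hypothetical; the source does not say whether its reported F1 is binary or averaged across classes.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for six posts.
actual = [1, 1, 1, 0, 0, 1]
predicted = [1, 0, 1, 1, 0, 1]

f1 = f1_score_binary(actual, predicted)
print(f"F1: {f1:.2f}")
```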
- Built an end-to-end computer vision solution that classifies 20 animal types from unstructured data
- Limited training data and compute resources yielded a low-scoring model
- Results/Model: 32% F1 Score: Support Vector Machines (SVM)
- Tools: Python / Deep Learning (Convolutional Neural Networks), SVM, PCA, t-SNE, Sobel, Harris Corners
W203 Statistics for Data Science: Impact of Patient Race vs. State Hospital’s Preparedness on COVID Mortality Rates
- Determine whether patient race or state hospital preparedness mattered more for COVID mortality rates, to help combat “fake news”.
- Results: State hospital preparedness was the more significant factor
- Tools: R / Linear & Logistic Regression