LifeGains/MIDS_projects

Kevin Fu: UC Berkeley, MS Data Science - All Projects

  • Repo for all MIDS projects.

W210 Capstone: StockpickAI

  • One of the first ML-based stock prediction models geared towards the retail investor. Each month it recommends the 5 stocks most likely to outperform the S&P 500 over the next calendar year.
  • Hal Varian Award Showcase Finalist
  • Results/Model: XGBoost, 55-60% accuracy; +19.9% annualized return vs. +8.9% for the S&P 500 (2004-2022)
  • Tools: Python, SQL / Amazon AWS (EC2, S3) / XGBoost, Random Forest, K Means Clustering
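
To give a feel for what the gap between the two annualized figures means over a long horizon, here is a minimal sketch of plain compounding. It is not the project's backtest: it assumes a constant annual rate, ignores fees and taxes, and treats 2004-2022 as 18 full years, which is an assumption. The function name `compound_growth` is hypothetical.

```python
def compound_growth(annual_return: float, years: int, start: float = 1.0) -> float:
    """Value of `start` after compounding `annual_return` for `years` years."""
    return start * (1.0 + annual_return) ** years


# Assumption: 2004-2022 is treated as 18 full years of compounding.
YEARS = 18

# Annualized figures quoted in the results line above.
strategy = compound_growth(0.199, YEARS)
benchmark = compound_growth(0.089, YEARS)

print(f"Growth of $1 at +19.9%/yr over {YEARS} years: {strategy:.2f}")
print(f"Growth of $1 at  +8.9%/yr over {YEARS} years: {benchmark:.2f}")
print(f"Ratio of ending values: {strategy / benchmark:.2f}")
```

Changing `YEARS` to 19 shows how sensitive the ending values are to how the date range is counted.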

W261 Machine Learning at Scale: Flight Delay Prediction

  • Predict flight delays to decrease explicit (financial) and implicit (time) costs for both airlines and consumers
  • Results/Model: Random Forest, 84% recall (the share of delayed flights that are correctly classified as delayed)
  • Tools: Python / Databricks / Random Forest, Gradient Boosted Trees, Logistic Regression
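
The recall definition above (delayed flights correctly classified as delayed, out of all delayed flights) can be sketched in a few lines. The labels below are made-up toy data, not project results, and the helper name `recall` is hypothetical.

```python
def recall(y_true, y_pred, positive=1):
    """Recall = true positives / (true positives + false negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("recall is undefined when there are no positive examples")
    return tp / (tp + fn)


# Toy data (illustrative only): 1 = delayed, 0 = on time.
y_true = [1, 1, 1, 1, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1, 0, 1]

# 5 flights were actually delayed; 4 of them were predicted as delayed.
print(f"recall = {recall(y_true, y_pred):.2f}")
```

Note that the on-time flight wrongly flagged as delayed does not affect recall; it would lower precision instead.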

W266 Natural Language Processing (NLP) with Deep Learning: Subreddit Classification

  • Accurately classify slang-heavy social media posts using NLP techniques, with the goal of attracting higher ad spending
  • Hypothesis: classification of deslanged social media posts is likely to outperform classification of posts containing slang/acronyms
  • Results/Model: BERT, 86% F1 score
  • Tools: Python / GCP / Deep Learning (Transformers: BERT & T5, Recurrent Neural Networks), Naive Bayes

W281 Computer Vision: Image Classification

  • Created an end-to-end computer vision solution that classifies 20 animal types from unstructured data
  • Limited training data combined with limited compute resources yielded a low-scoring model
  • Results/Model: Support Vector Machines (SVM), 32% F1 score
  • Tools: Python / Deep Learning (Convolutional Neural Networks), SVM, PCA, t-SNE, Sobel, Harris Corners

W203 Statistics for Data Science: Impact of Patient Race vs. State Hospital Preparedness on COVID Mortality Rates

  • Determine whether patient race or state hospital preparedness mattered more for COVID mortality rates, to help combat “fake news”
  • Results: State hospital preparedness was the more significant factor
  • Tools: R / Linear & Logistic Regression
