Project Structure:
- Collect Relevant Data
  - Which data is important?
    - Recruit Data: high school recruiting data
    - Pro Data: draft data, players in the league
    - College Data: playing time, all-conference teams
- Perform Analysis
- Detail Findings
- Recommendations for Future Analysis
Study to analyze Vanderbilt recruiting performance over time.
Background
Data
Models
Timeline
Repo Structure
Logistics
Resources
Contact
The goal of this project is to answer the following questions:
- Where has Vanderbilt performed well or poorly historically?
- Where are potential areas to target based on this data?
Provide a broad overview of the purpose of the project.
Describe the data: what kind of data is it? Describe its general format and any potential quirks.
Describe the overall size of the dataset and the relative ratio of positive/negative examples for each of the response variables.
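As a starting point for this description, the dataset size and positive/negative balance can be tallied with a few lines of standard-library Python. This is a minimal sketch, assuming each response variable is stored as a list of 0/1 labels; the `class_balance` helper and the response names `drafted` and `all_conference` are hypothetical and only illustrate the shape of the summary.

```python
from collections import Counter


def class_balance(responses: dict[str, list[int]]) -> dict[str, dict]:
    """Summarize size and positive/negative counts for each binary response.

    Assumes labels are encoded as 1 (positive) and 0 (negative).
    """
    summary = {}
    for name, labels in responses.items():
        counts = Counter(labels)
        pos, neg = counts.get(1, 0), counts.get(0, 0)
        summary[name] = {
            "n": len(labels),
            "positive": pos,
            "negative": neg,
            # Share of positive examples; 0.0 for an empty label list.
            "positive_rate": pos / len(labels) if labels else 0.0,
        }
    return summary


# Hypothetical response variables, for illustration only.
example = {
    "drafted": [1, 0, 0, 0, 1, 0],
    "all_conference": [0, 0, 1, 0, 0, 0],
}
for name, stats in class_balance(example).items():
    print(name, stats)
```

Reporting the positive rate alongside raw counts makes heavily imbalanced responses easy to spot before any modeling choices are made.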
Clearly identify each of the response variables of interest. Any additional desired analysis should also be described here.
Outline the desired timeline of the project and any explicit deadlines.
Give a description of how the repository is structured. An example structure description follows:
The repo is structured as follows: Notebooks are grouped according to their series (e.g., 10, 20, 30, etc.), which reflects the general task to be performed in those notebooks. Start with the *0 notebook in the series and add other investigations relevant to the task in the series (e.g., 11-cleaned-scraped.ipynb). If your notebook is extremely long, make sure you've used nbdev's reuse capabilities and consider whether you can divide the notebook into two notebooks.
All files in the repo should run without errors and contain no blank cells, even if they are midway through development of the proposed task. All notebooks relating to the analysis should have a numerical prefix (e.g., 31-) followed by a short description of the exploration (e.g., 31-text-labeling). Utility notebooks should not be numbered, but should be named according to their purpose. All notebook titles should be lowercase and hyphenated (e.g., 10-process-data, not 10-Process-Data). All notebooks should follow literate programming practices (i.e., markdown cells that describe problems, assumptions, and conclusions) and provide adequate but not superfluous code comments.
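The naming rules above are mechanical enough to check automatically. The following is a minimal sketch, assuming analysis notebooks always use a two-digit series prefix (as in the examples 10-, 11-, and 31-) and that utility notebooks are unnumbered, lowercase, and hyphenated; `check_notebook_name` and `find_bad_notebook_names` are hypothetical helpers, not part of nbdev or any existing tool.

```python
import re
from pathlib import Path

# Assumption: analysis notebooks use a two-digit series prefix, e.g. 10-, 31-.
ANALYSIS_NAME = re.compile(r"^\d{2}-[a-z0-9]+(?:-[a-z0-9]+)*\.ipynb$")
# Utility notebooks: no number, lowercase words joined by hyphens.
UTILITY_NAME = re.compile(r"^[a-z][a-z0-9]*(?:-[a-z0-9]+)*\.ipynb$")


def check_notebook_name(filename: str) -> bool:
    """Return True if filename matches either naming pattern."""
    return bool(ANALYSIS_NAME.match(filename) or UTILITY_NAME.match(filename))


def find_bad_notebook_names(repo_root: str = ".") -> list[str]:
    """List notebooks under repo_root whose names break the convention."""
    bad = []
    for path in Path(repo_root).rglob("*.ipynb"):
        # Skip Jupyter's autosave copies.
        if ".ipynb_checkpoints" in path.parts:
            continue
        if not check_notebook_name(path.name):
            bad.append(str(path))
    return sorted(bad)


if __name__ == "__main__":
    for name in find_bad_notebook_names():
        print(f"Rename: {name}")
```

A check like this can only enforce the form of a name; it cannot tell whether an unnumbered notebook really is a utility notebook, so that judgment still belongs to the reviewer.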
Sprint planning:
Demo:
Data location:
Slack channel:
Zoom link:
- Python usage: Whirlwind Tour of Python, Jake VanderPlas (Book, Notebooks)
- Data science packages in Python: Python Data Science Handbook, Jake VanderPlas
- HuggingFace: Website, Course/Training, Inference using pipelines, Fine tuning models
- fast.ai: Course, Quick start
- h2o: Resources, documentation, and API links
- nbdev: Overview, Tutorial
- Git tutorials: Simple Guide, Learn Git Branching
- ACCRE how-to guides: DSI How-tos
Logan King, Recruiting Assistant - Vanderbilt Football, logan.a.king@vanderbilt.edu