The goal of this project is to analyze a dataset scraped from YouTube’s Trending page, find the characteristics that trending videos have in common, and use those characteristics to predict which videos will trend.
For example, David Dobrik, a creator of trending content, started his YouTube channel in 2015; by 2019 it was the fifth-most-viewed creator channel on YouTube and produced an annual income of $13 million.
This information could be valuable to individual content creators who want to increase their views, and to corporate clients or individuals who are interested in using YouTube as an avenue for content creation.
Our data comes from a Kaggle project at the following link:
https://www.kaggle.com/datasnaek/youtube-new
- Data Wrangling and EDA to get an understanding of our data
- Feature engineering if needed along the way
- Visualizing our data
- Applying NLP to video descriptions to find the most-viewed topics and videos
- Cross-validating our model
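
As a rough illustration of the last two steps, the sketch below sums views per description term and builds simple k-fold train/test splits using only the Python standard library. It is a minimal example under stated assumptions, not the project's actual pipeline: the sample rows, the `description` and `views` field names, the stopword list, and the helper names `tokenize`, `views_by_term`, and `k_fold_indices` are all hypothetical, and counting terms is only a crude stand-in for topic discovery.

```python
import re
from collections import defaultdict

# Hypothetical sample rows (assumption); the real dataset is far larger.
SAMPLE_VIDEOS = [
    {"description": "Official music video for the new single", "views": 1_000_000},
    {"description": "Funny vlog with friends", "views": 200_000},
    {"description": "New music live performance", "views": 500_000},
    {"description": "Cooking vlog: easy pasta", "views": 50_000},
]

# Tiny illustrative stopword list (assumption), not a standard one.
STOPWORDS = {"the", "for", "with", "a", "an", "of", "and", "new"}


def tokenize(text):
    """Lowercase the text and return alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def views_by_term(videos):
    """Sum views over every video whose description mentions a term."""
    totals = defaultdict(int)
    for video in videos:
        # set() so a term repeated in one description is counted once per video
        for term in set(tokenize(video["description"])):
            totals[term] += video["views"]
    return dict(totals)


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds, without shuffling."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


# Terms ranked by the total views of the videos that mention them.
top_terms = sorted(views_by_term(SAMPLE_VIDEOS).items(), key=lambda kv: -kv[1])
# Two folds over the four sample rows.
folds = list(k_fold_indices(len(SAMPLE_VIDEOS), k=2))
```

On real data, the rows would normally be shuffled before splitting, and a library such as scikit-learn offers ready-made tokenizers and cross-validation utilities that would replace these hand-written helpers.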