The project aims to predict the popularity of a movie based on its overview text. It covers the full workflow on a movie dataset: data preprocessing, model building, training, and evaluation.
This project demonstrates that it is very hard to predict the popularity of a movie based on its overview text.
As you can see from this confusion matrix, the results are not exactly accurate:
- Regression => Target: vote_average
- Classification => Target: popularity_class
Preparation
- Loading the Movies dataset from a GitHub repository.
- Initial exploration of dataset characteristics, including data types, correlations, and handling missing values.
- Transforming the data by dropping unnecessary columns, creating new features, and visualizing distributions.
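The preparation steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the field names `overview` and `popularity`, the helper name `prepare`, and the tertile binning used to derive a class label are all hypothetical, since the README does not say how the class label is actually built.

```python
from statistics import quantiles


def prepare(rows):
    """Drop rows with a missing overview or popularity, then add a class label.

    `rows` is a list of dicts. The field names and the tertile binning are
    assumptions made for this sketch, not the project's confirmed method.
    """
    # Keep only rows where the overview is non-empty and popularity is present.
    clean = [
        r for r in rows
        if r.get("overview") and r.get("popularity") is not None
    ]
    # Two cut points that split popularity into three roughly equal groups.
    low_cut, high_cut = quantiles([r["popularity"] for r in clean], n=3)

    labeled = []
    for r in clean:
        if r["popularity"] <= low_cut:
            label = 0  # low popularity
        elif r["popularity"] <= high_cut:
            label = 1  # medium popularity
        else:
            label = 2  # high popularity
        labeled.append({**r, "popularity_class": label})
    return labeled
```

Binning by quantiles keeps the classes roughly balanced, which makes accuracy easier to interpret later; other binning schemes are equally possible.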
Data Preprocessing
- Feature engineering by converting text data to numerical representations (text vectorization) for the movie overviews and titles.
- Normalization of numeric features for consistent scaling.
- Splitting the dataset into training and test sets.
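As a rough sketch of the three preprocessing steps listed above, the following uses only the Python standard library. The function names, the tokenization rule, the reserved ids (0 for padding, 1 for unknown tokens), and the choice of min-max scaling are assumptions for illustration; the project itself may rely on library layers or a different normalization.

```python
import random
import re

TOKEN_PATTERN = re.compile(r"[a-z0-9']+")


def build_vocab(texts, max_tokens=10000):
    """Map the most frequent tokens to integer ids (0 = padding, 1 = unknown)."""
    counts = {}
    for text in texts:
        for tok in TOKEN_PATTERN.findall(text.lower()):
            counts[tok] = counts.get(tok, 0) + 1
    # Most frequent first; ties are broken alphabetically for determinism.
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    return {tok: i + 2 for i, tok in enumerate(ranked[: max_tokens - 2])}


def vectorize(text, vocab, seq_len=8):
    """Turn text into a fixed-length list of token ids, truncating or padding."""
    ids = [vocab.get(tok, 1) for tok in TOKEN_PATTERN.findall(text.lower())]
    ids = ids[:seq_len]
    return ids + [0] * (seq_len - len(ids))


def fit_min_max(values):
    """Return a function that scales values to [0, 1] using the fitted range."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant features
    return lambda v: (v - lo) / span


def train_test_split(items, test_fraction=0.2, seed=42):
    """Shuffle a copy of `items` and split it into (train, test) lists."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

To avoid leaking test information, the vocabulary and the scaling range would normally be fitted on the training split only and then applied unchanged to the test split.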
Model Building and Training
- Construction of various neural network models for both regression (predicting vote_average) and classification (predicting popularity_class_label).
- Models include simple dense networks, LSTM, embedding layers, and convolutional networks, showcasing a range of deep learning techniques.
- Training the models using different feature sets: numeric features, overview text, and title text.
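One of the text models described above, an embedding layer followed by an LSTM, might look roughly like the sketch below. It assumes TensorFlow's bundled Keras API and overview text that has already been converted to integer token ids. The function name `build_text_model`, the layer sizes, the optimizer, and the default of three classes are illustrative assumptions, not the project's actual configuration.

```python
import tensorflow as tf


def build_text_model(task, vocab_size=10000, seq_len=100, num_classes=3):
    """Embedding + LSTM over token ids, with a task-specific output layer.

    All hyperparameters here are placeholder values chosen for the sketch.
    """
    if task not in ("regression", "classification"):
        raise ValueError("task must be 'regression' or 'classification'")

    layers = tf.keras.layers
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(seq_len,), dtype="int32"),
        layers.Embedding(input_dim=vocab_size, output_dim=32),
        layers.LSTM(16),
    ])

    if task == "regression":
        # Single linear output, e.g. for a continuous rating target.
        model.add(layers.Dense(1))
        model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    else:
        # One probability per class, with integer class labels as targets.
        model.add(layers.Dense(num_classes, activation="softmax"))
        model.compile(
            optimizer="adam",
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"],
        )
    return model
```

Sharing the text encoder and swapping only the output layer and loss keeps the regression and classification variants directly comparable.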
Evaluation and Results Analysis
- Evaluation of models using R-squared and explained variance for regression, and accuracy, a confusion matrix, and a classification report for classification.
- Visualization of training and validation accuracy over epochs.
- Comparative analysis of different models based on their performance on the test dataset.
- Comprehensive Data Analysis: Detailed examination and transformation of a complex movie dataset.
- Diverse Model Architectures: Exploration of various neural network structures tailored for specific types of features.
- In-depth Model Evaluation: Extensive analysis of model performance, providing insights into their effectiveness in predictive modeling.
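To make the evaluation metrics named above concrete, here is a minimal standard-library sketch of R-squared, explained variance, accuracy, and a confusion matrix. The function names are chosen for this example; in practice a library such as scikit-learn provides equivalent functions.

```python
from statistics import pvariance


def r_squared(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def explained_variance(y_true, y_pred):
    """1 - Var(residuals) / Var(y_true); ignores a constant prediction bias."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1 - pvariance(residuals) / pvariance(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true, y_pred, num_classes):
    """matrix[i][j] counts samples of true class i predicted as class j."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix
```

The two regression metrics differ when predictions are systematically shifted: a model that is always off by the same constant still has an explained variance of 1, while its R-squared drops.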
The analysis provides valuable insights into the performance of different types of neural network architectures for both regression and classification tasks. Future work could explore further refinement of models, incorporation of additional features, and application to other datasets or predictive scenarios.