This repository contains the code and report for a project that predicts gross income from supermarket sales data. The project analyzes historical sales data from three supermarkets and derives insights that support their decision-making.
The primary goal is to analyze the provided sales dataset and predict the gross income of the three supermarkets. By answering key questions about the data, the project aims to identify areas where sales performance can improve and to enable evidence-based decision-making.
- Data Preprocessing
- Imported Libraries
- Loading Data
- Exploratory Data Analysis
- Preliminary Inspection
- Statistical Test (ANOVA, T-test)
- Missing Data Handling
- Handling Categorical Data
- Prepare Dataset
- Split Train & Test Dataset
- Explore Train Dataset
- Standard Scaler
- Comparison of Training & Test Data
- Dimensionality Reduction
- Feature Scaling
- Principal Component Analysis (PCA)
- Min-Max Scaler
- K-Means Clustering
- Column Transformer to Encode Categorical Variables
- Hierarchical Clustering
- Models
- Linear Regression
- Support Vector Machine
- Decision Tree Regressor
- Cross-validation
- Cross-validation for Decision Tree Classifier
- Cross-validation for Linear Regression
- Cross-validation for Support Vector Machine
- Cross-validation for Random Forest Regressor
- Learning and Validation Curve
- Linear Regression Learning and Validation Curve
- Learning Curve for Support Vector Regression
- Learning Curve for Decision Tree Regressor
- Pipeline
- Adaline
- Tuning Hyperparameters
- Tuning Hyperparameters for Decision Tree Regressor
- Tuning Hyperparameters for Linear Regression
- Voting Regressor
- Performing Grid Search Cross-Validation for Best Hyperparameters
- Ensemble Learning
- Evaluating Ensemble Learning Model
- Evaluating Regression Models
- Final Model Visualization
- Acknowledgement & Credits
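As a rough illustration of the modelling steps in the outline above (train/test split, scaling, cross-validation, and a voting ensemble), here is a minimal scikit-learn sketch. It is not the project's code: the data is synthetic, and the feature names, model settings, and estimator labels (`linear`, `svm`, `tree`, `forest`) are assumptions chosen only to keep the example self-contained.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the sales data (hypothetical, NOT the project's dataset).
rng = np.random.default_rng(0)
unit_price = rng.uniform(10, 100, size=300)
quantity = rng.integers(1, 11, size=300)
X = np.column_stack([unit_price, quantity])
# Hypothetical target: a fixed share of the sale value plus noise.
y = 0.05 * unit_price * quantity + rng.normal(0, 0.5, size=300)

# Hold out a test set before any fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Scale-sensitive models get a StandardScaler inside a pipeline so the
# scaler is re-fitted on each cross-validation training fold.
models = {
    "linear": make_pipeline(StandardScaler(), LinearRegression()),
    "svm": make_pipeline(StandardScaler(), SVR(kernel="linear")),
    "tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    "forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# 5-fold cross-validated R2 on the training split for each model.
cv_r2 = {}
for name, model in models.items():
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring="r2")
    cv_r2[name] = float(scores.mean())

# Voting ensemble that averages the predictions of the four regressors.
ensemble = VotingRegressor(estimators=list(models.items()))
ensemble.fit(X_train, y_train)
test_r2 = float(ensemble.score(X_test, y_test))

for name, score in cv_r2.items():
    print(f"{name}: mean CV R2 = {score:.3f}")
print(f"voting ensemble: test R2 = {test_r2:.3f}")
```

Keeping the scaler inside the pipeline means the test fold never influences the scaling parameters, which is the usual reason to prefer a pipeline over scaling the whole dataset up front.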
The following are the performance metrics obtained for each model:
- Linear Regression:
  - MSE: 9.569281319434032e-29
  - RMSE: 0.9659373533136785
  - R2: 0.9937594903570506
  - MAE: 0.7294656364215064
  - MAPE: 17.742231499360862
- SVM:
  - MSE: 0.9189143505575582
  - RMSE: 0.9586002037124539
  - R2: 0.993853934689648
  - MAE: 0.7199275145830082
  - MAPE: 18.207029972192064
- Decision Tree:
  - MSE: 0.6720225458333334
  - RMSE: 0.8197698127116743
  - R2: 0.9955052454516412
  - MAE: 0.5926550000000002
  - MAPE: 8.042994835842025
- Random Forest:
  - MSE: 0.4435065582333338
  - RMSE: 0.6659628805221307
  - R2: 0.9970336514270154
  - MAE: 0.4998605333333334
  - MAPE: 6.903109971305652
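For reference, the five metrics listed above can be computed from true and predicted values as in the sketch below. This is an illustration using their standard definitions, not the project's evaluation code: the function name `regression_metrics` and the sample arrays are hypothetical, and reporting MAPE as a percentage is an assumption about how the figures above were scaled.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, R2, MAE and MAPE (as a percentage) for two arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    mse = float(np.mean(residuals ** 2))
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))

    return {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),  # RMSE is the square root of MSE
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": float(np.mean(np.abs(residuals))),
        # Assumption: MAPE expressed in percent; undefined if y_true contains 0.
        "MAPE": float(np.mean(np.abs(residuals / y_true)) * 100),
    }


# Hypothetical values, for illustration only.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 39.0]
report = regression_metrics(y_true, y_pred)
for name, value in report.items():
    print(f"{name}: {value:.4f}")
```

Because RMSE is derived directly from MSE, the two values for any one model should always agree up to a square root, which makes them a quick consistency check when reading a results table.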
Please refer to the project report and code files for a detailed description of the implementation and analysis.
For any questions or inquiries, please contact me on LinkedIn (https://www.linkedin.com/in/aditya-ravi-a3aab11b6/).
Thank you for your attention and interest in our project.
Best regards, Aditya Ravi
We would like to acknowledge the following resources for their contributions and support during this project:
- NumPy Documentation: https://numpy.org/doc/
- Matplotlib Documentation: https://matplotlib.org/stable/contents.html
- Seaborn Documentation: https://seaborn.pydata.org/documentation.html
- Scipy Documentation: https://docs.scipy.org/doc/
- mpl_toolkits.mplot3d Documentation: https://matplotlib.org/stable/mpl_toolkits/mplot3d/index.html
- mlxtend Documentation: http://rasbt.github.io/mlxtend/
- Scikit-learn Documentation, Dimensionality Reduction: https://scikit-learn.org/stable/modules/classes.html#module-sklearn.decomposition
- Kaggle, Exploratory Data Analysis in Python: https://www.kaggle.com/learn/exploratory-data-analysis
- Scikit-learn Documentation, Preprocessing Data (feature scaling): https://scikit-learn.org/stable/modules/preprocessing.html
- Scikit-learn Documentation, K-Means Clustering: https://scikit-learn.org/stable/modules/clustering.html#k-means
- Scikit-learn Documentation, Supervised Learning (machine learning models): https://scikit-learn.org/stable/supervised_learning.html
- Data Preprocessing Pipeline: https://deepnote.com/workspace/university-of-pavia-f27e0737-f8cd-4cef-8454-4e4dbf7199d2/project/Data-preprocessing-for-Machine-Learning-Duplicate-9dd2093c-12cb-4fdb-8923-1b273103d247