Supermarket Data Analysis

This repository contains the code and report for a project that predicts gross income from supermarket sales data. The project analyzes historical sales data from three supermarkets to derive insights that support their decision-making.

Objective

The primary goal of this project is to analyze the provided sales dataset and predict gross income for the three supermarkets. By answering key questions about the data, the project aims to improve sales performance, identify areas for improvement, and enable evidence-based decision-making.

Table of Contents

Data Preprocessing
- Imported Libraries
- Loading Data
- Exploratory Data Analysis
- Preliminary Inspection
- Statistical Test (ANOVA, T-test)
- Missing Data Handling
- Handling Categorical Data
- Prepare Dataset
- Split Train & Test Dataset
- Explore Train Dataset
- Standard Scaler
- Comparison of Training & Test Data
- Dimensionality Reduction
- Feature Scaling
- Principal Component Analysis (PCA)
- Min-Max Scaler
- K-Means Clustering
- Column Transformer to Encode Categorical Variables
- Hierarchical Clustering
Models
- Linear Regression
- Support Vector Machine
- Decision Tree Regressor
Cross-validation
- Cross-validation for Decision Tree Classifier
- Cross-validation for Linear Regression
- Cross-validation for Support Vector Machine
- Cross-validation for Random Forest Regressor
Learning and Validation Curve
- Linear Regression Learning and Validation Curve
- Learning Curve for Support Vector Regression
- Learning Curve for Decision Tree Regressor
Pipeline
- Adaline
Tuning Hyperparameters
- Tuning Hyperparameters for Decision Tree Regressor
- Tuning Hyperparameters for Linear Regression
Voting Regressor
Performing Grid Search Cross-Validation for Best Hyperparameters
Ensemble Learning
- Evaluating Ensemble Learning Model
- Evaluating Regression Models
Final Model Visualization
Acknowledgement & Credits
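
The outline above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the data is synthetic, the column names (`branch`, `unit_price`, `quantity`) and the target formula are hypothetical, and only a subset of the listed steps is shown (train/test split, column transformer, standard scaler, voting regressor, cross-validation, grid search).

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data; column names and target formula are hypothetical.
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "branch": rng.choice(["A", "B", "C"], size=n),
    "unit_price": rng.uniform(10, 100, size=n),
    "quantity": rng.integers(1, 11, size=n),
})
y = 0.05 * X["unit_price"] * X["quantity"] + rng.normal(0, 0.5, size=n)

# Hold out a test set before fitting any preprocessing step.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Encode the categorical column and scale the numeric ones.
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["branch"]),
    ("num", StandardScaler(), ["unit_price", "quantity"]),
])

# Average the predictions of three regressors.
ensemble = VotingRegressor([
    ("lr", LinearRegression()),
    ("tree", DecisionTreeRegressor(random_state=0)),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
])

model = Pipeline([("preprocess", preprocess), ("regressor", ensemble)])

# 5-fold cross-validation on the training split only.
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="r2")

# Grid search over one hyperparameter of the decision tree inside the ensemble.
search = GridSearchCV(
    model,
    {"regressor__tree__max_depth": [3, 5, None]},
    cv=5,
    scoring="r2",
)
search.fit(X_train, y_train)
test_r2 = search.score(X_test, y_test)

print("CV R2 per fold:", cv_scores)
print("Best params:", search.best_params_)
print("Test R2:", test_r2)
```

Wrapping the preprocessing and the regressor in one `Pipeline` means the scaler and encoder are refit inside each cross-validation fold, which avoids leaking information from the validation data into the training step.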

Model Performance

The following are the performance metrics obtained for each model:

Linear Regression:
- MSE: 9.569281319434032e-29
- RMSE: 0.9659373533136785
- R2: 0.9937594903570506
- MAE: 0.7294656364215064
- MAPE: 17.742231499360862
SVM:
- MSE: 0.9189143505575582
- RMSE: 0.9586002037124539
- R2: 0.993853934689648
- MAE: 0.7199275145830082
- MAPE: 18.207029972192064
Decision Tree:
- MSE: 0.6720225458333334
- RMSE: 0.8197698127116743
- R2: 0.9955052454516412
- MAE: 0.5926550000000002
- MAPE: 8.042994835842025
Random Forest:
- MSE: 0.4435065582333338
- RMSE: 0.6659628805221307
- R2: 0.9970336514270154
- MAE: 0.4998605333333334
- MAPE: 6.903109971305652
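
For reference, the five metrics reported above can be computed as in the sketch below. It uses only the Python standard library; the function name `regression_metrics` and the sample values are hypothetical, and MAPE is assumed to be expressed as a percentage. Since RMSE is the square root of MSE, the two values reported for a given model can be cross-checked against each other.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, R2, MAE and MAPE (in percent) for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n

    # R2 = 1 - (residual sum of squares / total sum of squares)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot

    # MAPE is undefined when a true value is zero; this sketch assumes none are.
    mape = 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    return {"MSE": mse, "RMSE": math.sqrt(mse), "R2": r2, "MAE": mae, "MAPE": mape}


# Hypothetical values, for illustration only.
metrics = regression_metrics([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 39.0])
print(metrics)
```

Note that scikit-learn's `mean_absolute_percentage_error` returns a fraction rather than a percentage, so its output would need to be multiplied by 100 to match the convention assumed here.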

Please refer to the project report and code files for a detailed description of the implementation and analysis.

For any questions or inquiries, please contact me on LinkedIn (https://www.linkedin.com/in/aditya-ravi-a3aab11b6/).

Thank you for your attention and interest in our project.

Best regards, Aditya Ravi

Acknowledgement & Credits

We would like to acknowledge the following resources for their contributions and support during this project:

- NumPy Documentation: https://numpy.org/doc/
- Matplotlib Documentation: https://matplotlib.org/stable/contents.html
- Seaborn Documentation: https://seaborn.pydata.org/documentation.html
- Scipy Documentation: https://docs.scipy.org/doc/
- mpl_toolkits.mplot3d Documentation: https://matplotlib.org/stable/mpl_toolkits/mplot3d/index.html
- mlxtend Documentation: http://rasbt.github.io/mlxtend/
- Scikit-learn Documentation, Dimensionality Reduction: https://scikit-learn.org/stable/modules/classes.html#module-sklearn.decomposition
- Kaggle, Exploratory Data Analysis in Python: https://www.kaggle.com/learn/exploratory-data-analysis
- Scikit-learn Documentation, Preprocessing Data (Feature Scaling): https://scikit-learn.org/stable/modules/preprocessing.html
- Scikit-learn Documentation, K-Means Clustering: https://scikit-learn.org/stable/modules/clustering.html#k-means
- Scikit-learn Documentation, Supervised Learning (Machine Learning Models): https://scikit-learn.org/stable/supervised_learning.html
- Data Preprocessing Pipeline: https://deepnote.com/workspace/university-of-pavia-f27e0737-f8cd-4cef-8454-4e4dbf7199d2/project/Data-preprocessing-for-Machine-Learning-Duplicate-9dd2093c-12cb-4fdb-8923-1b273103d247
