This project focuses on utilizing historical sales data from three different supermarkets to predict the gross income. The dataset contains information on various aspects of sales transactions, such as invoice details, branch locations, etc.

adityaravi9034/Supermarket-Data-Analysis

Supermarket Data Analysis

This repository contains the code and report for the project on predicting gross income based on supermarket sales data. The project aims to analyze historical sales data from three different supermarkets and derive meaningful insights to facilitate decision-making for the supermarkets.

Objective

The primary goal of this project is to analyze the provided sales dataset and predict gross income for the three supermarkets. The analysis is meant to show where sales performance can be improved and to support evidence-based decision-making.

Table of Contents

  1. Data Preprocessing
    • Imported Libraries
    • Loading Data
    • Exploratory Data Analysis
    • Preliminary Inspection
    • Statistical Test (ANOVA, T-test)
    • Missing Data Handling
    • Handling Categorical Data
    • Prepare Dataset
    • Split Train & Test Dataset
    • Explore Train Dataset
    • Standard Scaler
    • Comparison of Training & Test Data
    • Dimensionality Reduction
    • Feature Scaling
    • Principal Component Analysis (PCA)
    • Min-Max Scaler
    • K-Means Clustering
    • Column Transformer to Encode Categorical Variables
    • Hierarchical Clustering
  2. Models
    • Linear Regression
    • Support Vector Machine
    • Decision Tree Regressor
  3. Cross-validation
    • Cross-validation for Decision Tree Classifier
    • Cross-validation for Linear Regression
    • Cross-validation for Support Vector Machine
    • Cross-validation for Random Forest Regressor
  4. Learning and Validation Curve
    • Linear Regression Learning and Validation Curve
    • Learning Curve for Support Vector Regression
    • Learning Curve for Decision Tree Regressor
  5. Pipeline
    • Adaline
  6. Tuning Hyperparameters
    • Tuning Hyperparameters for Decision Tree Regressor
    • Tuning Hyperparameters for Linear Regression
  7. Voting Regressor
  8. Performing Grid Search Cross-Validation for Best Hyperparameters
  9. Ensemble Learning
    • Evaluating Ensemble Learning Model
    • Evaluating Regression Models
  10. Final Model Visualization
  11. Acknowledgement & Credits
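
As a rough illustration of how several of the steps listed above fit together (encoding categorical variables with a column transformer, scaling, a train/test split, and cross-validating the four models), here is a minimal sketch. It runs on synthetic data, not the project's dataset. The column names (`branch`, `unit_price`, `quantity`, `gross_income`), the assumed 5% relationship between price, quantity, and income, and all hyperparameters are assumptions for illustration and are not taken from the project's code.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the sales data; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "branch": rng.choice(["A", "B", "C"], size=n),
    "unit_price": rng.uniform(10, 100, size=n),
    "quantity": rng.integers(1, 11, size=n),
})
# Assumed target: a fixed fraction of price * quantity, plus small noise.
df["gross_income"] = (
    0.05 * df["unit_price"] * df["quantity"] + rng.normal(0, 0.5, size=n)
)

X = df.drop(columns="gross_income")
y = df["gross_income"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# One-hot encode the categorical column and standardize the numeric ones.
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["branch"]),
    ("num", StandardScaler(), ["unit_price", "quantity"]),
])

models = {
    "Linear Regression": LinearRegression(),
    "SVM": SVR(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# 5-fold cross-validation on the training split only.
cv_r2 = {}
for name, model in models.items():
    pipe = Pipeline([("preprocess", preprocess), ("model", model)])
    scores = cross_val_score(pipe, X_train, y_train, cv=5, scoring="r2")
    cv_r2[name] = scores.mean()
    print(f"{name}: mean CV R2 = {cv_r2[name]:.3f}")
```

Wrapping the preprocessing and the model in a single `Pipeline` means the encoder and scaler are refit inside each cross-validation fold, so no statistics from the held-out fold leak into training.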

Model Performance

The following are the performance metrics obtained for each model:

  • Linear Regression:

    • MSE: 9.569281319434032e-29 (as reported; inconsistent with the RMSE below, whose square is about 0.933)
    • RMSE: 0.9659373533136785
    • R2: 0.9937594903570506
    • MAE: 0.7294656364215064
    • MAPE: 17.742231499360862
  • SVM:

    • MSE: 0.9189143505575582
    • RMSE: 0.9586002037124539
    • R2: 0.993853934689648
    • MAE: 0.7199275145830082
    • MAPE: 18.207029972192064
  • Decision Tree:

    • MSE: 0.6720225458333334
    • RMSE: 0.8197698127116743
    • R2: 0.9955052454516412
    • MAE: 0.5926550000000002
    • MAPE: 8.042994835842025
  • Random Forest:

    • MSE: 0.4435065582333338
    • RMSE: 0.6659628805221307
    • R2: 0.9970336514270154
    • MAE: 0.4998605333333334
    • MAPE: 6.903109971305652
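
For readers who want to reproduce metrics of this kind, the following is a minimal sketch of how the five values can be computed with scikit-learn. The helper name `regression_report` and the toy arrays are hypothetical, and reporting MAPE as a percentage is an assumption; the project's own evaluation code may differ. Because RMSE is the square root of MSE, the two can be cross-checked: for example, squaring the SVM RMSE of 0.9586 gives about 0.9189, which matches its reported MSE.

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    mean_absolute_percentage_error,
    mean_squared_error,
    r2_score,
)


def regression_report(y_true, y_pred):
    """Return MSE, RMSE, R2, MAE, and MAPE (in percent) for one model."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),  # RMSE is always sqrt(MSE)
        "R2": r2_score(y_true, y_pred),
        "MAE": mean_absolute_error(y_true, y_pred),
        # scikit-learn returns a fraction; scaling by 100 is an assumed convention.
        "MAPE": 100 * mean_absolute_percentage_error(y_true, y_pred),
    }


# Toy values for illustration only, not project results.
y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_pred = np.array([11.0, 19.0, 32.0, 38.0])
report = regression_report(y_true, y_pred)
for metric, value in report.items():
    print(f"{metric}: {value:.4f}")
```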

Please refer to the project report and code files for a detailed description of the implementation and analysis.

For any questions or inquiries, please contact me on LinkedIn (https://www.linkedin.com/in/aditya-ravi-a3aab11b6/).

Thank you for your interest in this project.

Best regards, Aditya Ravi

Acknowledgement & Credits

We would like to acknowledge the following resources for their contributions and support during this project:

  • NumPy Documentation: https://numpy.org/doc/
  • Matplotlib Documentation: https://matplotlib.org/stable/contents.html
  • Seaborn Documentation: https://seaborn.pydata.org/documentation.html
  • SciPy Documentation: https://docs.scipy.org/doc/
  • mpl_toolkits.mplot3d Documentation: https://matplotlib.org/stable/mpl_toolkits/mplot3d/index.html
  • mlxtend Documentation: http://rasbt.github.io/mlxtend/
  • Dimensionality Reduction: Scikit-learn Documentation (https://scikit-learn.org/stable/modules/classes.html#module-sklearn.decomposition)
  • Exploratory Data Analysis in Python: Kaggle (https://www.kaggle.com/learn/exploratory-data-analysis)
  • Feature Scaling: Scikit-learn Documentation, Preprocessing Data (https://scikit-learn.org/stable/modules/preprocessing.html)
  • K-Means Clustering: Scikit-learn Documentation (https://scikit-learn.org/stable/modules/clustering.html#k-means)
  • Machine Learning Models: Scikit-learn Documentation, Supervised Learning (https://scikit-learn.org/stable/supervised_learning.html)
  • Data Preprocessing Pipeline: https://deepnote.com/workspace/university-of-pavia-f27e0737-f8cd-4cef-8454-4e4dbf7199d2/project/Data-preprocessing-for-Machine-Learning-Duplicate-9dd2093c-12cb-4fdb-8923-1b273103d247
