This project provides an end-to-end data analysis solution for Walmart sales data. Using Python and SQL, it extracts critical business insights by solving key analytical problems. This repository is ideal for data analysts aiming to enhance their skills in data manipulation, SQL querying, and exploratory data analysis.
- Overview
- Project Workflow
- Key Features
- Requirements
- Usage Instructions
- Dataset
- SQL Queries
- Results and Insights
-
Data Acquisition
- Download Walmart sales data using the Kaggle API.
- Place the dataset in the
data/
directory for processing.
-
Data Preparation
- Load raw data into a Pandas DataFrame for exploration.
- Clean and preprocess the data by:
- Removing duplicates.
- Handling missing values.
- Formatting currency and ensuring consistent data types.
-
Feature Engineering
- Add calculated fields like total transaction amount (
unit_price * quantity
) for streamlined SQL analysis.
- Add calculated fields like total transaction amount (
-
Data Loading
- Load the cleaned data into MySQL and PostgreSQL databases using Python's
sqlalchemy
.
- Load the cleaned data into MySQL and PostgreSQL databases using Python's
-
SQL Querying
- Perform complex analyses to solve business problems, including:
- Revenue trends by branch and category.
- Identifying high- and low-performing products and branches.
- Analyzing peak sales times and preferred payment methods.
- Perform complex analyses to solve business problems, including:
-
Visualization and Reporting
- Optionally, use visualization tools (e.g., Matplotlib or Tableau) to present findings.
- Comprehensive Data Cleaning: Ensures high data quality by handling missing values, duplicates, and inconsistent formats.
- SQL-Powered Analysis: Includes advanced queries to extract actionable insights.
- Database Integration: Utilizes MySQL and PostgreSQL for data storage and querying.
- Automated Workflow: Python scripts automate data processing and database operations.
- Python 3.8+
- Libraries:
- pandas
- numpy
- sqlalchemy
- mysql-connector-python
- psycopg2
- SQL Databases:
- MySQL
- PostgreSQL
- Kaggle API Key (for dataset download)
- Clone the Repository
git clone <repo-url> cd <repo-directory>
To download datasets from Kaggle, you need to set up the Kaggle API. Follow these steps:
Place your kaggle.json file in the .kaggle/ folder. If you don't have a kaggle.json file, you can create one by following the instructions on the Kaggle API documentation.
Download the Walmart dataset using the Kaggle API:
kaggle datasets download -d Replace with the actual path to the dataset you want to download.
Run the Notebook Open the project.ipynb notebook in Jupyter or VS Code.
Follow the sequential steps outlined in the notebook for data cleaning, feature engineering, and SQL integration.
Analyze with SQL Use the provided SQL scripts to extract insights directly from the database. Ensure your database is set up and connected correctly.
This project uses transactional data from Walmart to derive insights into sales, profitability, and customer behavior. The dataset includes the following key columns:
- branch: Identifier for branches
- city: Location of branches
- payment_method: Mode of payment
- unit_price: Price per unit
- quantity: Number of units sold
- rating: Customer ratings
- profit_margin: Profit margin on products
- date: Transaction date
- time: Transaction time
- category: Product category
- total: Total transaction amount
- Fetch all data
- Count total transactions
- Count distinct payment methods and transactions per method
- Calculate the total quantity sold by payment method
- Identify low-performing branches
- Find the highest-rated categories per branch
- Determine low-performing categories
- Analyze busiest day per branch
- Categorize sales into Morning, Afternoon, and Evening shifts
- Compare branch revenues for 2022 and 2023, highlighting revenue declines
Key insights derived from the project include:
- Sales Insights: Identification of key product categories, branches with the highest sales, and preferred payment methods.
- Profitability: Insights into the most profitable product categories and branch locations.
- Customer Behavior: Trends in customer ratings, payment preferences, and peak shopping hours.
- Branch Comparisons: Revenue trends and performance evaluations over time, including a comparison of branch revenues for 2022 and 2023.
This analysis provides valuable insights into Walmart's transactional data, helping to understand sales performance, customer preferences, and profitability across different branches and product categories. These insights can inform strategic decisions to enhance overall business performance.