📊 Data Mining Demo App

A Streamlit app showcasing common data mining techniques: data processing, outlier detection, correlation analysis, binning, smoothing, clustering, and prediction. It demonstrates how to apply machine learning algorithms to real-world datasets and perform basic exploratory data analysis (EDA).

Introduction

The Data Mining Demo App provides an interactive platform to explore and apply key data mining techniques. This app is designed to be user-friendly, with a simple interface allowing users to process, analyze, and visualize datasets in various ways.

Key Features:

Data preprocessing and cleaning
Outlier detection using K-Nearest Neighbors (KNN)
Correlation analysis between features
Binning and smoothing techniques for noise reduction
K-Means clustering and machine learning models for prediction

This tool is ideal for learners, data scientists, and anyone interested in performing data mining on small to medium datasets.

Technologies Used

Installation

Follow these steps:

Clone the repository and move into the project directory:

# 1. Clone the repository:
git clone https://github.com/LoylP/Data_Mining_App.git
# 2. Navigate to the project directory:
cd Data_Mining_App

Run with Docker:

docker build -t streamlit-app .
docker run -p 8080:8080 streamlit-app

Alternatively, create and activate a virtual environment (optional):

For Ubuntu/macOS:

python3 -m venv venv
source venv/bin/activate

For Windows:

python -m venv venv
.\venv\Scripts\activate

Install the required dependencies:

pip install -r requirements.txt

Run the app:

streamlit run app.py

Features

🏠 Home

Overview: Displays the dataset summary, including the shape and the number of missing values for each column.
File Upload: Users can upload a CSV file for analysis, which is then displayed in a tabular format.
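
The summary described above can be sketched with pandas. This is a minimal illustration, not the app's actual code: the inline CSV text and the variable names are assumptions made for the example. In a Streamlit app, the file object returned by `st.file_uploader` could be passed to `pd.read_csv` in the same way.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for an uploaded file.
csv_text = """age,income,city
25,50000,Hanoi
,62000,Hue
41,,Hanoi
"""

df = pd.read_csv(io.StringIO(csv_text))

shape = df.shape                      # (rows, columns)
missing_per_column = df.isna().sum()  # missing-value count for each column

print(shape)
print(missing_per_column)
```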

🛠 Data Processing

Categorical and Numerical Columns: Automatically detect and classify columns into categorical or numerical types.
Missing Value Handling: Display and provide options to fill or drop missing values.
Encoding Categorical Data: Convert categorical columns to numerical values using techniques like One-Hot Encoding.
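
As a rough sketch of these three steps, assuming pandas and a small made-up DataFrame: columns are split by dtype, missing values are filled with the median (numerical) or the most frequent value (categorical), and categorical columns are one-hot encoded. The fill strategies are illustrative choices; the app may offer different ones.

```python
import pandas as pd

# Made-up sample data with one missing value per column.
df = pd.DataFrame({
    "age": [25.0, None, 41.0, 30.0],
    "city": ["Hanoi", "Hue", "Hanoi", None],
})

# Split columns by dtype: numeric vs. everything else (treated as categorical).
numerical_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

# Fill missing values: median for numeric columns, mode for categorical ones.
filled = df.copy()
for col in numerical_cols:
    filled[col] = filled[col].fillna(filled[col].median())
for col in categorical_cols:
    filled[col] = filled[col].fillna(filled[col].mode().iloc[0])

# One-hot encode the categorical columns.
encoded = pd.get_dummies(filled, columns=categorical_cols)
print(encoded)
```

Dropping rows instead of filling them would be `df.dropna()`.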

🚫 Outlier Detection

KNN-based Outlier Detection: Identify data points that significantly differ from the rest of the dataset.
Outlier Removal: Optionally remove outliers that exceed a user-defined threshold, improving the quality of analysis.
Distribution Visualization: Visualize the distance distribution to help users set an appropriate threshold for outlier detection.
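
One common way to do KNN-based outlier detection is to score each point by its average distance to its k nearest neighbours and flag points whose score exceeds a threshold. The sketch below assumes scikit-learn and synthetic data. The function name `knn_outlier_scores`, the value of `k`, and the 95th-percentile threshold are illustrative assumptions, not the app's confirmed settings.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def knn_outlier_scores(X, k=3):
    """Return each row's mean distance to its k nearest neighbours."""
    # Ask for k + 1 neighbours because each point is its own nearest neighbour.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    distances, _ = nn.kneighbors(X)
    return distances[:, 1:].mean(axis=1)  # drop the zero self-distance


rng = np.random.default_rng(0)
# 50 points around the origin plus one planted outlier in the last row.
X = np.vstack([rng.normal(0, 1, size=(50, 2)), [[10.0, 10.0]]])

scores = knn_outlier_scores(X, k=3)
threshold = np.percentile(scores, 95)  # illustrative; a user would tune this
outlier_mask = scores > threshold
X_clean = X[~outlier_mask]

print(outlier_mask.sum(), "points flagged as outliers")
```

Plotting a histogram of `scores` gives the distance distribution that helps in picking the threshold.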

🔗 Correlation Analysis

Correlation Matrix: Visualize correlations between numerical columns using a heatmap.
Threshold Selection: Choose a minimum correlation threshold to filter out weak correlations, focusing on the most significant relationships.
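
A minimal sketch of threshold filtering on a correlation matrix, assuming pandas and synthetic data. The column names and the 0.5 cutoff are made up for the example. Cells below the cutoff are masked to NaN, so a heatmap of the result would show only the stronger relationships.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(scale=0.1, size=200),  # strongly tied to x
    "z": rng.normal(size=200),                      # independent noise
})

corr = df.corr()  # Pearson correlation by default

min_abs_corr = 0.5  # illustrative threshold
# Keep cells whose absolute correlation meets the threshold; others become NaN.
strong = corr.where(corr.abs() >= min_abs_corr)

print(strong.round(2))
```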

🌫 Binning/Smoothing

Data Binning: Group continuous data into bins (e.g., age ranges, income groups).

Smoothing: Apply smoothing techniques to reduce noise in binned data, improving the quality of the analysis.

Apriori Algorithm: Perform association rule mining to find frequent itemsets and learn associations between features.
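
Binning and smoothing can be illustrated with a small worked example. The sketch below assumes pandas and NumPy and uses a made-up price list. It shows equal-width binning with `pd.cut`, then equal-frequency binning followed by smoothing by bin means, where each value is replaced by the mean of its bin. The bin count and labels are illustrative.

```python
import numpy as np
import pandas as pd

prices = pd.Series([4, 8, 15, 21, 21, 24, 25, 28, 34])

# Equal-width binning: three intervals of equal length over the value range.
width_bins = pd.cut(prices, bins=3, labels=["low", "mid", "high"])

# Equal-frequency binning: sort, then put three consecutive values in each bin.
sorted_prices = prices.sort_values().reset_index(drop=True)
bin_ids = np.arange(len(sorted_prices)) // 3  # 0,0,0,1,1,1,2,2,2

# Smoothing by bin means: replace every value with the mean of its bin.
smoothed = sorted_prices.groupby(bin_ids).transform("mean")

print(width_bins.tolist())
print(smoothed.tolist())  # [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```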

🚀 Clustering

K-Means Clustering: Apply K-Means clustering on numerical features to group similar data points.

Cluster Visualization: Visualize clustering results in 2D or 3D plots for better interpretation.

Cluster Count Selection: Select the number of clusters using the Elbow method or manual input.
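
The Elbow method can be sketched as follows, assuming scikit-learn and synthetic data with three well-separated groups. Inertia (the within-cluster sum of squares) is recorded for several values of k, and the "elbow" is the k after which inertia stops dropping sharply. The range of k and the final choice of 3 are assumptions for this example.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
centers = np.array([[0, 0], [8, 8], [0, 8]])
# 40 points scattered around each of the three centers.
X = np.vstack([c + rng.normal(scale=0.5, size=(40, 2)) for c in centers])

# Elbow method: fit K-Means for a range of k and record the inertia.
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 1))

# Fit the final model with the k picked from the elbow (3 for this data).
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
```

Plotting `inertias` against k shows the bend, and a scatter plot of `X` coloured by `labels` gives the 2D cluster view.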

📈 EDA (Exploratory Data Analysis)

Descriptive Statistics: Display key statistics like mean, median, standard deviation, and percentiles.
Visualizations: Generate histograms, scatter plots, box plots, and pair plots for deeper insights into the data distribution and relationships.
Feature Correlation: Visualize and analyze the correlation between selected features to understand dependencies.

🔎 Prediction

Model Training: Train machine learning models such as linear regression, decision trees, and random forests.
Model Evaluation: Display classification metrics such as accuracy, precision, recall, F1 score, and the confusion matrix.

Prediction Input: Allow users to input new data and predict outcomes based on the trained model.
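
The train, evaluate, and predict flow might look like the sketch below, assuming scikit-learn, a random forest classifier, and a synthetic binary target. The data, the 75/25 split, and the hyperparameters are illustrative assumptions rather than the app's actual configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # synthetic binary target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
cm = confusion_matrix(y_test, y_pred)
print(metrics)
print(cm)

# Predict the class of a new, user-supplied data point.
new_point = np.array([[1.5, 0.5]])
prediction = model.predict(new_point)[0]
print(prediction)
```

For a regression model such as linear regression, error metrics like mean squared error or R² would replace the classification metrics used here.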

Usage

Upload Data:

Click on the "Upload CSV" button in the sidebar to upload a CSV dataset.
After the file is uploaded, it will be displayed in a table format on the main page.

Data Processing:

Categorization: View and manage categorical and numerical columns.
Missing Value Handling: Choose to either drop or impute missing values.
Encoding: Convert categorical columns to numerical format using one-hot encoding.

Outlier Detection:

Run Outlier Detection: Detect outliers using the KNN-based method.
Adjust Threshold: Set the threshold for outlier removal.
Visualize: See the distribution of average distances and outliers highlighted on the graph.

Clustering and Prediction:

K-Means Clustering: Apply K-Means clustering on selected numerical columns.
Train Model: Train machine learning models and evaluate them using metrics like accuracy and precision.
Prediction: Input new data points to make predictions based on the trained models.

📄 License

MIT License
