📊 Credit Card Fraud Detection Project

🎯 Objective

The objective of this project is to develop a machine learning model that classifies credit card transactions as fraudulent or legitimate. 🕵️‍♂️💳

⚙️ Functionality

Data analysis and cleaning 🧹
Data exploration and visualization 🔍📊
Training multiple machine learning models 🤖
Evaluating and comparing models 📈
Visualizing results and metrics 📉

🛠️ Tools Used

Python 🐍
Pandas for data manipulation 🐼
NumPy for numerical operations 🔢
Matplotlib and Seaborn for data visualization 📊🎨
Scikit-learn for machine learning modeling and evaluation 🤖

🛤️ Development Process

Data Analysis and Cleaning 🧹

Data Loading: Import the credit card transactions dataset 📥
Basic Analysis: Explore the structure and basic statistics of the dataset 📊
Duplicate Check: Identify and remove duplicate rows 🗑️
Data Cleaning: Check for and handle missing values 🚫
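
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the tiny DataFrame, the column names (Amount, V1, Class) and the file name in the comment are all assumptions, and dropping rows with missing values is just one possible way to handle them.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataset. In practice this would be
# loaded from disk, e.g. pd.read_csv("creditcard.csv") (assumed file name).
df = pd.DataFrame({
    "Amount": [10.0, 10.0, 250.5, np.nan, 99.9],
    "V1": [0.1, 0.1, -1.2, 0.7, 2.3],
    "Class": [0, 0, 1, 0, 0],  # assumed target column: 1 = fraud
})

# Basic analysis: structure and summary statistics.
print(df.shape)
print(df.dtypes)
print(df.describe())

# Duplicate check: count fully identical rows, then drop them.
n_duplicates = int(df.duplicated().sum())
df = df.drop_duplicates().reset_index(drop=True)

# Data cleaning: count missing values, then drop incomplete rows.
# (Imputation would be an alternative; dropping is assumed here.)
n_missing = int(df.isna().sum().sum())
df = df.dropna().reset_index(drop=True)

print(f"duplicates removed: {n_duplicates}, missing values found: {n_missing}")
```

Removing duplicates before splitting the data also avoids the same transaction ending up in both the training and the testing set.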

Exploratory Data Analysis (EDA) 🔍

Target Variable Distribution: Visualize the distribution of fraudulent and non-fraudulent transactions 📊
Correlation Matrix: Analyze the correlation between variables 🔗
Feature Distribution: Visualize the distribution of each feature 📈
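
A rough sketch of these three EDA views is shown below. It runs on randomly generated data with assumed column names (V1, V2, Amount, Class), and it uses plain Matplotlib for the heatmap to keep the example dependency-light; Seaborn's heatmap function would be the more usual choice with the tools listed above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data; column names are assumptions for illustration.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "V1": rng.normal(size=n),
    "V2": rng.normal(size=n),
    "Amount": rng.exponential(scale=50.0, size=n),
})
df["Class"] = (rng.random(n) < 0.05).astype(int)  # roughly 5% labeled as fraud

# Target variable distribution: number of rows per class.
class_counts = df["Class"].value_counts().sort_index()

# Correlation matrix between all numeric columns.
corr = df.corr()

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Bar chart of the target distribution.
axes[0].bar(class_counts.index.astype(str), class_counts.values)
axes[0].set_title("Target distribution")

# Heatmap of the correlation matrix.
im = axes[1].imshow(corr.values, cmap="coolwarm", vmin=-1, vmax=1)
axes[1].set_xticks(range(len(corr.columns)))
axes[1].set_xticklabels(corr.columns, rotation=45)
axes[1].set_yticks(range(len(corr.columns)))
axes[1].set_yticklabels(corr.columns)
axes[1].set_title("Correlation matrix")
fig.colorbar(im, ax=axes[1])

# Histogram of a single feature; the same call can be repeated per column.
axes[2].hist(df["Amount"], bins=30)
axes[2].set_title("Amount distribution")

fig.tight_layout()
# In a notebook, plt.show() would display the figure at this point.
```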

Data Preprocessing 🧪

Feature and Target Separation: Split the dataset into features (X) and target variable (y) ✂️
Dataset Splitting: Divide the data into training and testing sets 🧩
Feature Scaling: Standardize the features so they share a comparable scale, which helps scale-sensitive models such as Logistic Regression and SVM 📏
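
The preprocessing steps above might look like the sketch below. The data is synthetic, and the target column name (Class), the 80/20 split, the stratified sampling and the use of StandardScaler are assumptions made for the example; the project may use different settings.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; column names are assumptions for illustration.
rng = np.random.default_rng(42)
n = 1000
df = pd.DataFrame({
    "V1": rng.normal(size=n),
    "V2": rng.normal(loc=3.0, scale=2.0, size=n),
    "Amount": rng.exponential(scale=50.0, size=n),
})
df["Class"] = (rng.random(n) < 0.05).astype(int)

# Feature and target separation.
X = df.drop(columns="Class")
y = df["Class"]

# Dataset splitting. stratify=y keeps the fraud ratio similar in both sets
# (an assumed choice, useful when one class is rare).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Feature scaling. The scaler is fitted on the training set only and then
# applied to the test set, so no test-set statistics leak into training.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0).round(3))  # close to 0 per feature
print(X_train_scaled.std(axis=0).round(3))   # close to 1 per feature
```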

Machine Learning Modeling 🤖

Logistic Regression: Train and evaluate a logistic regression model 📉
Random Forest: Train and evaluate a Random Forest model 🌳
Support Vector Machine (SVM): Train and evaluate an SVM model 🧠
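
As an illustration of this step, the sketch below fits the three model types on a synthetic, imbalanced dataset generated with scikit-learn. The hyperparameters shown (max_iter, n_estimators, the RBF kernel) are assumptions chosen for the example, not the values used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic imbalanced data standing in for the real transactions.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scale using statistics from the training set only.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Hyperparameters are illustrative assumptions.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="rbf"),
}

fitted = {}
predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    fitted[name] = model
    predictions[name] = model.predict(X_test)
    # Accuracy alone can look high on imbalanced data, so it is only a
    # first sanity check here.
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```

Keeping the models in a dictionary makes it straightforward to run the same evaluation code over all of them afterwards.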

Model Evaluation and Comparison 📈

ROC Curve and AUC: Compare models using the ROC curve and area under the curve (AUC) 📊
Precision-Recall Curve: Compare the precision-recall trade-off of the models across decision thresholds 📉
Additional Metrics: Calculate precision, recall, F1-score, and accuracy for each model 📏
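
The metrics above could be computed along these lines. The helper name evaluate is hypothetical, the data is synthetic, and only two of the three models are included to keep the example short; an SVM could be added by using its decision_function output as the score.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    average_precision_score,
    f1_score,
    precision_recall_curve,
    precision_score,
    recall_score,
    roc_auc_score,
    roc_curve,
)
from sklearn.model_selection import train_test_split


def evaluate(y_true, y_pred, y_score):
    """Hypothetical helper: collect label-based and score-based metrics."""
    return {
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
        "accuracy": accuracy_score(y_true, y_pred),
        "roc_auc": roc_auc_score(y_true, y_score),
        "average_precision": average_precision_score(y_true, y_score),
    }


# Synthetic imbalanced data standing in for the real transactions.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
curves = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]  # probability of the fraud class
    results[name] = evaluate(y_test, y_pred, y_score)

    # Points for the ROC and precision-recall plots.
    fpr, tpr, _ = roc_curve(y_test, y_score)
    prec, rec, _ = precision_recall_curve(y_test, y_score)
    curves[name] = {"roc": (fpr, tpr), "pr": (rec, prec)}

for name, metrics in results.items():
    print(name, {key: round(float(value), 3) for key, value in metrics.items()})
```

The arrays stored in curves can be passed directly to Matplotlib's plot function to draw one ROC curve and one precision-recall curve per model on shared axes.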

📈 Results

Logistic Regression: Good precision but lower recall 📉
Random Forest: Best balance between precision and recall 🌳
SVM: High precision but lower recall compared to Random Forest 🧠

📊 Visualizations

Target Variable Distribution: Bar chart showing the distribution of fraudulent and non-fraudulent transactions 📊
Correlation Matrix: Heatmap showing the correlation between variables 🔗
Feature Distribution: Histograms showing the distribution of each feature 📈
ROC Curve: Plot comparing the ROC curves of the models 📉
Precision-Recall Curve: Plot comparing the precision-recall trade-off of the models 📊

🗂️ Project Structure

Notebook

📝 Conclusions

Random Forest is the most balanced model for detecting credit card fraud 🌳
Feature standardization and duplicate removal are crucial steps in data preprocessing 🧹
Evaluating multiple metrics is essential for a comprehensive model comparison 📏

📬 Contact

For any inquiries or collaborations, you can contact me at: jotaduranbon.com 📧