📊 Credit Card Fraud Detection Project

🎯 Objective

  • Develop a machine learning model that predicts whether a credit card transaction is fraudulent. 🕵️‍♂️💳

⚙️ Functionality

  • Data analysis and cleaning 🧹
  • Data exploration and visualization 🔍📊
  • Training multiple machine learning models 🤖
  • Evaluating and comparing models 📈
  • Visualizing results and metrics 📉

🛠️ Tools Used

  • Python 🐍
  • Pandas for data manipulation 🐼
  • NumPy for numerical operations 🔢
  • Matplotlib and Seaborn for data visualization 📊🎨
  • Scikit-learn for machine learning modeling and evaluation 🤖

🛤️ Development Process

  1. Data Analysis and Cleaning 🧹
  • Data Loading: Import the credit card transactions dataset 📥
  • Basic Analysis: Explore the structure and basic statistics of the dataset 📊
  • Duplicate Check: Identify and remove duplicate rows 🗑️
  • Data Cleaning: Check for and handle missing values 🚫
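The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`Amount`, `V1`, `Class`) and the helper `clean_transactions` are hypothetical, and a tiny in-memory frame stands in for the real dataset. Dropping rows with missing values is one possible handling strategy; imputation would be another.

```python
import numpy as np
import pandas as pd


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows, then drop rows with any missing value."""
    deduplicated = df.drop_duplicates()
    cleaned = deduplicated.dropna()
    return cleaned.reset_index(drop=True)


# Tiny stand-in for the real dataset (hypothetical column names).
raw = pd.DataFrame(
    {
        "Amount": [10.0, 10.0, 250.5, np.nan, 42.0],
        "V1": [0.1, 0.1, -1.3, 0.7, 2.2],
        "Class": [0, 0, 1, 0, 0],
    }
)

print(raw.shape)               # structure: (rows, columns)
print(raw.describe())          # basic statistics per numeric column
print(raw.duplicated().sum())  # number of exact duplicate rows
print(raw.isna().sum())        # missing values per column

clean = clean_transactions(raw)
print(clean.shape)
```

With a real file, `raw` would instead come from something like `pd.read_csv(...)` with the dataset's own path.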
  2. Exploratory Data Analysis (EDA) 🔍
  • Target Variable Distribution: Visualize the distribution of fraudulent and non-fraudulent transactions 📊
  • Correlation Matrix: Analyze the correlation between variables 🔗
  • Feature Distribution: Visualize the distribution of each feature 📈
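As a rough sketch of the numbers behind these EDA steps, the snippet below computes the class balance, the correlation matrix, and per-feature summary statistics. The data is synthetic and the column names (`V1`, `V2`, `Amount`, `Class`) are assumptions made for illustration only; the real dataset's columns may differ.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
labels = (rng.random(n) < 0.05).astype(int)  # roughly 5% positives

# Synthetic stand-in data (hypothetical columns).
df = pd.DataFrame(
    {
        "V1": rng.normal(size=n) - 2.0 * labels,  # shifted for positive rows
        "V2": rng.normal(size=n),                 # pure noise
        "Amount": rng.exponential(scale=50.0, size=n),
        "Class": labels,
    }
)

# Target distribution: absolute counts and relative shares.
class_counts = df["Class"].value_counts()
class_share = df["Class"].value_counts(normalize=True)

# Pairwise Pearson correlations, then features ranked by |corr with target|.
corr = df.corr()
corr_with_target = (
    corr["Class"].drop("Class").sort_values(key=lambda s: s.abs(), ascending=False)
)

# Per-feature summary statistics (one row per feature).
summary = df.drop(columns="Class").describe().T

print(class_counts)
print(corr_with_target)
print(summary[["mean", "std", "min", "max"]])
```

The `corr` frame is what a heatmap would display, and `class_counts` is what a bar chart of the target would show.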
  3. Data Preprocessing 🧪
  • Feature and Target Separation: Split the dataset into features (X) and target variable (y) ✂️
  • Dataset Splitting: Divide the data into training and testing sets 🧩
  • Feature Scaling: Standardize the features to improve model performance 📏
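A minimal sketch of the preprocessing steps above, using synthetic arrays in place of the real features and target. The split ratio, the stratified split, and the choice of `StandardScaler` are assumptions; the notebook may use different settings. The scaler is fitted on the training set only, so no information from the test set leaks into the scaling statistics.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic features (X) and an imbalanced binary target (y): 5% positives.
X = rng.normal(loc=5.0, scale=3.0, size=(500, 4))
y = np.zeros(500, dtype=int)
y[:25] = 1

# Stratify so both splits keep the same share of positives.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on training data only, then apply it to both splits.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0))  # close to 0 for every feature
print(X_train_scaled.std(axis=0))   # close to 1 for every feature
```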
  4. Machine Learning Modeling 🤖
  • Logistic Regression: Train and evaluate a logistic regression model 📉
  • Random Forest: Train and evaluate a Random Forest model 🌳
  • Support Vector Machine (SVM): Train and evaluate an SVM model 🧠
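The three models listed above could be trained along these lines. This is a sketch under stated assumptions: the data comes from `make_classification` rather than the real transactions, and every hyperparameter shown (`max_iter`, `n_estimators`, the default SVM kernel) is illustrative, not taken from the notebook. Scaling is wrapped into a pipeline for the two models that are sensitive to feature scale.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic imbalanced data standing in for the transactions dataset.
X, y = make_classification(
    n_samples=2000,
    n_features=10,
    n_informative=5,
    weights=[0.95, 0.05],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Illustrative hyperparameters only.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
}

fitted = {}
for name, model in models.items():
    fitted[name] = model.fit(X_train, y_train)
    # score() reports plain accuracy, which is a weak signal on imbalanced data.
    print(name, fitted[name].score(X_test, y_test))
```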
  5. Model Evaluation and Comparison 📈
  • ROC Curve and AUC: Compare models using the ROC curve and area under the curve (AUC) 📊
  • Precision-Recall Curve: Evaluate the precision and recall of the models 📉
  • Additional Metrics: Calculate precision, recall, F1-score, and accuracy for each model 📏
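To illustrate the metrics named above, here is a small sketch that scores one set of predictions against a trivial baseline. The helper `evaluate`, the 0.5 threshold, and the ten hand-written labels and scores are all hypothetical; they are not results from this project.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    average_precision_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)


def evaluate(y_true, y_score, threshold=0.5):
    """Return ranking metrics (from scores) and threshold metrics (from labels)."""
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    return {
        "roc_auc": roc_auc_score(y_true, y_score),
        "average_precision": average_precision_score(y_true, y_score),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
        "accuracy": accuracy_score(y_true, y_pred),
    }


# Toy data: 8 legitimate transactions (0) and 2 fraudulent ones (1).
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
model_scores = np.array([0.1, 0.2, 0.1, 0.3, 0.2, 0.6, 0.1, 0.4, 0.9, 0.4])
always_legit = np.zeros(10)  # baseline that never flags fraud

model_metrics = evaluate(y_true, model_scores)
baseline_metrics = evaluate(y_true, always_legit)

for name, metrics in [("model", model_metrics), ("baseline", baseline_metrics)]:
    print(name, {key: round(float(value), 3) for key, value in metrics.items()})
```

In this toy data the never-flag baseline reaches the same accuracy as the scored example while catching no fraud at all, which is why recall, precision, and the curve-based metrics are worth reporting alongside accuracy.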

📈 Results

  • Logistic Regression: Good performance in precision but lower recall 📉
  • Random Forest: Best balance between precision and recall 🌳
  • SVM: High precision but lower recall compared to Random Forest 🧠

📊 Visualizations

  • Target Variable Distribution: Bar chart showing the distribution of fraudulent and non-fraudulent transactions 📊
  • Correlation Matrix: Heatmap showing the correlation between variables 🔗
  • Feature Distribution: Histograms showing the distribution of each feature 📈
  • ROC Curve: Plot comparing the ROC curves of the models 📉
  • Precision-Recall Curve: Plot comparing the precision-recall curves of the models 📊
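The two comparison plots could be drawn roughly as below. This sketch uses synthetic labels and scores for two placeholder models (`model_a`, `model_b`), not the project's trained classifiers, and the figure layout is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import auc, precision_recall_curve, roc_curve

rng = np.random.default_rng(0)
y_true = (rng.random(2000) < 0.05).astype(int)  # roughly 5% positives

# Hypothetical scores: model_a separates the classes better than model_b.
scores = {
    "model_a": rng.normal(size=2000) + 3.0 * y_true,
    "model_b": rng.normal(size=2000) + 1.0 * y_true,
}

fig, (ax_roc, ax_pr) = plt.subplots(1, 2, figsize=(10, 4))
for name, y_score in scores.items():
    fpr, tpr, _ = roc_curve(y_true, y_score)
    precision, recall, _ = precision_recall_curve(y_true, y_score)
    ax_roc.plot(fpr, tpr, label=f"{name} (AUC = {auc(fpr, tpr):.2f})")
    ax_pr.plot(recall, precision, label=name)

# Diagonal reference line: a classifier that ranks at random.
ax_roc.plot([0, 1], [0, 1], linestyle="--", color="grey", label="chance")
ax_roc.set_xlabel("False positive rate")
ax_roc.set_ylabel("True positive rate")
ax_roc.set_title("ROC curve")
ax_roc.legend()

ax_pr.set_xlabel("Recall")
ax_pr.set_ylabel("Precision")
ax_pr.set_title("Precision-recall curve")
ax_pr.legend()

fig.tight_layout()
# Call plt.show() or fig.savefig(...) to display or save the figure.
```

On heavily imbalanced data the precision-recall panel usually shows larger differences between models than the ROC panel does, since it ignores the large pool of true negatives.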

🗂️ Project Structure

  • Notebook

📝 Conclusions

  • Random Forest is the most balanced model for detecting credit card fraud 🌳
  • Feature standardization and duplicate removal are crucial steps in data preprocessing 🧹
  • Evaluating multiple metrics is essential for a comprehensive model comparison 📏

📬 Contact

  • For any inquiries or collaborations, you can contact me at: jotaduranbon.com 📧