Machine learning-based churn prediction using Logistic Regression and SVM, with an in-depth comparative analysis using Python

ramzyizza/Bank-Customer-Churn-Prediction

A Comparative Analysis of Logistic Regression and Support Vector Machine for Bank Customer Churn Prediction

This project compares the effectiveness of Logistic Regression and Support Vector Machines (SVM) in predicting bank customer churn. Using a dataset of over 10,000 records, we apply machine learning techniques to identify patterns and factors that influence customer retention.

Project Overview

The goal is to predict bank customer churn using Logistic Regression and Support Vector Machines (SVM). The analysis examines customer features such as age, credit score, and account balance, and how each influences the likelihood of a customer churning. By evaluating both models' performance, we aim to identify the more effective approach for predicting customer behavior.

Dataset

The dataset comprises over 10,000 records with features including:

  • Age
  • Gender
  • Credit Score
  • Balance
  • isActiveMember
  • Estimated Salary
  • isChurn

The dataset is provided by Kaggle.

Methodology

Data Cleaning and Preprocessing

  • Removal of duplicate values and irrelevant features.
  • Handling of missing or null values.
  • Label encoding for categorical variables.
  • Outlier detection and removal.
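The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `clean` helper, the `CustomerId` column, the synthetic rows, and the choice of the 1.5 × IQR rule for outliers are all assumptions made for the example.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Tiny synthetic frame standing in for the Kaggle data; all values are made up.
raw = pd.DataFrame({
    "CustomerId": [1, 2, 2, 3, 4, 5],  # hypothetical identifier column
    "Age": [42, 35, 35, 51, None, 29],
    "Gender": ["Female", "Male", "Male", "Female", "Male", "Female"],
    "Balance": [0.0, 83807.86, 83807.86, 159660.8, 125510.82, 5_000_000.0],
    "isChurn": [1, 0, 0, 1, 0, 0],
})


def clean(df, id_cols=("CustomerId",), cat_cols=("Gender",),
          num_cols=("Age", "Balance")):
    """Hypothetical helper: dedupe, drop IDs and nulls, encode, trim outliers."""
    out = df.drop_duplicates()            # remove exact duplicate rows
    out = out.drop(columns=list(id_cols)) # identifiers carry no predictive signal
    out = out.dropna().copy()             # simplest null handling: drop the row

    # Label-encode categorical columns as integers.
    for col in cat_cols:
        out[col] = LabelEncoder().fit_transform(out[col])

    # Assumed outlier rule: keep values within 1.5 * IQR of the quartiles.
    # Columns are filtered one after another, so later columns see fewer rows.
    for col in num_cols:
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out = out[out[col].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Dropping rows with nulls is only one option; imputing a median or mode would retain more data, and the right choice depends on how many values are missing.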

Exploratory Data Analysis

  • Distribution of customer churn.
  • Churn rates by various demographic segments.
  • Correlation between features and churn rate.
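As a rough sketch of these three views, the snippet below computes them with pandas on a small synthetic frame. The data values are invented for illustration, and only the column names `isActiveMember` and `isChurn` come from the feature list above.

```python
import pandas as pd

# Synthetic sample; the real analysis would load the Kaggle dataset instead.
df = pd.DataFrame({
    "Age": [25, 47, 52, 33, 61, 38, 45, 29],
    "Gender": ["Male", "Female", "Female", "Male",
               "Female", "Male", "Female", "Male"],
    "isActiveMember": [1, 0, 0, 1, 0, 1, 1, 1],
    "isChurn": [0, 1, 1, 0, 1, 0, 0, 0],
})

# 1. Distribution of churn: share of customers in each class.
churn_share = df["isChurn"].value_counts(normalize=True)

# 2. Churn rate per demographic segment (here: gender).
churn_by_gender = df.groupby("Gender")["isChurn"].mean()

# 3. Pearson correlation of each numeric feature with the churn flag.
churn_corr = (
    df.select_dtypes("number").corr()["isChurn"].drop("isChurn")
)

print(churn_share, churn_by_gender, churn_corr, sep="\n\n")
```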

Feature Selection

  • Analysis of feature importance.
  • Identification of key features influencing churn rate.
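The README does not say which importance measure was used. One common option, shown here purely as an assumption, is to rank features by the absolute value of Logistic Regression coefficients fitted on standardized inputs. The data is synthetic and constructed so that `Age` matters most and `EstimatedSalary` not at all.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 2000

# Synthetic features loosely named after the dataset's columns.
X = pd.DataFrame({
    "Age": rng.normal(40, 10, n),
    "Balance": rng.normal(75_000, 30_000, n),
    "EstimatedSalary": rng.normal(100_000, 25_000, n),
})

# Synthetic target: strong Age effect, weaker Balance effect, no salary effect.
signal = 2.0 * (X["Age"] - 40) / 10 - 1.0 * (X["Balance"] - 75_000) / 30_000
y = (signal + rng.logistic(size=n) > 0).astype(int)

# Standardize so coefficient magnitudes are comparable across features.
X_scaled = StandardScaler().fit_transform(X)
model = LogisticRegression().fit(X_scaled, y)

importance = pd.Series(
    np.abs(model.coef_[0]), index=X.columns
).sort_values(ascending=False)
print(importance)
```

Coefficient magnitudes only reflect linear effects; tree-based or permutation importance would be alternative choices if nonlinear relationships are suspected.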

Model Development and Evaluation

  • Implementation of Logistic Regression and Support Vector Machine models.
  • Hyperparameter tuning using Grid Search with Cross-Validation.
  • Performance evaluation based on metrics such as F1-Score, Precision, and Recall.
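A minimal sketch of this workflow with scikit-learn might look like the following. The synthetic data from `make_classification`, the parameter grids, the 5-fold setting, and the choice of F1 as the tuning score are assumptions for illustration, not the project's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for the churn data (about 20% positives).
X, y = make_classification(
    n_samples=600, n_features=6, n_informative=4,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Assumed hyperparameter grids; the real search space is not documented.
candidates = {
    "Logistic Regression": (
        LogisticRegression(max_iter=1000),
        {"clf__C": [0.1, 1, 10]},
    ),
    "Support Vector Machine": (
        SVC(),
        {"clf__C": [0.1, 1, 10], "clf__kernel": ["linear", "rbf"]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Scaling lives inside the pipeline so each CV fold is scaled independently.
    pipe = Pipeline([("scale", StandardScaler()), ("clf", estimator)])
    search = GridSearchCV(pipe, grid, scoring="f1", cv=5)
    search.fit(X_train, y_train)

    y_pred = search.predict(X_test)  # uses the refit best estimator
    results[name] = {
        "best_params": search.best_params_,
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }

for name, metrics in results.items():
    print(name, metrics)
```

Keeping the scaler inside the pipeline avoids leaking statistics from validation folds into training, which matters for SVMs in particular because they are sensitive to feature scale.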

Results

The analysis showed that both Logistic Regression and SVM are viable for predicting customer churn. SVM scored slightly higher on F1 and recall, while Logistic Regression scored slightly higher on precision. However, considering real-world application and interpretability, Logistic Regression was deemed more appropriate for this dataset.

Detailed performance metrics are as follows:

  • Logistic Regression: F1 Score = 67.9%, Precision = 69.9%, Recall = 66.1%
  • Support Vector Machine: F1 Score = 69.2%, Precision = 68.0%, Recall = 70.5%
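As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported F1 scores can be recomputed from the other two figures. The helper name `f1_from` is hypothetical; the numbers are those listed above.

```python
def f1_from(precision, recall):
    """F1 score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) taken from the results above.
reported = {
    "Logistic Regression": (0.699, 0.661, 0.679),
    "Support Vector Machine": (0.680, 0.705, 0.692),
}

recomputed = {
    name: round(f1_from(p, r), 3) for name, (p, r, _) in reported.items()
}

for name, (_, _, f1) in reported.items():
    print(f"{name}: reported F1 = {f1:.3f}, recomputed = {recomputed[name]:.3f}")
```

Both recomputed values match the reported F1 scores to three decimal places.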
