A Comparative Analysis of Logistic Regression and Support Vector Machine for Bank Customer Churn Prediction
This project compares Logistic Regression and Support Vector Machines (SVM) for predicting bank customer churn. Using a dataset of over 10,000 customer records, we apply both models to identify the patterns and factors that influence customer retention.
The goal is to predict whether a bank customer will churn. The analysis examines customer features such as age, credit score, and account balance to determine how each influences the likelihood of churn. By evaluating both models' performance, we aim to identify the more effective approach for predicting customer behavior.
The dataset comprises over 10,000 records with features including:
- Age
- Gender
- Credit Score
- Balance
- isActiveMember
- Estimated Salary
- isChurn
The dataset is provided by Kaggle and can be accessed here.
Data preprocessing:
- Removal of duplicate values and irrelevant features.
- Handling of missing or null values.
- Label encoding for categorical variables.
- Outlier detection and removal.
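The preprocessing steps above can be sketched with pandas as follows. This is a minimal illustration rather than the project's actual code: the toy data, the `CustomerId` column, the `preprocess` helper, the use of pandas category codes for label encoding, and the 1.5 × IQR outlier rule on `Balance` are all assumptions made for the example.

```python
import pandas as pd

# Toy frame standing in for the Kaggle data; column names and values are assumptions.
raw = pd.DataFrame({
    "CustomerId": [1, 2, 2, 3, 4, 5, 6],
    "Age": [42, 35, 35, 51, None, 29, 38],
    "Gender": ["Female", "Male", "Male", "Female", "Male", "Female", "Male"],
    "CreditScore": [619, 608, 608, 502, 699, 850, 645],
    "Balance": [0.0, 83807.86, 83807.86, 159660.8, 0.0, 125510.82, 9_000_000.0],
    "isChurn": [1, 0, 0, 1, 0, 0, 1],
})


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper applying the four preprocessing steps in order."""
    df = df.drop_duplicates()              # remove duplicate rows
    df = df.drop(columns=["CustomerId"])   # drop an identifier with no predictive value
    df = df.dropna()                       # simplest handling of missing values
    # Label-encode the categorical column (categories are sorted: Female=0, Male=1).
    df = df.assign(Gender=df["Gender"].astype("category").cat.codes)
    # Assumed outlier rule: keep balances within 1.5 * IQR of the quartiles.
    q1, q3 = df["Balance"].quantile([0.25, 0.75])
    iqr = q3 - q1
    keep = df["Balance"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return df[keep].reset_index(drop=True)


clean = preprocess(raw)
print(clean)
```

Dropping rows with missing values is only one option; imputing them (for example with the column median) may be preferable when many rows are affected.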
Exploratory data analysis:
- Distribution of customer churn.
- Churn rates by various demographic segments.
- Correlation between features and churn rate.
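The three exploratory views listed above map onto a few pandas calls. The sketch below uses a small invented frame; the values and the exact column names are assumptions, not the project's data.

```python
import pandas as pd

# Invented sample; the real dataset has over 10,000 rows.
df = pd.DataFrame({
    "Age": [25, 32, 47, 51, 38, 60, 29, 44],
    "Gender": ["Male", "Female", "Female", "Male", "Male", "Female", "Male", "Female"],
    "CreditScore": [700, 650, 580, 610, 720, 540, 690, 600],
    "isChurn": [0, 0, 1, 1, 0, 1, 0, 1],
})

# Distribution of customer churn: share of each class.
churn_share = df["isChurn"].value_counts(normalize=True)

# Churn rate by a demographic segment (here: gender).
churn_by_gender = df.groupby("Gender")["isChurn"].mean()

# Pearson correlation of each numeric feature with the churn label.
numeric = df.select_dtypes("number")
corr_with_churn = numeric.corr()["isChurn"].drop("isChurn")

print(churn_share, churn_by_gender, corr_with_churn, sep="\n\n")
```

Because `isChurn` is a 0/1 label, its group mean is directly the churn rate of that segment.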
Feature analysis:
- Analysis of feature importance.
- Identification of key features influencing churn rate.
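The write-up does not say how feature importance was measured. One common option, shown here purely as an assumption, is to standardize the features, fit a Logistic Regression, and rank features by the absolute value of their coefficients. The synthetic data below is constructed so that churn depends on age but not on salary.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): churn is driven by age, salary is pure noise.
rng = np.random.default_rng(0)
n = 500
age = rng.normal(40, 10, n)
salary = rng.normal(100_000, 30_000, n)
is_churn = (age + rng.normal(0, 5, n) > 45).astype(int)

feature_names = ["Age", "EstimatedSalary"]
# Standardizing puts the coefficients on a comparable scale.
X = StandardScaler().fit_transform(np.column_stack([age, salary]))
model = LogisticRegression(max_iter=1000).fit(X, is_churn)

# Larger absolute coefficient = stronger influence on the predicted log-odds.
importance = dict(zip(feature_names, np.abs(model.coef_[0])))
ranking = sorted(importance, key=importance.get, reverse=True)
print(ranking)
```

Coefficient magnitudes are only comparable after scaling, and they describe linear effects; correlated features can share or mask each other's weight.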
Modeling and evaluation:
- Implementation of Logistic Regression and Support Vector Machine models.
- Hyperparameter tuning using Grid Search with Cross-Validation.
- Performance evaluation based on metrics such as F1-Score, Precision, and Recall.
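A minimal sketch of this workflow with scikit-learn is given below. The synthetic data from `make_classification`, the parameter grids, the 5-fold setting, and the choice of F1 as the tuning objective are assumptions for illustration; the project's actual grids and splits are not stated.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for the churn data (assumption).
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=4,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0,
)

# Hypothetical search spaces; "clf__" targets the pipeline step named "clf".
candidates = {
    "Logistic Regression": (
        LogisticRegression(max_iter=1000),
        {"clf__C": [0.1, 1.0, 10.0]},
    ),
    "Support Vector Machine": (
        SVC(),
        {"clf__C": [0.1, 1.0, 10.0], "clf__kernel": ["linear", "rbf"]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Scaling inside the pipeline keeps each CV fold free of leakage.
    pipe = Pipeline([("scale", StandardScaler()), ("clf", estimator)])
    search = GridSearchCV(pipe, grid, scoring="f1", cv=5)
    search.fit(X_train, y_train)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, search.predict(X_test), average="binary", zero_division=0,
    )
    results[name] = {
        "best_params": search.best_params_,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

for name, metrics in results.items():
    print(name, metrics)
```

Both models are scale-sensitive, which is why the scaler sits inside the pipeline rather than being applied once to the full dataset.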
The analysis showed that both Logistic Regression and SVM are viable for predicting customer churn. SVM achieved a slightly higher F1 score and recall, while Logistic Regression achieved higher precision. Considering real-world application and interpretability, however, Logistic Regression was deemed more appropriate for this dataset.
Detailed performance metrics are as follows:
- Logistic Regression: F1 Score = 67.9%, Precision = 69.9%, Recall = 66.1%
- Support Vector Machine: F1 Score = 69.2%, Precision = 68.0%, Recall = 70.5%
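The reported F1 scores can be cross-checked against the reported precision and recall, since F1 is their harmonic mean: F1 = 2PR / (P + R). The short check below uses only the figures listed above; the helper name `f1_from` is introduced here for illustration.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (inputs in percent)."""
    return 2 * precision * recall / (precision + recall)


# Figures as reported above, in percent.
reported = {
    "Logistic Regression": {"precision": 69.9, "recall": 66.1, "f1": 67.9},
    "Support Vector Machine": {"precision": 68.0, "recall": 70.5, "f1": 69.2},
}

recomputed = {
    name: round(f1_from(m["precision"], m["recall"]), 1)
    for name, m in reported.items()
}

for name, m in reported.items():
    print(f"{name}: reported F1 = {m['f1']}, recomputed F1 = {recomputed[name]}")
```

Both recomputed values agree with the reported ones to one decimal place, so the three metrics for each model are internally consistent.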