This project aims to build and evaluate a machine learning model to automate the loan approval process for a bank, thereby reducing manual processing costs and improving decision accuracy.
The dataset contains numerical and categorical features describing loan applications. The target variable indicates whether a loan was approved (AR) or not approved (No AR).
Models trained and compared:
- Logistic Regression
- Decision Tree
- Random Forest
- Support Vector Machine (SVM)
- XGBoost
Preprocessing:
- Encoded categorical variables.
- Scaled numerical features.
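The two preprocessing steps above can be sketched as follows. This is a minimal, dependency-free illustration rather than the project's code: the `Preprocessor` class, one-hot encoding, and z-score scaling are assumptions, and the project may have used a library such as scikit-learn instead. What it shows is that the category lists and scaling statistics are learned once from the training data and then reused unchanged for any later data.

```python
from statistics import mean, pstdev


class Preprocessor:
    """Hypothetical helper: learns encoding and scaling parameters from training rows only."""

    def __init__(self, categorical, numerical):
        self.categorical = categorical  # names of categorical columns (assumed)
        self.numerical = numerical      # names of numerical columns (assumed)
        self.categories = {}
        self.stats = {}

    def fit(self, rows):
        for col in self.categorical:
            # Sorted so the one-hot column order is deterministic.
            self.categories[col] = sorted({row[col] for row in rows})
        for col in self.numerical:
            values = [float(row[col]) for row in rows]
            std = pstdev(values)
            # Guard against constant columns to avoid division by zero.
            self.stats[col] = (mean(values), std if std > 0 else 1.0)
        return self

    def transform(self, row):
        features = []
        for col in self.categorical:
            # One-hot encoding; a category unseen during fit becomes all zeros.
            features.extend(1.0 if row[col] == c else 0.0 for c in self.categories[col])
        for col in self.numerical:
            mu, sigma = self.stats[col]
            # z-score scaling with the training mean and standard deviation.
            features.append((float(row[col]) - mu) / sigma)
        return features
```

For example, fitting on two training rows with `income` values 3000 and 5000 gives a mean of 4000 and a standard deviation of 1000, so a new application with an income of 4000 is scaled to 0.0.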
Evaluated models using:
- Accuracy Score
- Confusion Matrix
- Classification Report
- ROC Curve
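The metrics listed above can all be derived from true labels and predicted probabilities. The sketch below uses only the standard library and assumes the approved class (AR) is encoded as 1 and No AR as 0; in practice, scikit-learn's `accuracy_score`, `confusion_matrix`, `classification_report`, and `roc_curve` cover the same ground. The confusion matrix is returned in the `(tn, fp, fn, tp)` order that scikit-learn's binary confusion matrix gives when flattened with `.ravel()`.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels where 1 is the positive class."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def report(y_true, y_pred):
    """Accuracy plus precision, recall, and F1 for the positive class."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}


def roc_points(y_true, scores):
    """(FPR, TPR) pairs from sweeping the threshold over every distinct score."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        _, fp, _, tp = confusion_matrix(y_true, [1 if s >= thr else 0 for s in scores])
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```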
Cost-based threshold optimization:
- Calculated the costs of false positives (FP) and false negatives (FN).
- Determined the decision threshold that minimizes the total cost.
- Selected a threshold of 0.59, which gave the minimum total cost.
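The threshold search described above can be sketched as a grid search over candidate thresholds. This assumes the total cost is the number of false positives times a per-FP cost plus the number of false negatives times a per-FN cost; the per-error costs used in the project are not stated here, so `cost_fp` and `cost_fn` are placeholders, and the project's cost model may include further terms. A prediction counts as positive (AR) when its score is at least the threshold.

```python
def total_cost(y_true, scores, threshold, cost_fp, cost_fn):
    """Assumed cost model: FP count * cost_fp + FN count * cost_fn."""
    fp = fn = 0
    for t, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and t == 0:
            fp += 1
        elif pred == 0 and t == 1:
            fn += 1
    return fp * cost_fp + fn * cost_fn


def best_threshold(y_true, scores, cost_fp, cost_fn, step=0.01):
    """Try thresholds 0, step, 2*step, ..., 1 and return (threshold, cost) with the lowest cost."""
    n = int(round(1 / step))
    # Rounding keeps candidates like 0.59 free of floating-point drift.
    candidates = [round(i * step, 10) for i in range(n + 1)]
    # min() returns the first (lowest) threshold among ties.
    return min(((t, total_cost(y_true, scores, t, cost_fp, cost_fn)) for t in candidates),
               key=lambda pair: pair[1])
```

Evaluating `total_cost` at every candidate threshold also yields the data for a cost vs. threshold plot.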
Results:
- Optimal Threshold: 0.59
- Total Cost with Model: 17,448 EUR
- Manual Processing Cost: 17,613 EUR
- Cost Savings: 165 EUR
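The savings figure follows directly from the two totals; relative to the manual processing cost, it amounts to just under 1%:

```latex
\begin{aligned}
\text{Savings} &= C_{\text{manual}} - C_{\text{model}} = 17{,}613 - 17{,}448 = 165\ \text{EUR} \\
\text{Relative savings} &= \frac{165}{17{,}613} \approx 0.94\%
\end{aligned}
```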
Visualizations:
- Confusion Matrix: Counts of correct and incorrect predictions for each class.
- Feature Importance: The 5 features with the greatest influence on the model's predictions.
- Cost vs. Threshold Plot: Total cost at each candidate threshold.
- Business Impact Bar Plot: Total cost with the model compared to manual processing.
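If the feature importance plot is based on the logistic regression model, one common approach is to rank features by the absolute value of their coefficients, which is meaningful only because the numerical features were scaled to a common range. Tree-based models expose their own importance scores instead (for example, `feature_importances_` on scikit-learn's `RandomForestClassifier`). The feature names and coefficient values below are hypothetical.

```python
def top_k_features(names, coefficients, k=5):
    """Return the k (name, coefficient) pairs with the largest absolute coefficients."""
    # The sign gives the direction (positive raises the log-odds of the class encoded as 1);
    # only the magnitude is used for ranking.
    ranked = sorted(zip(names, coefficients), key=lambda pair: abs(pair[1]), reverse=True)
    return ranked[:k]


# Hypothetical feature names and fitted coefficients, for illustration only.
names = ["credit_history", "loan_amount", "income", "term", "age", "dependents"]
coefs = [2.1, -1.2, 0.8, 0.05, -0.3, 0.4]
for name, coef in top_k_features(names, coefs):
    print(f"{name:>15}  {coef:+.2f}")
```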
To implement the model in live decisions:
- Preprocess new loan application data with the same encoders and scaler fitted on the training data, without refitting them.
- Predict probabilities using the logistic regression model.
- Apply the threshold of 0.59 to make the final decision.
- Automate the decision-making process to reduce manual effort and costs.
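The live decision steps above can be sketched as a single scoring function. This assumes the model's probability refers to the approved class (AR), that an application is approved when that probability is at least 0.59, and that `features` has already been preprocessed with the parameters fitted on the training data; `weights` and `bias` are hypothetical stand-ins for the fitted logistic regression coefficients and intercept.

```python
import math

THRESHOLD = 0.59  # cost-minimizing threshold reported above


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow in exp(-z) for large negative z
    return e / (1.0 + e)


def decide(features, weights, bias, threshold=THRESHOLD):
    """Return (decision, probability), assuming the probability is P(AR)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    p = sigmoid(z)
    return ("AR" if p >= threshold else "No AR"), p
```

With a fitted scikit-learn `LogisticRegression`, the equivalent check is `model.predict_proba(X)[:, 1] >= 0.59`; column 1 corresponds to `model.classes_[1]`, so it is worth confirming that this is the AR class.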