This project aims to predict the credit risk of customers using various machine learning algorithms. We will analyze the dataset, preprocess the data, build predictive models, and evaluate their performance.
This project uses two datasets:
- FinanKu Data All: Contains customer information and financial data.
- FinanKu Data Validasi: Validation dataset for testing the models.
During exploratory data analysis, we:
- Explored the distribution of customers by location and identified those with unpaid bills.
- Analyzed the age distribution of all customers and of those with unpaid bills.
- Calculated the average annual and quarterly balances of customers.
- Examined the average product ownership of customers.
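The exploration steps above can be sketched with pandas. This is a minimal illustration, not the project's actual notebook: the column names (`location`, `age`, `has_unpaid_bill`, `n_products`) and the toy rows are assumptions made for the example, since the real FinanKu schema is not shown here.

```python
import pandas as pd

# Toy customer table; column names and values are hypothetical.
customers = pd.DataFrame({
    "location": ["Jakarta", "Bandung", "Jakarta", "Surabaya", "Bandung", "Jakarta"],
    "age": [25, 41, 37, 52, 29, 45],
    "has_unpaid_bill": [0, 1, 0, 0, 1, 1],
    "n_products": [1, 3, 2, 2, 1, 4],
})

# Number of customers per location.
customers_per_location = customers["location"].value_counts()

# Number of customers with unpaid bills per location.
unpaid_per_location = customers.groupby("location")["has_unpaid_bill"].sum()

# Age distribution summary, split by unpaid-bill status.
age_summary = customers.groupby("has_unpaid_bill")["age"].describe()

# Average number of products held per customer.
mean_products = customers["n_products"].mean()

print(customers_per_location)
print(unpaid_per_location)
print(age_summary[["mean", "min", "max"]])
print(mean_products)
```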
We performed data preprocessing and feature engineering:
- Checked for duplicate and missing data.
- Calculated relevant variables such as mean balance and balance change over the observation period.
- Determined customer activity periods.
- Tracked additions to and reductions in product holdings.
- Calculated the duration of credit card ownership.
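As a rough sketch of how these features might be derived, the example below works on a small monthly snapshot table. Everything named here is an assumption for illustration: the columns (`customer_id`, `month`, `balance`, `n_products`, `card_open_date`), the observation end date, and the choice to count a month as "active" when the balance is positive. The project's own definitions may differ.

```python
import pandas as pd

# Hypothetical monthly snapshots per customer over the observation period.
monthly = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 2],
    "month": [1, 2, 3, 1, 2, 3],
    "balance": [100.0, 150.0, 200.0, 500.0, 400.0, 0.0],
    "n_products": [1, 2, 2, 3, 3, 2],
})

# Check for duplicate rows and missing values before deriving features.
n_duplicates = int(monthly.duplicated().sum())
n_missing = int(monthly.isna().sum().sum())

# Sort so that first()/last() correspond to the start and end of the period.
monthly = monthly.sort_values(["customer_id", "month"])
grouped = monthly.groupby("customer_id")

features = pd.DataFrame({
    # Mean balance over the observation period.
    "mean_balance": grouped["balance"].mean(),
    # Balance at the end of the period minus balance at the start.
    "balance_change": grouped["balance"].last() - grouped["balance"].first(),
    # Assumed definition of activity: months with a positive balance.
    "active_months": grouped["balance"].apply(lambda s: int((s > 0).sum())),
    # Net products added (positive) or dropped (negative).
    "product_change": grouped["n_products"].last() - grouped["n_products"].first(),
})

# Credit card ownership duration relative to a hypothetical observation end date.
cards = pd.DataFrame({
    "customer_id": [1, 2],
    "card_open_date": pd.to_datetime(["2019-01-15", "2020-06-01"]),
})
observation_end = pd.Timestamp("2021-01-01")
cards["card_tenure_days"] = (observation_end - cards["card_open_date"]).dt.days

features = features.join(cards.set_index("customer_id")["card_tenure_days"])
print(features)
```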
We built predictive models using three different algorithms:
- Logistic Regression
- Gradient Boosting (XGBoost)
- Random Forest
Hyperparameters were tuned using GridSearchCV for each algorithm and each experiment.
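The tuning loop might look roughly like the sketch below, which uses scikit-learn's `GridSearchCV`. It is illustrative only: the data is synthetic (`make_classification`), the parameter grids are small placeholders rather than the grids used in the project, recall is chosen as the scoring metric as an assumption, and XGBoost is left out to keep the example dependent on scikit-learn alone.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for the customer data (about 15% positives).
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.85, 0.15], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Placeholder grids; the project's actual search space is not documented here.
candidates = {
    "logistic_regression": (
        LogisticRegression(max_iter=1000),
        {"C": [0.1, 1.0, 10.0]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [3, None]},
    ),
}

best_models = {}
for name, (estimator, grid) in candidates.items():
    # 3-fold cross-validated search, selecting the best parameters by recall.
    search = GridSearchCV(estimator, grid, scoring="recall", cv=3)
    search.fit(X_train, y_train)
    best_models[name] = search
    print(name, search.best_params_, round(search.best_score_, 3))
```

Setting `scoring="recall"` makes the search prefer parameter combinations that catch more of the positive class, which may or may not match the metric used for selection in the original experiments.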
We evaluated the models using several metrics, including accuracy and recall. The models reached reasonable accuracy but lower recall, meaning they missed a share of the customers who actually posed a credit risk. This leaves room for improvement.
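To see how accuracy and recall can diverge on imbalanced data, consider the worked example below. The confusion-matrix counts are invented for illustration and are not results from this project.

```python
def accuracy(tp, fp, fn, tn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def recall(tp, fn):
    """Share of actual positives that the model identified."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts: 1000 customers, 50 of whom are actual credit risks.
tp, fn = 10, 40   # risky customers caught / missed
fp, tn = 20, 930  # safe customers wrongly flagged / correctly cleared

acc = accuracy(tp, fp, fn, tn)
rec = recall(tp, fn)
print(f"accuracy={acc:.2f} recall={rec:.2f}")  # prints "accuracy=0.94 recall=0.20"
```

Because the safe customers dominate the counts, the model looks accurate overall while still missing most of the risky customers.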
In this project, we aimed to predict customer credit risk using several machine learning models. While the models showed promise, recall needs to improve so that potential credit risks are identified more reliably. Possible next steps include collecting more data, oversampling the minority class, extending the observation period, trying different feature combinations, exploring more hyperparameter combinations, and experimenting with other machine learning algorithms.
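Of these steps, oversampling the minority class is simple to sketch. The function below is a minimal random-oversampling example written with the standard library only. The name `random_oversample` and the toy data are hypothetical, and this is not a technique the project has already applied.

```python
import random


def random_oversample(rows, labels, minority_label, seed=0):
    """Duplicate random minority rows until both classes have equal counts."""
    rng = random.Random(seed)
    minority = [i for i, lab in enumerate(labels) if lab == minority_label]
    majority = [i for i, lab in enumerate(labels) if lab != minority_label]
    if not minority:
        raise ValueError("no rows carry the minority label")

    # Draw extra minority indices with replacement to close the gap.
    n_extra = max(len(majority) - len(minority), 0)
    extra = [rng.choice(minority) for _ in range(n_extra)]

    # Keep every original row, append the duplicates, then shuffle.
    indices = list(range(len(labels))) + extra
    rng.shuffle(indices)
    return [rows[i] for i in indices], [labels[i] for i in indices]


# Toy data: 8 customers without unpaid bills (0) and 2 with unpaid bills (1).
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels, minority_label=1)
print(balanced_labels.count(0), balanced_labels.count(1))
```

Oversampling should be applied to the training split only, so that duplicated rows do not leak into the validation data and inflate the measured recall.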