Credit Card Default Prediction: a Systematic Machine Learning Pipeline and Model Comparison

Zhenning Zhang

doi:10.61173/7nfj4473

Keywords:

Credit card default prediction, Machine learning, Model comparison, Interpretability, AUC

Abstract

Credit card default prediction is central to risk management at financial institutions. Traditional statistical methods struggle with problems of this kind, which involve complex nonlinear structure, while existing machine learning studies have lacked systematic comparisons across multiple algorithms supported by business interpretation. This study built a complete machine learning pipeline on the University of California, Irvine (UCI) credit card dataset and compared five models: logistic regression, decision trees, random forests, gradient boosting machines, and multi-layer perceptrons. Hyperparameters were tuned by grid search, and the Area Under the ROC Curve (AUC) served as the main evaluation metric. The results indicate that logistic regression discriminated best (test AUC: 0.764), outperforming the more complex ensemble and deep learning models. Permutation importance identified the bill amount from four months earlier and the credit limit as the most important predictors, showing that dynamic financial behavior data carries more value than static demographic data. The study provides a high-performing, interpretable, and reproducible modeling framework for the financial industry. It challenges the assumption that complexity equals performance and demonstrates the practical worth of simple models in certain scenarios.
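The two evaluation ideas named in the abstract, AUC and permutation importance, can be illustrated with a minimal standard-library sketch. It is not the study's code: the abstract does not say which implementation was used, the data below are synthetic rather than the UCI dataset, and `toy_score` is a hypothetical stand-in for a fitted model's predicted default score.

```python
import random


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined unless both classes are present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def permutation_importance(score_fn, rows, labels, n_repeats=10, seed=0):
    """Mean drop in AUC when one feature column is shuffled at a time."""
    rng = random.Random(seed)
    baseline = roc_auc(labels, [score_fn(r) for r in rows])
    drops = []
    for j in range(len(rows[0])):
        total = 0.0
        for _ in range(n_repeats):
            # Shuffle column j only, leaving every other feature untouched.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            total += baseline - roc_auc(labels, [score_fn(r) for r in shuffled])
        drops.append(total / n_repeats)
    return baseline, drops


# Synthetic toy data (NOT the UCI dataset): feature 0 separates the classes,
# feature 1 is pure noise.
data_rng = random.Random(42)
rows = [[i / 20, data_rng.random()] for i in range(20)]
labels = [1 if i >= 10 else 0 for i in range(20)]


def toy_score(row):
    """Hypothetical stand-in for a fitted model's predicted default score."""
    return row[0]


baseline_auc, importances = permutation_importance(toy_score, rows, labels)
print("baseline AUC:", baseline_auc)
print("mean AUC drop per feature:", importances)
```

A feature the scorer ignores shows no AUC drop when shuffled, while a feature the scorer relies on shows a large one. This is the kind of contrast the abstract draws between behavioral and demographic predictors. The pairwise AUC loop is quadratic in the number of samples, so on a real dataset a library routine such as scikit-learn's `roc_auc_score` and `sklearn.inspection.permutation_importance` would be the more practical choice.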
