MACHINE LEARNING-BASED CREDIT RISK MODELING IN RETAIL BANKING
Keywords:
Credit Risk Modeling, Machine Learning, Retail Banking, Gradient Boosting, AUC-ROC
Abstract
Machine learning is becoming increasingly important for improving credit risk assessment in retail banking. This paper compares the predictive power of several classification models for detecting borrower default, using structured borrower-level data made up of borrower characteristics, credit history variables, and loan attributes. Decision Tree, Random Forest, and Gradient Boosting models were implemented and evaluated within a single validation framework. Data preprocessing consisted of handling missing values, encoding categorical variables, removing post-loan performance variables to prevent data leakage, and addressing class imbalance to obtain sound estimates. The primary performance measure was the Area Under the Receiver Operating Characteristic Curve (AUC-ROC), supplemented by accuracy, precision, recall, and F1-score. The results show that the ensemble methods outperform the single-tree model, with Gradient Boosting achieving the highest discriminative ability, followed closely by Random Forest. Feature importance analysis indicates that interest rate, credit utilization, and debt-to-income ratio are the most important predictors of borrower default risk. The findings support the view that ensemble learning approaches can substantially improve the classification performance and predictive stability of retail credit risk models.
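The abstract names two steps that are easy to get subtly wrong: removing post-loan variables before training, and scoring models with AUC-ROC alongside threshold-based metrics. The sketch below illustrates both using only the Python standard library. It is a minimal illustration under stated assumptions, not the authors' pipeline. The column names in `POST_LOAN_COLUMNS` are hypothetical placeholders, because the abstract does not list which fields were removed. The 0.5 decision threshold is also an assumption, since the abstract does not report one. AUC is computed through its rank interpretation: the probability that a randomly chosen defaulter receives a higher risk score than a randomly chosen non-defaulter, with ties counted as one half.

```python
from typing import Dict, Sequence

# Hypothetical post-origination fields (assumption): any value only known after
# the loan is issued would leak the outcome into training and inflate AUC.
POST_LOAN_COLUMNS = {"total_pymnt", "recoveries", "last_pymnt_amnt"}


def drop_leakage(record: Dict[str, float]) -> Dict[str, float]:
    """Keep only the fields that would be available at application time."""
    return {k: v for k, v in record.items() if k not in POST_LOAN_COLUMNS}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC via the Mann-Whitney statistic: the share of (defaulter,
    non-defaulter) pairs in which the defaulter has the higher score, with
    ties counted as 0.5. This is O(n^2), so it is only suitable for illustration."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined unless both classes are present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 after labelling score >= threshold as default."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy labels and scores for illustration only; these are not results from the paper.
    y = [0, 0, 1, 1, 0, 1]
    s = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
    print(drop_leakage({"int_rate": 13.5, "dti": 18.2, "recoveries": 0.0}))
    print(roc_auc(y, s), threshold_metrics(y, s))
```

AUC depends only on how the scores rank the borrowers, so it does not change with the chosen cut-off. Accuracy, precision, recall, and F1 do change with the threshold, and accuracy in particular can look high on imbalanced default data even when few defaulters are caught. This may explain why AUC-ROC serves as the primary measure. On real-sized datasets, scikit-learn's `roc_auc_score` computes the same quantity efficiently.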