MACHINE LEARNING-BASED CREDIT RISK MODELING IN RETAIL BANKING

Rohan Malhotra; Kavya Menon; Sandeep Patil

Keywords:

Credit Risk Modeling, Machine Learning, Retail Banking, Gradient Boosting, AUC-ROC

Abstract

Machine learning is increasingly important for improving credit risk assessment in retail banking. This paper compares the predictive power of several classification models for detecting borrower default, using structured borrower-level data comprising borrower characteristics, credit history variables, and loan details. Decision Tree, Random Forest, and Gradient Boosting models were implemented and evaluated within a single validation framework. Data preprocessing consisted of handling missing values, encoding categorical variables, removing post-loan performance variables to prevent data leakage, and addressing class imbalance to ensure sound estimation. The primary performance measure was the Area Under the Receiver Operating Characteristic Curve (AUC-ROC), supported by accuracy, precision, recall, and F1-score. The results show that the ensemble-based methods outperform the single-tree method, with Gradient Boosting showing the greatest discriminative ability, followed closely by Random Forest. Feature importance analysis indicates that the most important predictors of borrower default risk are the interest rate, credit utilization, and debt-to-income ratio. The findings support the argument that ensemble learning approaches can substantially improve classification performance and predictive stability in retail credit risk modeling.
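To make the evaluation measures named in the abstract concrete, the following is a minimal sketch of how AUC-ROC, precision, recall, and F1-score can be computed from predicted default scores. It is not the authors' implementation: the function names, the 0.5 decision threshold, and the toy labels and scores are all hypothetical and chosen only for illustration. AUC-ROC is computed here through its pairwise interpretation, that is, the share of (defaulter, non-defaulter) pairs in which the defaulter receives the higher score.

```python
from typing import Sequence, Tuple


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correctly ranked pair.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one example of each class")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def precision_recall_f1(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for the positive (default) class.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data: 1 = borrower defaulted, 0 = borrower repaid.
y_true = [0, 0, 1, 0, 1, 1, 0, 0]
scores = [0.10, 0.35, 0.40, 0.20, 0.80, 0.65, 0.55, 0.05]

print("AUC-ROC:", round(auc_roc(y_true, scores), 4))
print("precision, recall, F1:", precision_recall_f1(y_true, scores))
```

Unlike accuracy, precision, recall, and F1-score, AUC-ROC does not depend on a chosen decision threshold, which is one reason it suits default data where the positive class is rare. The pairwise loop is quadratic in the number of observations, so for a real loan dataset a rank-based implementation from a standard library would be the practical choice.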
