Comparative Study of Logistic Regression, Random Forest, and XGBoost for Bank Loan Approval Classification

Authors

  • Hamdika Putra Universitas Amikom Yogyakarta
  • Rumini Rumini Universitas Amikom Yogyakarta

DOI:

https://doi.org/10.30871/jaic.v9i5.10862

Keywords:

Credit Risk, Bank Loan Approval, Logistic Regression, Random Forest, XGBoost

Abstract

Bank loan approval plays a vital role in helping financial institutions minimize credit risk while supporting economic growth, and default prediction is a crucial part of banking credit risk management. This study compares three machine learning algorithms, Logistic Regression, Random Forest, and Extreme Gradient Boosting (XGBoost), for classifying bank loan approvals using a combination of application, previous application, and bureau datasets. The workflow covers data merging, cleaning, missing value imputation, handling of unknown values, feature engineering (converting day-based variables into years and calculating the total number of submitted documents, the income-to-annuity ratio, and the employment-to-income ratio), encoding (label and one-hot), scaling (min-max normalization), feature selection based on correlation analysis, and handling of class imbalance with SMOTE. The models are then trained and evaluated using Accuracy, Precision, Recall, F1-score, and AUC. The results show that Logistic Regression yields the highest AUC, 0.741498, outperforming XGBoost (0.715944) and Random Forest (0.713758). From a business perspective, implementing the best model reduced the Loss Given Default (LGD) by 39.77%, from $1,705,098,055.50 to $1,026,944,185.50. This finding indicates that simpler models remain competitive on imbalanced datasets when supported by appropriate preprocessing and balancing strategies.
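The reported LGD reduction can be checked directly from the two dollar figures in the abstract. The sketch below is only an arithmetic check, not the authors' code; the helper name `relative_reduction` is hypothetical and is introduced here purely for illustration.

```python
from decimal import Decimal


def relative_reduction(before: Decimal, after: Decimal) -> Decimal:
    """Return the percentage reduction when going from `before` to `after`.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * Decimal(100)


# Loss figures as reported in the abstract (US dollars).
loss_before = Decimal("1705098055.50")
loss_after = Decimal("1026944185.50")

absolute_reduction = loss_before - loss_after
reduction_pct = relative_reduction(loss_before, loss_after)

print(f"Absolute reduction: ${absolute_reduction:,.2f}")
print(f"Relative reduction: {reduction_pct:.2f}%")
```

Rounded to two decimal places, the relative reduction agrees with the 39.77% stated in the abstract.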

Published

2025-10-19

How to Cite

[1]
H. Putra and R. Rumini, “Comparative Study of Logistic Regression, Random Forest, and XGBoost for Bank Loan Approval Classification”, JAIC, vol. 9, no. 5, pp. 2822–2835, Oct. 2025.
