Comparative Study of Logistic Regression, Random Forest, and XGBoost for Bank Loan Approval Classification
DOI:
https://doi.org/10.30871/jaic.v9i5.10862
Keywords:
Credit Risk, Bank Loan Approval, Logistic Regression, Random Forest, XGBoost
Abstract
Bank loan approval plays a vital role in helping financial institutions minimize credit risk while supporting economic growth, and default prediction is a crucial part of banking credit risk management. This study compares three machine learning algorithms, Logistic Regression, Random Forest, and Extreme Gradient Boosting (XGBoost), for classifying bank loan approvals using a combination of application, previous application, and bureau datasets. The workflow covers data merging, cleaning, missing-value imputation, handling of unknown values, feature engineering (such as converting day-based variables into years and calculating total submitted documents, the income-to-annuity ratio, and the employment-to-income ratio), encoding (label and one-hot), scaling (min-max normalization), feature selection based on correlation analysis, class-imbalance handling with SMOTE, and modeling and evaluation using Accuracy, Precision, Recall, F1-score, and AUC. Logistic Regression yields the highest AUC at 0.741498, outperforming Random Forest (0.713758) and XGBoost (0.715944). From a business perspective, implementing the best model reduced the Loss Given Default (LGD) by 39.77%, from $1,705,098,055.50 to $1,026,944,185.50. This finding indicates that simpler models remain competitive on imbalanced datasets when supported by appropriate preprocessing and balancing strategies.
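As a rough illustration of the feature engineering, min-max scaling, and LGD arithmetic described in the abstract, the following minimal Python sketch may help. It is not the authors' implementation. The field names (days_birth, days_employed, amt_income, amt_annuity, flag_document_*) and the sample applicant are hypothetical, the assumption that day-based variables are stored as negative offsets is ours, and the employment-to-income ratio is left out because the abstract does not define it.

```python
# Illustrative sketch only; field names and sample values are hypothetical.

DAYS_PER_YEAR = 365.25


def engineer_features(row):
    """Derive a few of the engineered features named in the abstract."""
    features = {}
    # Assumption: day-based variables are day offsets, possibly negative.
    features["age_years"] = abs(row["days_birth"]) / DAYS_PER_YEAR
    features["employed_years"] = abs(row["days_employed"]) / DAYS_PER_YEAR
    # Total submitted documents: sum of the 0/1 document flags.
    features["total_documents"] = sum(
        value for key, value in row.items() if key.startswith("flag_document_")
    )
    # Income-to-annuity ratio; None when the annuity is missing or zero.
    annuity = row.get("amt_annuity")
    features["income_to_annuity"] = row["amt_income"] / annuity if annuity else None
    return features


def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


def relative_reduction(before, after):
    """Percentage reduction from `before` to `after`."""
    return (before - after) / before * 100


applicant = {
    "days_birth": -14610,
    "days_employed": -1461,
    "amt_income": 180000.0,
    "amt_annuity": 24000.0,
    "flag_document_2": 1,
    "flag_document_3": 0,
    "flag_document_4": 1,
}

features = engineer_features(applicant)
scaled = min_max_scale([10.0, 20.0, 30.0])
# LGD figures as reported in the abstract.
lgd_reduction = relative_reduction(1_705_098_055.50, 1_026_944_185.50)
```

Rounding lgd_reduction to two decimals reproduces the 39.77% reduction reported above. In practice the scaling step would normally be fitted on the training split only and then applied to the test split, so that test-set ranges do not leak into training.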
Copyright (c) 2025 Hamdika Putra, Rumini Rumini

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.








