Optimizing Bankruptcy Prediction on Imbalanced Data using XGBoost with Random Oversampling and Chi-Square

Authors

  • Revalina Suyatno, Universitas Dian Nuswantoro
  • Erika Devi Udayanti, Universitas Dian Nuswantoro
  • Ika Novita Dewi, Universitas Dian Nuswantoro

DOI:

https://doi.org/10.30871/jaic.v10i1.11841

Keywords:

Bankruptcy Prediction, Data Imbalance, Random Oversampling (ROS), Chi-Square, Extreme Gradient Boosting (XGBoost)

Abstract

Predicting corporate bankruptcy is strategically important because it directly affects economic stability and investor confidence. Building a reliable predictive model is difficult, however, because financial data are complex and, in particular, heavily imbalanced between bankrupt and non-bankrupt companies. This imbalance biases models toward the majority class and reduces their sensitivity to bankruptcy cases, which are the most critical for financial decision-making. This research aims to construct a more balanced and sensitive bankruptcy prediction model by specifically addressing data imbalance. The proposed approach integrates Random Oversampling (ROS) to equalize the class distribution, Chi-Square feature selection to identify the most informative financial variables, and the Extreme Gradient Boosting (XGBoost) algorithm as the core predictive model. The dataset is the UCI Taiwanese Bankruptcy Prediction dataset, consisting of 6,819 observations and 96 financial ratio variables. Experimental results show that the Chi-Square method identified 20 influential variables, including Per Share Net Profit Before, Debt Ratio, and ROA(B) Before Interest and Depreciation After Tax. The proposed XGBoost model achieved an overall accuracy of 0.9648 and an F1-score of 0.4286. These findings indicate that combining ROS, Chi-Square, and XGBoost improves data balance and prediction sensitivity for the bankruptcy class. This research is expected to serve as a foundation for financial decision-support systems that provide early warnings of potential corporate bankruptcy.
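
The first two stages of the approach described in the abstract, random oversampling and Chi-Square feature scoring, can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the toy data, the function names (random_oversample, chi2_scores, top_k, f1_score), and the seed are hypothetical, and the XGBoost classifier stage is omitted. The Chi-Square score used here assumes non-negative feature values and compares the per-class sum of each feature with the sum expected from the class proportions. The last lines show why accuracy alone can mislead on imbalanced data.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def chi2_scores(X, y):
    """Chi-square score per feature (assumes non-negative feature values)."""
    n = len(y)
    class_share = {c: k / n for c, k in Counter(y).items()}
    scores = []
    for j in range(len(X[0])):
        total = sum(row[j] for row in X)
        score = 0.0
        for c, share in class_share.items():
            observed = sum(row[j] for row, lab in zip(X, y) if lab == c)
            expected = total * share
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def top_k(scores, k):
    """Indices of the k highest-scoring features."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class; returns 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


# Hypothetical toy data: two ratio-like features, label 1 = bankrupt (minority).
X = [[0.9, 0.5], [0.8, 0.5], [0.7, 0.4], [0.9, 0.6], [0.8, 0.5], [0.1, 0.5]]
y = [0, 0, 0, 0, 0, 1]

X_bal, y_bal = random_oversample(X, y)   # classes now have 5 rows each
scores = chi2_scores(X_bal, y_bal)       # feature 0 separates the classes, feature 1 does not
selected = top_k(scores, k=1)

# A model that always predicts the majority class looks accurate but never finds a bankruptcy.
always_majority = [0] * len(y)
accuracy = sum(t == p for t, p in zip(y, always_majority)) / len(y)
minority_f1 = f1_score(y, always_majority)
```

In practice, oversampling is usually applied only to the training split so that duplicated minority rows do not leak into the evaluation data; the abstract does not state how the split was handled in this study.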

Published

2026-02-04

How to Cite

[1]
R. Suyatno, E. D. Udayanti, and I. N. Dewi, “Optimizing Bankruptcy Prediction on Imbalanced Data using XGBoost with Random Oversampling and Chi-Square”, JAIC, vol. 10, no. 1, pp. 365–377, Feb. 2026.
