Optimizing XGBoost for Heart Disease Risk Classification Using Optuna and Random Search on the Behavioral Risk Factor Surveillance System (BRFSS) 2023 Dataset

Authors

  • Muhammad Dzaky, Universitas Amikom Purwokerto
  • Adam Prayogo Kuncoro, Universitas Amikom Purwokerto
  • Riyanto Riyanto, Universitas Amikom Purwokerto

DOI:

https://doi.org/10.30871/jaic.v10i1.11897

Keywords:

Heart Disease, XGBoost, Optuna, Random Search, BRFSS 2023

Abstract

Heart disease is a critical public health issue in Indonesia, contributing to approximately 1.5 million deaths annually. Although machine learning methods, particularly Extreme Gradient Boosting (XGBoost), have demonstrated strong performance in medical classification tasks, their optimization on large-scale, highly imbalanced health datasets remains underexplored. This study optimizes XGBoost for heart disease risk classification using the Behavioral Risk Factor Surveillance System (BRFSS) 2023 dataset, which contains 290,156 samples after preprocessing. Two hyperparameter optimization approaches, Optuna and Random Search, are each evaluated in combination with three class imbalance handling techniques: class weighting, SMOTE, and Random Undersampling (RUS). Model evaluation focuses on AUC and recall in order to prioritize sensitivity in identifying at-risk individuals. The results show that the OptunaRUS (Optuna with RUS) and RandomWeight (Random Search with class weighting) models achieve the most stable performance, with OptunaRUS attaining an AUC of 83.06% and a recall of 75.69% on the test set. Feature importance analysis indicates that age range and hypertension are the most influential predictors. These findings confirm that hyperparameter optimization on large-scale health data improves the model's discriminative capability and generalization, and that selective sampling strategies such as RUS yield more stable performance than generative methods such as SMOTE on high-dimensional datasets.
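The abstract names several techniques in a single sentence each. The following Python sketch, which uses only the standard library and toy data (not BRFSS), makes their mechanics concrete: the negative-to-positive ratio that XGBoost's documentation suggests as a typical value for `scale_pos_weight` (one common form of class weighting), Random Undersampling, a plain random search loop, and the recall and AUC metrics. All function names, the search space `SPACE`, and `toy_objective` are hypothetical illustrations; the authors' actual pipeline (XGBoost, Optuna, SMOTE) is not reproduced here and may differ. Note also that Optuna's default sampler (TPE) proposes each new trial based on earlier results, whereas the random search below draws every trial independently.

```python
import random


def scale_pos_weight(labels):
    """Class-weighting ratio: count of negatives divided by count of positives.

    XGBoost's documentation suggests this ratio as a typical value for its
    `scale_pos_weight` parameter; assumes label 1 is the minority (at-risk) class.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return n_neg / n_pos


def random_undersample(rows, labels, seed=0):
    """Random Undersampling (RUS): keep every positive row plus an equally
    sized random subset of negative rows, so the classes end up balanced."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    keep = pos + rng.sample(neg, len(pos))
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]


def recall(labels, preds):
    """Sensitivity: fraction of true positives that the model flags as positive."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return tp / (tp + fn)


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive scores higher than a
    random negative, counting ties as half (the normalised Mann-Whitney U)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical XGBoost-style search space, for illustration only.
SPACE = {
    "max_depth": lambda rng: rng.randint(3, 10),
    "learning_rate": lambda rng: 10 ** rng.uniform(-3, -0.5),  # log-uniform
    "subsample": lambda rng: rng.uniform(0.5, 1.0),
}


def random_search(objective, space, n_trials, seed=0):
    """Plain random search: sample every hyperparameter independently on each
    trial and keep the configuration with the highest objective value."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: sample(rng) for name, sample in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    """Hypothetical stand-in for validation AUC; peaks at max_depth == 6."""
    return -abs(params["max_depth"] - 6)


if __name__ == "__main__":
    # Toy labels only (not BRFSS data): 8 negatives, 2 positives.
    y = [0] * 8 + [1, 1]
    X = [[i] for i in range(10)]
    print(scale_pos_weight(y))  # 4.0
    X_rus, y_rus = random_undersample(X, y)
    print(sorted(y_rus))  # [0, 0, 1, 1]
    scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9, 0.8, 0.35]
    preds = [1 if s >= 0.5 else 0 for s in scores]
    print(recall(y, preds))  # 0.5
    print(roc_auc(y, scores))  # 0.625
    print(random_search(toy_objective, SPACE, n_trials=50))
```

The pairwise AUC loop costs O(positives × negatives) and only serves to make the definition concrete; on a dataset the size of BRFSS 2023 a library routine would be the practical choice. RUS discards majority-class rows instead of synthesising new minority rows as SMOTE does, which is the selective-versus-generative contrast the abstract draws.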

Published

2026-02-11

How to Cite

[1] M. Dzaky, A. P. Kuncoro, and R. Riyanto, “Optimizing XGBoost for Heart Disease Risk Classification Using Optuna and Random Search on the Behavioral Risk Factor Surveillance System (BRFSS) 2023 Dataset,” JAIC, vol. 10, no. 1, pp. 1015–1029, Feb. 2026.