Optimizing XGBoost for Heart Disease Risk Classification Using Optuna and Random Search on the Behavioral Risk Factor Surveillance System (BRFSS) 2023 Dataset
DOI: https://doi.org/10.30871/jaic.v10i1.11897
Keywords: Heart Disease, XGBoost, Optuna, Random Search, BRFSS 2023
Abstract
Heart disease is a critical public health issue in Indonesia, contributing to approximately 1.5 million deaths annually. Although machine learning methods, particularly Extreme Gradient Boosting (XGBoost), have demonstrated strong performance in medical classification tasks, their optimization on large-scale, highly imbalanced health datasets remains underexplored. This study optimizes XGBoost for heart disease risk classification using the Behavioral Risk Factor Surveillance System (BRFSS) 2023 dataset, which contains 290,156 samples after preprocessing. Two hyperparameter optimization approaches, Optuna and Random Search, are evaluated in combination with three class imbalance handling techniques: class weighting, SMOTE, and Random Undersampling (RUS). Model evaluation focuses on AUC and recall to prioritize sensitivity in identifying at-risk individuals. The results show that the OptunaRUS and RandomWeight models achieve the most stable performance, with OptunaRUS attaining an AUC of 83.06% and a recall of 75.69% on the test set. Feature importance analysis indicates that age range and hypertension are the most influential predictors. These findings confirm that hyperparameter optimization on large-scale health data improves the model's discriminative capability and generalization, while selective sampling strategies such as RUS yield more stable performance than generative methods such as SMOTE on high-dimensional datasets.
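The abstract names two imbalance-handling ideas (undersampling and class weighting) and two evaluation metrics (recall and AUC). The following is a minimal, dependency-free Python sketch of those building blocks, not the authors' code. All function names are hypothetical. Using the negative-to-positive ratio as XGBoost's `scale_pos_weight` is a common class-weighting heuristic and is assumed here, since the abstract does not say how weighting was configured. In practice, library implementations such as scikit-learn's `roc_auc_score` would normally be used instead.

```python
import random
from typing import List, Sequence, Tuple


def random_undersample(X: List[list], y: List[int], seed: int = 42) -> Tuple[List[list], List[int]]:
    """Randomly drop majority-class rows until both classes (labels 0/1) have equal counts."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority row plus an equally sized random subset of the majority.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in kept], [y[i] for i in kept]


def scale_pos_weight(y: Sequence[int]) -> float:
    """Negative/positive ratio; assumed heuristic value for XGBoost's scale_pos_weight."""
    n_pos = sum(1 for label in y if label == 1)
    n_neg = len(y) - n_pos
    return n_neg / n_pos


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Sensitivity: share of true positives among all actual positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as 0.5. This pairwise loop is quadratic and only meant for illustration;
    it would be far too slow for roughly 290,000 samples.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    X = [[i] for i in range(8)]
    y = [0, 0, 0, 0, 0, 0, 1, 1]
    X_rus, y_rus = random_undersample(X, y)
    print(len(y_rus), sum(y_rus))                         # 4 2
    print(scale_pos_weight(y))                            # 3.0
    print(recall([1, 1, 0, 0], [1, 0, 0, 1]))             # 0.5
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))   # 0.75
```

Undersampling discards majority rows and leaves the positive weight at 1. Class weighting keeps all rows and instead up-weights the minority class in the loss. This difference helps explain why the two strategies can behave differently on a dataset of this size.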
License
Copyright (c) 2026 Muhammad Dzaky, Adam Prayogo Kuncoro, Riyanto Riyanto

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).