Optimization of Early Diagnosis Prediction Models for Acute Respiratory Infections (ARI) in Children Using Decision Tree, Random Forest, and Resampling Techniques

Authors

  • Caesario Gumilang Firdaus, Universitas Dian Nuswantoro
  • Asih Rohmani, Universitas Dian Nuswantoro
  • Suharnawi Suharnawi, Universitas Dian Nuswantoro

DOI:

https://doi.org/10.30871/jaic.v9i6.11558

Keywords:

Decision Tree, Data Imbalance, Pediatric ARI, Random Forest, SMOTE-ENN

Abstract

Acute respiratory infections (ARI) are the leading cause of childhood morbidity in Indonesia. Early detection is hampered by limited medical personnel and by imbalanced diagnostic data, in which lower respiratory tract infection (LRTI) cases are far fewer than upper respiratory tract infection (URTI) cases. This study developed and optimized a machine learning model that classifies ARI as URTI or LRTI, using resampling techniques to address the imbalance. An explanatory quantitative design was used with secondary data from the Mijen Community Health Center, Semarang (2020–2025; 12,177 valid records). Preprocessing included outlier handling (Winsorizing, IQR), a stratified 70:30 split, and RobustScaler on the training data. Three resampling techniques (SMOTE, ADASYN, SMOTE-ENN) were applied, and Decision Tree and Random Forest classifiers were then tuned with GridSearchCV and 5-fold cross-validation, with evaluation focused on Recall and AUC-PR for the minority class. Random Forest with SMOTE-ENN performed best, increasing LRTI recall from 0.02 to 0.37 and F1-macro to 0.54, while Decision Tree with SMOTE-ENN produced the highest AUC-PR of 0.31. Despite this substantial improvement, a recall of 0.37 is still low for clinical applications because the risk of false negatives remains high, potentially delaying patient treatment. Future implementation requires the integration of clinical symptom data (e.g., respiratory rate) to achieve clinically acceptable sensitivity. These findings confirm that resampling can improve model capability, but additional feature exploration is needed to achieve adequate diagnostic sensitivity in healthcare analytics.
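To make the evaluation metrics in the abstract concrete, the following minimal sketch computes minority-class recall and macro F1 from a binary confusion matrix. The counts are hypothetical and were chosen only so that LRTI recall comes out at 0.37; they are not the study's data, and the names `Confusion`, `ratio`, `f1`, and `macro_f1` are illustrative helpers, not part of the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion counts with LRTI as the positive (minority) class."""
    tp: int  # LRTI correctly flagged as LRTI
    fn: int  # LRTI missed (predicted URTI): the clinically risky error
    fp: int  # URTI wrongly flagged as LRTI
    tn: int  # URTI correctly predicted as URTI


def ratio(numerator: int, denominator: int) -> float:
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def f1(true_pos: int, false_pos: int, false_neg: int) -> float:
    """F1 for one class, written directly in terms of counts."""
    return ratio(2 * true_pos, 2 * true_pos + false_pos + false_neg)


def macro_f1(cm: Confusion) -> float:
    """Unweighted mean of the per-class F1 scores (LRTI and URTI)."""
    f1_lrti = f1(cm.tp, cm.fp, cm.fn)
    # For the URTI class the roles swap: TN are its hits,
    # FN are its false positives, and FP are its misses.
    f1_urti = f1(cm.tn, cm.fn, cm.fp)
    return (f1_lrti + f1_urti) / 2


# Hypothetical counts (assumption), chosen only so that LRTI recall is 0.37.
cm = Confusion(tp=37, fn=63, fp=100, tn=800)

lrti_recall = ratio(cm.tp, cm.tp + cm.fn)  # share of true LRTI cases detected
missed_lrti = cm.fn                        # false negatives
macro = macro_f1(cm)

print(f"LRTI recall: {lrti_recall:.2f}")
print(f"LRTI cases missed: {missed_lrti} of {cm.tp + cm.fn}")
print(f"Macro F1: {macro:.2f}")
```

At a recall of 0.37, 63 of every 100 true LRTI cases are classified as URTI, which is why the abstract treats the result as still too low for clinical use. Macro F1 weights both classes equally, so a strong URTI score cannot fully mask weak LRTI performance.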

Published

2025-12-07

How to Cite

C. G. Firdaus, A. Rohmani, and S. Suharnawi, “Optimization of Early Diagnosis Prediction Models for Acute Respiratory Infections (ARI) in Children Using Decision Tree, Random Forest, and Resampling Techniques”, JAIC, vol. 9, no. 6, pp. 3419–3430, Dec. 2025.
