SMOTE and Weighted Random Forest for Classification of Areas Based on Health Problems in Java
DOI:
https://doi.org/10.30871/jaic.v9i4.9933
Keywords:
Random Forest, SMOTE, Weighted Random Forest
Abstract
Random Forest (RF) is a popular machine learning (ML) method widely used for classification. However, RF performs suboptimally when the classes are imbalanced. Several approaches can improve RF performance on imbalanced data, including class weighting and oversampling. This research explores such adaptations of RF, using the classification of health problem areas in Java as a case study. It builds models of regional health status with four methods: RF, Weighted Random Forest (WRF), RF with the Synthetic Minority Over-sampling Technique (SMOTE-RF), and SMOTE-WRF. The objective is to compare their performance and identify the best model for classifying regions in Java as DBK (Daerah Bermasalah Kesehatan, areas with health problems) or Non-DBK. The results show that SMOTE-WRF is the most effective model for classifying DBK, achieving an accuracy of 93.62%, sensitivity of 85.71%, precision of 75%, F-score of 80%, and AUC of 93.57%. The three most important variables in the SMOTE-WRF model are access to adequate sanitation, egg and milk consumption, and the number of doctors.
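The reported F-score can be checked against the reported precision and sensitivity. The sketch below assumes that "F-score" means the F1 score, i.e. the harmonic mean of precision and sensitivity (recall); the abstract does not state the formula, and the function and variable names are illustrative only.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); assumed definition."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract for SMOTE-WRF, written as fractions.
reported_precision = 0.75
reported_sensitivity = 0.8571

f1 = f_score(reported_precision, reported_sensitivity)
print(f"F-score = {f1:.2%}")  # prints "F-score = 80.00%"
```

Under this assumption the three figures are mutually consistent: a precision of 75% and a sensitivity of 85.71% give an F-score of about 80%, matching the value in the abstract.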
License
Copyright (c) 2025 Erwan Setiawan, Bagus Sartono, Khairil Anwar Notodiputro

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (see The Effect of Open Access).








