SMOTE and Weighted Random Forest for Classification of Areas Based on Health Problems in Java

Authors

  • Erwan Setiawan, Mathematics Education, Suryakancana University
  • Bagus Sartono, School of Data Science, Mathematics, and Informatics, IPB University
  • Khairil Anwar Notodiputro, School of Data Science, Mathematics, and Informatics, IPB University

DOI:

https://doi.org/10.30871/jaic.v9i4.9933

Keywords:

Random Forest, SMOTE, Weighted Random Forest

Abstract

Random Forest (RF) is a popular machine learning (ML) method widely used for classification. However, RF performs poorly on imbalanced data, where one class greatly outnumbers the other. Several approaches can improve RF performance under class imbalance, including class weighting and oversampling. This research examines how such interventions help RF cope with imbalanced data, using the classification of health problem areas in Java as a case study. The study builds models of regional health status using four methods: RF, Weighted Random Forest (WRF), RF combined with the Synthetic Minority Over-sampling Technique (SMOTE-RF), and SMOTE-WRF. The objective is to compare the performance of these models and identify the best one for classifying regions in Java as DBK (Daerah Bermasalah Kesehatan, areas with health problems) or Non-DBK. The results show that SMOTE-WRF is the most effective model for classifying DBK, achieving an accuracy of 93.62%, sensitivity of 85.71%, precision of 75%, F-score of 80%, and AUC of 93.57%. The three most important variables in the SMOTE-WRF model are access to adequate sanitation, egg and milk consumption, and the number of doctors.
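The reported accuracy, sensitivity, precision, and F-score are all functions of a single confusion matrix, so they can be checked against one another. The sketch below is a minimal illustration, not the authors' code. The counts (6 true positives, 1 false negative, 2 false positives, 38 true negatives) are an assumption: they do not appear in the abstract and are simply one small set of counts that reproduces the four reported figures. The function name `classification_metrics` is likewise hypothetical. AUC is omitted because it depends on predicted scores rather than on the confusion matrix.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute standard binary classification metrics from confusion-matrix counts.

    tp, fn: positive-class (DBK) cases predicted correctly / incorrectly
    fp, tn: negative-class (Non-DBK) cases predicted incorrectly / correctly
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # also called recall
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f_score": f_score,
    }


# Assumed counts, chosen only because they are consistent with the
# percentages reported in the abstract; the abstract does not state them.
metrics = classification_metrics(tp=6, fn=1, fp=2, tn=38)

for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Under these assumed counts, the positive class makes up only 7 of 47 test cases. This shows why accuracy alone is a weak criterion on imbalanced data: a model that labels every region Non-DBK would still reach about 85% accuracy while having zero sensitivity.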

Published

2025-08-06

How to Cite

E. Setiawan, B. Sartono, and K. A. Notodiputro, “SMOTE and Weighted Random Forest for Classification of Areas Based on Health Problems in Java,” JAIC, vol. 9, no. 4, pp. 1587–1592, Aug. 2025.

Section

Articles
