Comparison of KNN and Naïve Bayes Classification Algorithms for Predicting Stunting in Toddlers in Banjaran District

Authors

  • Rivaldy Fauzan Mu'taz Universitas Widyatama
  • Ai Rosita Universitas Widyatama

DOI:

https://doi.org/10.30871/jaic.v9i5.9703

Keywords:

Stunting, Machine Learning, Classification, Naïve Bayes, K-Nearest Neighbors (KNN), Data Pre-processing, SMOTE

Abstract

Stunting is a chronic nutritional problem that seriously impairs child growth and development. This study compares the performance of the Naïve Bayes and K-Nearest Neighbors (KNN) algorithms in predicting stunting in toddlers in Banjaran District. The dataset consists of 12,000 toddler records with three main features: age, gender, and height. The research took a quantitative approach based on machine learning algorithms. To avoid data leakage, the SMOTE oversampling technique was applied only to the training data, and the models were evaluated with 5-fold cross-validation. A K-value of 3 was selected for the final KNN model based on validation curve analysis to prevent overfitting. KNN outperformed Naïve Bayes on every evaluation metric. The Naïve Bayes model yielded an accuracy of 67.50%, precision of 50.87%, recall of 61.38%, F1-score of 55.63%, specificity of 70.54%, and an AUC of 75.71%. The KNN (K=3) model achieved an accuracy of 99.11%, precision of 98.08%, recall of 99.25%, F1-score of 98.66%, specificity of 99.03%, and an AUC of 99.65%. McNemar's test confirmed that the performance difference between the two models was statistically significant (p < 0.05). The low performance of Naïve Bayes was attributed to violation of the feature independence assumption, particularly the high correlation between age and height (r ≈ 0.87). In conclusion, KNN is the more appropriate algorithm for stunting prediction on this dataset. However, the limited feature set means that further research with additional variables and external validation is needed before wider-scale implementation.
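The evaluation described in the abstract rests on two calculations: metrics derived from a confusion matrix, and McNemar's test on the cases where the two classifiers disagree. The following is a minimal sketch of both in plain Python, not the authors' implementation. The function names and all counts in the usage lines are hypothetical, and the exact binomial form of McNemar's test is an assumption, since the abstract does not state which variant was used.

```python
from math import comb


def confusion_metrics(tp, fp, tn, fn):
    """Compute the threshold-based metrics named in the abstract.

    Inputs are confusion-matrix counts with "stunted" as the positive class.
    Values are returned as fractions in [0, 1]. Assumes every denominator
    is non-zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "specificity": tn / (tn + fp),
    }


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value (assumed variant).

    b: samples classifier A got right and classifier B got wrong
    c: samples classifier A got wrong and classifier B got right
    Under the null hypothesis the discordant pairs split 50/50, so the
    smaller count follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no disagreements, so no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts for illustration only; they are not taken from the paper.
metrics = confusion_metrics(tp=80, fp=20, tn=90, fn=10)
p_value = mcnemar_exact(b=1, c=9)
print(metrics)
print(p_value)
```

Only the discordant pairs enter McNemar's test, which is why it suits a comparison of two classifiers scored on the same test samples. AUC is left out of the sketch because it needs predicted scores rather than confusion-matrix counts.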




Published

2025-10-16

How to Cite

[1]
R. Fauzan Mu'taz and A. Rosita, “Comparison of KNN and Naïve Bayes Classification Algorithms for Predicting Stunting in Toddlers in Banjaran District”, JAIC, vol. 9, no. 5, pp. 2711–2717, Oct. 2025.
