Comparison of KNN and Naïve Bayes Classification Algorithms for Predicting Stunting in Toddlers in Banjaran District
DOI: https://doi.org/10.30871/jaic.v9i5.9703
Keywords: Stunting, Machine Learning, Classification, Naïve Bayes, K-Nearest Neighbors (KNN), Data Pre-processing, SMOTE
Abstract
Stunting is a chronic nutritional problem that seriously impairs child growth and development. This study compares the performance of the Naïve Bayes and K-Nearest Neighbors (KNN) algorithms in predicting stunting in toddlers in Banjaran District. The dataset comprises 12,000 toddler records with three features: age, gender, and height. The study followed a quantitative approach using machine learning classification. The SMOTE oversampling technique was applied only to the training data to avoid data leakage, and 5-fold cross-validation was used for evaluation. K = 3 was selected for the final KNN model based on validation-curve analysis to prevent overfitting. The results show that KNN significantly outperformed Naïve Bayes on all evaluation metrics. Naïve Bayes achieved an accuracy of 67.50%, precision of 50.87%, recall of 61.38%, F1-score of 55.63%, specificity of 70.54%, and an AUC of 75.71%, whereas KNN (K = 3) achieved an accuracy of 99.11%, precision of 98.08%, recall of 99.25%, F1-score of 98.66%, specificity of 99.03%, and an AUC of 99.65%. McNemar's test confirmed the performance difference between the two models (p < 0.05), indicating statistical significance. The poor performance of Naïve Bayes was attributed to a violation of its feature-independence assumption, in particular the strong correlation between age and height (r ≈ 0.87). In conclusion, KNN is the more appropriate algorithm for stunting prediction on this dataset; however, the limited feature set points to the need for further research with additional variables and external validation before wider-scale implementation.
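The evaluation protocol described in the abstract — oversampling the minority class on the training split only, fitting KNN (K = 3) and Gaussian Naïve Bayes, and comparing their paired predictions with McNemar's test — can be sketched as follows. This is an illustrative sketch on synthetic data, not the study's code: the data generator, the minimal SMOTE-style oversampler (a stand-in for imbalanced-learn's SMOTE), and all thresholds and seeds are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors
from sklearn.naive_bayes import GaussianNB
from scipy.stats import chi2

rng = np.random.default_rng(0)

def make_toddlers(n):
    """Synthetic stand-in for the study's features: age, gender, height."""
    age = rng.uniform(0, 60, n)                           # months
    gender = rng.integers(0, 2, n).astype(float)
    height = 50 + 0.9 * age + rng.normal(0, 3, n)         # cm, correlated with age
    stunted = (height < 50 + 0.9 * age - 2.5).astype(int) # minority class (~20%)
    return np.column_stack([age, gender, height]), stunted

X, y = make_toddlers(2000)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

def smote_like(X, y, k=5):
    """Interpolate new minority samples between nearest minority neighbours
    (the core idea of SMOTE), until the classes are balanced."""
    Xm = X[y == 1]
    need = int((y == 0).sum() - (y == 1).sum())
    idx = NearestNeighbors(n_neighbors=k + 1).fit(Xm) \
        .kneighbors(Xm, return_distance=False)[:, 1:]   # drop self-neighbour
    base = rng.integers(0, len(Xm), need)
    nbr = idx[base, rng.integers(0, k, need)]
    gap = rng.random((need, 1))
    X_new = Xm[base] + gap * (Xm[nbr] - Xm[base])
    return np.vstack([X, X_new]), np.concatenate([y, np.ones(need, int)])

# Key point from the abstract: oversample the TRAINING split only (no leakage).
X_bal, y_bal = smote_like(X_tr, y_tr)

knn = KNeighborsClassifier(n_neighbors=3).fit(X_bal, y_bal)
nb = GaussianNB().fit(X_bal, y_bal)
pred_knn, pred_nb = knn.predict(X_te), nb.predict(X_te)

# McNemar's test (continuity-corrected) on the models' disagreement cells.
b = int(((pred_knn == y_te) & (pred_nb != y_te)).sum())
c = int(((pred_knn != y_te) & (pred_nb == y_te)).sum())
p_value = chi2.sf((abs(b - c) - 1) ** 2 / (b + c), df=1) if b + c else 1.0
print(f"KNN acc={knn.score(X_te, y_te):.3f}  "
      f"NB acc={nb.score(X_te, y_te):.3f}  p={p_value:.4f}")
```

Feature scaling is omitted here for brevity, although distance-based KNN would normally be preceded by it; the study would presumably use imbalanced-learn's `SMOTE` and 5-fold cross-validation rather than a single hold-out split.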
License
Copyright (c) 2025 Rivaldy Fauzan Mu'taz, Ai Rosita

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.