Improving the Accuracy of Obesity Classification Using a Stacking Classifier on Imbalanced Data with SMOTE

Authors

  • Sifa Sari, Universitas Dian Nuswantoro
  • M. Arief Soeleman, Universitas Dian Nuswantoro
  • Mamay Maida, Universitas Dian Nuswantoro
  • Hestiana Putri Novitasari, Universitas Dian Nuswantoro

DOI:

https://doi.org/10.30871/jaic.v10i1.11928

Keywords:

Obesity Classification, Machine Learning, Stacking Classifier, SMOTE, Hyperparameter Tuning

Abstract

Obesity remains a prevalent public health problem linked to lifestyle, eating behavior, and physical activity. This work aims to develop a robust machine learning model that generalizes well and classifies obesity levels with high accuracy. The study uses the Obesity Dataset, which contains 1610 records, and applies several preprocessing steps: data cleaning, encoding of categorical attributes, a train/test split, and class-imbalance handling with SMOTE. The model combines two base learners, an optimized Random Forest and Gaussian Naïve Bayes, in a Stacking Classifier that uses Logistic Regression as the meta-model. Experimental results show that stacking performs best, reaching an accuracy of 86.34% and outperforming each individual model. The analysis also shows improvements across several classification metrics, which suggests that stacking captures both complex non-linear relationships and simpler linear ones in the data. Overall, the results indicate that stacking-based ensemble learning is a strong approach to predicting obesity levels and holds promise for early risk detection in preventive health care systems.
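The SMOTE step named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function name `smote`, the toy two-feature data, and the parameter values (`k=2`, `seed=0`) are assumptions chosen for illustration. The idea shown is the core of SMOTE: each synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding the base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data (assumed, illustrative only): 6 majority and 3 minority samples.
majority = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.3), (0.8, 0.9), (1.3, 1.0)]
minority = [(3.0, 3.0), (3.2, 2.9), (2.9, 3.3)]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
```

Oversampling of this kind is normally applied to the training split only, so that the test set still reflects the original class distribution. In practice, a library implementation such as `SMOTE` from imbalanced-learn, together with `StackingClassifier` from scikit-learn, would likely replace a hand-written version like this one.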

Published

2026-02-10

How to Cite

S. Sari, M. Soeleman, M. Maida, and H. P. Novitasari, “Improving the Accuracy of Obesity Classification Using a Stacking Classifier on Imbalanced Data with SMOTE”, JAIC, vol. 10, no. 1, pp. 946–954, Feb. 2026.
