Comparison of Logistic Regression, Random Forest, Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) Algorithms in Diabetes Prediction
DOI:
https://doi.org/10.30871/jaic.v9i5.9815
Keywords:
Diabetes Prediction, Logistic Regression, Random Forest, Support Vector Machine, K-Nearest Neighbors
Abstract
Diabetes mellitus is a prevalent chronic illness whose incidence continues to grow worldwide, placing significant strain on healthcare systems. Timely prediction of diabetes is crucial for early intervention and management. This study compares the effectiveness of four machine learning algorithms, Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN), in identifying diabetes cases using a large public dataset of 100,000 patient records obtained from Kaggle. The dataset includes nine clinical variables, such as age, gender, body mass index (BMI), blood glucose level, and HbA1c level, among others. Fewer than 10% of the records were initially positive (diabetic) cases. To address this class imbalance, the Synthetic Minority Oversampling Technique (SMOTE) was applied exclusively to the training data after an 80:20 stratified split. All models were evaluated using 5-fold stratified cross-validation, with performance measured by accuracy, precision, recall, F1-score, area under the ROC curve (AUC), and training time. Among the models, Random Forest achieved the highest classification accuracy (96.88%) and AUC (99.70%), indicating superior overall performance. Furthermore, McNemar's tests showed that the differences in performance between Random Forest and each of the other models were statistically significant. A feature importance analysis showed that HbA1c, glucose level, and BMI were the most influential predictors. These results indicate that Random Forest offers the most balanced combination of accuracy, interpretability, and robustness among the four models, making it well suited to real-world clinical screening scenarios where early detection of diabetes is critical.
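The pairwise significance check mentioned in the abstract can be illustrated with a minimal sketch of McNemar's test. The abstract does not state which variant the authors used, so the continuity-corrected chi-square form below is an assumption, and the function name `mcnemar_test` and the toy labels are hypothetical, not taken from the study.

```python
import math


def mcnemar_test(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar test for two classifiers on the same test set.

    Assumption: chi-square form with 1 degree of freedom; the study may have
    used a different variant (for example, the exact binomial form).
    Returns (statistic, p_value).
    """
    only_a_correct = 0  # samples model A classifies correctly and model B does not
    only_b_correct = 0  # samples model B classifies correctly and model A does not
    for truth, a, b in zip(y_true, pred_a, pred_b):
        a_ok = a == truth
        b_ok = b == truth
        if a_ok and not b_ok:
            only_a_correct += 1
        elif b_ok and not a_ok:
            only_b_correct += 1

    discordant = only_a_correct + only_b_correct
    if discordant == 0:
        # The two models never disagree in correctness: no evidence of a difference.
        return 0.0, 1.0

    statistic = (abs(only_a_correct - only_b_correct) - 1) ** 2 / discordant
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Toy illustration only (not the study's data): model A is always right,
# model B misclassifies the first 10 of 20 positive cases.
y_true = [1] * 20
pred_a = [1] * 20
pred_b = [0] * 10 + [1] * 10
stat, p = mcnemar_test(y_true, pred_a, pred_b)
print(f"statistic={stat:.3f}, p-value={p:.4f}")
```

Only the discordant pairs (cases where exactly one of the two models is correct) enter the statistic, which is why the test suits comparisons of classifiers evaluated on the same held-out records.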
License
Copyright (c) 2025 M. Fadli Kurniawan, Dyah Ayu Megawaty

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.