Comparison of Support Vector Machine (SVM) and Random Forest (RF) Algorithm Performance with Random Undersampling Technique to Predict Gestational Diabetes Mellitus Risk

Authors

  • Annisa Damayanti, Universitas Amikom Yogyakarta
  • Anna Baita, Universitas Amikom Yogyakarta

DOI:

https://doi.org/10.30871/jaic.v9i2.9009

Keywords:

Gestational Diabetes Mellitus, Prediction, Random Forest, Random Undersampling, Support Vector Machine

Abstract

Gestational Diabetes Mellitus (GDM) is a glucose intolerance condition that develops during pregnancy and persists until delivery, characterized by abnormally elevated blood sugar levels. Accurate early diagnosis is important because it provides information that can speed up treatment and reduce complications for the mother and baby. Machine learning methods that can be used to predict GDM include the Support Vector Machine (SVM) and Random Forest (RF) algorithms. This study compares and evaluates GDM prediction models built with the SVM and RF algorithms, with the target classes balanced using the random undersampling technique. Applying random undersampling increased accuracy by 18% relative to the accuracy obtained without it. The SVM model is tuned over the kernel, C (cost), and gamma hyperparameters, while the RF model is tuned using a scoring metric and four other parameters: n_estimators, max_depth, min_samples_split, and min_samples_leaf. For both models, the best parameters are searched with GridSearchCV. The results show that the SVM classification model with random undersampling and hyperparameter tuning with K-Fold cross-validation achieved an average accuracy of 100%, with precision, recall, and F1-score also reaching 100%. The best parameters (linear kernel, C = 0.1, gamma = 0.001) reached the highest accuracy of 1.0, with a ROC-AUC value of 99%, indicating very good prediction performance. The RF model achieved an accuracy of 99%; tuning with the appropriate parameters yielded the same accuracy of 99%, with a ROC-AUC value of 99% as well. Both the SVM and RF algorithms therefore perform very well in predicting GDM, but SVM predicts GDM better than RF because it makes fewer prediction errors.
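
The class-balancing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `random_undersample`, the toy data, and the seed are assumptions made for illustration, and only the Python standard library is used. Rows of the larger class are dropped at random until every class has as many rows as the smallest one.

```python
import random
from collections import Counter, defaultdict


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows until every class matches the smallest class.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Every class is reduced to the size of the smallest class.
    target = min(len(ids) for ids in by_class.values())

    keep = []
    for ids in by_class.values():
        keep.extend(rng.sample(ids, target))  # sampling without replacement
    keep.sort()  # preserve the original row order

    return [rows[i] for i in keep], [labels[i] for i in keep]


# Toy, made-up data: 8 negative rows and 2 positive rows.
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2

X_bal, y_bal = random_undersample(rows, labels, seed=42)
print(Counter(y_bal))  # each class now has 2 rows
```

In a scikit-learn workflow, the `RandomUnderSampler` class from the imbalanced-learn package provides the same kind of resampling. Whether the study used that class is not stated in the abstract.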

Published

2025-03-12

How to Cite

[1] A. Damayanti and A. Baita, “Comparison of Support Vector Machine (SVM) and Random Forest (RF) Algorithm Performance with Random Undersampling Technique to Predict Gestational Diabetes Mellitus Risk,” JAIC, vol. 9, no. 2, pp. 328–337, Mar. 2025.
