Comparison of Support Vector Machine and Decision Tree Algorithm Performance with Undersampling Approach in Predicting Heart Disease Based on Lifestyle

Authors

  • Gusti Ayu Putu Febriyanti, Universitas Amikom Yogyakarta
  • Anna Baita, Universitas Amikom Yogyakarta

Keywords:

Cardiac, Decision Tree, K-Fold, Support Vector Machine, Prevention

Abstract

Heart disease is one of the leading causes of death worldwide, with risk factors including atherosclerosis, high blood pressure, and smoking. Early diagnosis is essential to reduce mortality and improve patients' quality of life. This study evaluates two machine learning algorithms, Support Vector Machine (SVM) and Decision Tree (DT), for predicting heart disease risk, and applies undersampling to handle class imbalance. Both models were evaluated with 10-fold cross-validation and tuned through hyperparameter search. Without undersampling, SVM achieved 92% accuracy and DT 91%; with undersampling, accuracy fell to 76% for SVM and 75% for DT. Undersampling improved recognition of the minority class at the cost of lower overall accuracy. Because SVM scored one percentage point higher than DT in both settings, the study concludes that SVM is the more reliable of the two for predicting heart disease on data with an unbalanced class distribution.
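
The trade-off between overall accuracy and minority-class recognition can be illustrated with a minimal sketch. It assumes random undersampling (the abstract does not say which undersampling variant was used) and a made-up 900/100 class split that is not taken from the study's dataset. The names `random_undersample`, `accuracy`, and `recall` are hypothetical helpers written for this illustration only.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows until every class has as many rows as the
    smallest class. Assumption: plain random undersampling."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(group) for group in by_class.values())
    sampled = []
    for label, group in by_class.items():
        for row in rng.sample(group, target):
            sampled.append((row, label))
    rng.shuffle(sampled)
    new_rows = [row for row, _ in sampled]
    new_labels = [label for _, label in sampled]
    return new_rows, new_labels


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positive-class rows that were predicted positive."""
    positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in positives) / len(positives)


# Hypothetical toy labels: 900 healthy (0) and 100 heart-disease (1) records.
labels = [0] * 900 + [1] * 100
rows = [[i] for i in range(len(labels))]

# A model that always predicts the majority class scores well on accuracy
# with imbalanced labels, yet it never detects a single positive case.
always_negative = [0] * len(labels)
print(accuracy(labels, always_negative))
print(recall(labels, always_negative))

# After undersampling, both classes are equally represented.
balanced_rows, balanced_labels = random_undersample(rows, labels)
print(Counter(balanced_labels))
```

On the balanced sample the same always-negative model would be right only half the time, so a classifier can no longer reach a high score by ignoring the minority class. This is one plausible reading of why accuracy drops after undersampling while minority-class recognition improves.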

Published

2025-03-11

How to Cite

G. A. P. Febriyanti and A. Baita, “Comparison of Support Vector Machine and Decision Tree Algorithm Performance with Undersampling Approach in Predicting Heart Disease Based on Lifestyle,” JAIC, vol. 9, no. 2, pp. 318–327, Mar. 2025.

Section

Articles
