Benchmarking Oversampling Strategies to Enhance the Performance of Machine Learning Algorithms in Hypertension Classification

Authors

  • Aenur Hakim Maulia, Universitas Dian Nuswantoro
  • Abu Salam, Universitas Dian Nuswantoro

DOI:

https://doi.org/10.30871/jaic.v10i1.11917

Keywords:

Data Oversampling, Hypertension classification, Machine Learning, Random Oversampling, Linear Discriminant Analysis

Abstract

This study benchmarks three oversampling techniques, SMOTE, Random Oversampling (ROS), and ADASYN, for improving machine learning performance in multiclass hypertension classification. Using key physiological features and four optimized algorithms (Logistic Regression, Support Vector Machine, Linear Discriminant Analysis, and Artificial Neural Networks), model performance was assessed with accuracy, F1-macro, and ROC AUC. The experimental results indicate that SMOTE combined with Linear Discriminant Analysis (LDA) yields the highest overall performance, with an accuracy of 0.9773 and an F1-macro score of 0.9848. Logistic Regression performs best when paired with ROS, also reaching an accuracy of 0.9773. Artificial Neural Networks show the largest improvement under ADASYN, reflected particularly in higher F1-macro values. Although Support Vector Machine is less sensitive to oversampling, it achieves a strong ROC AUC of 0.9776 when trained with SMOTE. Overall, the findings indicate that oversampling improves classification performance in multiclass hypertension prediction, with SMOTE combined with LDA emerging as the most effective configuration.
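
As an illustration of the ideas named in the abstract, the following minimal sketch shows random oversampling, a SMOTE-style interpolation step, and the macro-averaged F1 score using only the Python standard library. It is not the authors' implementation: the function names, the toy blood-pressure rows, and the class labels are hypothetical, the SMOTE variant is simplified, and ADASYN, accuracy, and ROC AUC are not covered.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows at random until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(row) for row in X], list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(list(rng.choice(rows)))
            y_out.append(label)
    return X_out, y_out


def smote_like_oversample(X, y, k=3, seed=0):
    """Add synthetic rows on the segment between a minority row and one of its
    k nearest same-class neighbours (a simplified version of the SMOTE idea)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(row) for row in X], list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        if len(rows) < 2:
            continue  # interpolation needs at least two rows in the class
        for _ in range(target - n):
            base = rng.choice(rows)
            others = [r for r in rows if r is not base]
            neighbours = sorted(others, key=lambda r: math.dist(base, r))[:k]
            other = rng.choice(neighbours)
            gap = rng.random()  # position along the segment, in [0, 1)
            X_out.append([b + gap * (o - b) for b, o in zip(base, other)])
            y_out.append(label)
    return X_out, y_out


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    scores = []
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: [systolic, diastolic] readings with imbalanced labels.
X = [[118, 76], [121, 79], [115, 74], [119, 77], [124, 80], [117, 75],
     [135, 86], [138, 88], [133, 85],
     [162, 101], [158, 99]]
y = ["normal"] * 6 + ["stage1"] * 3 + ["stage2"] * 2

X_ros, y_ros = random_oversample(X, y)
X_sm, y_sm = smote_like_oversample(X, y)

print("original:", dict(Counter(y)))
print("after ROS:", dict(Counter(y_ros)))
print("after SMOTE-like:", dict(Counter(y_sm)))
```

Random oversampling only repeats existing rows, whereas the interpolation step creates new points between same-class neighbours. In a benchmark like the one described, resampling would normally be applied to the training split only, so that duplicated or synthetic rows do not leak into the evaluation data.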

Published

2026-02-09

How to Cite

[1]
A. H. Maulia and A. Salam, “Benchmarking Oversampling Strategies to Enhance the Performance of Machine Learning Algorithms in Hypertension Classification”, JAIC, vol. 10, no. 1, pp. 747–761, Feb. 2026.
