Analysis of SMOTE and Random Search on Machine Learning Algorithms for Stroke Disease Diagnosis

Authors

  • Ubaid Khoir Julio Dn, Universitas Amikom Yogyakarta
  • Majid Rahardi, Universitas Amikom Yogyakarta

DOI:

https://doi.org/10.30871/jaic.v10i1.12046

Keywords:

CatBoost, Machine Learning, Random Search, SMOTE, Stroke Prediction

Abstract

Stroke is a critical medical condition in which a false negative prediction can delay treatment and increase mortality, so predictive models for stroke should prioritize sensitivity (recall) alongside overall accuracy. This study analyzes the impact of the Synthetic Minority Over-sampling Technique (SMOTE) and Random Search hyperparameter optimization on five machine learning algorithms for stroke disease diagnosis: Random Forest, XGBoost, Support Vector Machine (SVM), Logistic Regression, and CatBoost. Two experimental scenarios were compared: models trained without SMOTE, and models trained with SMOTE applied only to the training data to prevent data leakage. Model performance was evaluated using accuracy, precision, recall, and F1-score. Recall is treated as the primary metric because a model with low recall misses high-risk stroke patients, which may delay medical intervention. Experimental results show that SMOTE improved recall for all five models and that Random Search further improved performance. CatBoost achieved the best performance, with an accuracy of 96.61%, recall of 97%, and F1-score of 97%. Despite this result, the risk of overfitting is critically discussed. These findings indicate that the proposed approach produces a clinically relevant decision-support model for stroke risk prediction.
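
The two ideas the abstract relies on, oversampling only the training split and judging the model by recall, can be illustrated with a minimal sketch. This is not the authors' code or data: the toy two-feature dataset, the 80/20 hold-out split, and the helper names `smote` and `recall_precision_f1` are assumptions made for illustration, and the interpolation routine is a simplified pure-Python stand-in for a library implementation such as `SMOTE` in imbalanced-learn.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Simplified SMOTE (illustrative): create n_new synthetic points by
    interpolating between a minority sample and one of its k nearest
    minority neighbours. Needs at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def recall_precision_f1(y_true, y_pred, positive=1):
    """Recall, precision and F1 for the positive (stroke) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, precision, f1


# Hypothetical imbalanced toy data: 90 non-stroke (0) vs 10 stroke (1) cases.
data_rng = random.Random(42)
majority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(90)]
minority = [(data_rng.gauss(3, 1), data_rng.gauss(3, 1)) for _ in range(10)]

# 1) Split first (stratified 80/20 hold-out), so the test set stays untouched.
train_maj, test_maj = majority[:72], majority[72:]
train_min, test_min = minority[:8], minority[8:]

# 2) Oversample the minority class in the training split only.
synthetic = smote(train_min, n_new=len(train_maj) - len(train_min))
X_train = train_maj + train_min + synthetic
y_train = [0] * len(train_maj) + [1] * (len(train_min) + len(synthetic))
X_test = test_maj + test_min
y_test = [0] * len(test_maj) + [1] * len(test_min)

# 3) Why recall matters: a model that never predicts stroke still looks
#    accurate on the imbalanced test set, yet finds no stroke patient.
always_negative = [0] * len(y_test)
accuracy = sum(t == p for t, p in zip(y_test, always_negative)) / len(y_test)
recall, precision, f1 = recall_precision_f1(y_test, always_negative)
print(f"train class counts: {y_train.count(0)} vs {y_train.count(1)}")
print(f"always-negative baseline: accuracy={accuracy:.2f} recall={recall:.2f}")
```

Because the synthetic points are built only from training samples, no information from the held-out patients reaches the model, which is the leakage the study's second scenario is designed to avoid. The always-negative baseline shows why accuracy alone is a weak criterion on imbalanced stroke data.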

Published

2026-02-09

How to Cite

U. K. J. Dn and M. Rahardi, “Analysis of SMOTE and Random Search on Machine Learning Algorithms for Stroke Disease Diagnosis”, JAIC, vol. 10, no. 1, pp. 847–855, Feb. 2026.
