Analysis of SMOTE and Random Search on Machine Learning Algorithms for Stroke Disease Diagnosis
DOI: https://doi.org/10.30871/jaic.v10i1.12046
Keywords: CatBoost, Machine Learning, Random Search, SMOTE, Stroke Prediction
Abstract
Stroke is a critical medical condition in which a false negative prediction can delay treatment and increase mortality, so predictive models in this domain should prioritize sensitivity (recall) alongside overall accuracy. This study analyzes the impact of the Synthetic Minority Over-sampling Technique (SMOTE) and Random Search hyperparameter optimization on five machine learning algorithms for stroke disease diagnosis: Random Forest, XGBoost, Support Vector Machine (SVM), Logistic Regression, and CatBoost. Two experimental scenarios were conducted: models trained without SMOTE, and models trained with SMOTE applied only to the training data to prevent data leakage. Model performance was evaluated using accuracy, precision, recall, and F1-score. Recall is treated as the primary metric because, in clinical practice, low recall means that high-risk stroke patients go unidentified by the system, which can delay medical intervention. The experimental results show that SMOTE improved recall for all five models and that Random Search further enhanced performance. CatBoost achieved the best performance, with an accuracy of 96.61%, a recall of 97%, and an F1-score of 97%. Despite this result, the risk of overfitting is critically discussed. These findings indicate that the proposed approach produces a clinically relevant decision-support model for stroke risk prediction.
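To make the leakage point concrete, the following is a minimal sketch of SMOTE-style oversampling applied to the training split only, together with the recall, precision, and F1 formulas used for evaluation. It is not the authors' implementation: the function names `smote_like` and `precision_recall_f1`, the toy feature values, and the choice of `k=1` are all assumptions made for illustration, and a real study would more likely rely on an established library.

```python
import random


def smote_like(minority, n_new, k=1, seed=42):
    """Return n_new synthetic rows, each interpolated between a minority
    row and one of its k nearest minority neighbours (the core SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to base.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data (label 1 = stroke). The test split is held out
# BEFORE any oversampling, so no synthetic row can leak into evaluation.
train_X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4], [0.9, 0.8], [0.8, 0.9]]
train_y = [0, 0, 0, 0, 1, 1]
test_X = [[0.15, 0.25], [0.85, 0.95]]
test_y = [0, 1]

# Oversample the minority class in the training split until classes balance.
minority = [x for x, y in zip(train_X, train_y) if y == 1]
n_needed = train_y.count(0) - len(minority)
synthetic = smote_like(minority, n_new=n_needed, k=1)
balanced_X = train_X + synthetic
balanced_y = train_y + [1] * len(synthetic)

print("training class counts:", balanced_y.count(0), balanced_y.count(1))
print("test rows left untouched:", len(test_X))
```

The metric helper also shows why recall is the clinically sensitive number: every missed stroke case is a false negative, which lowers recall directly but leaves precision unchanged.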
License
Copyright (c) 2026 Ubaid Khoir Julio Dn, Majid Rahardi

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.