Hypertension Risk Prediction Using Stacking Ensemble of CatBoost, XGBoost, and LightGBM: A Machine Learning Approach
DOI: https://doi.org/10.30871/jaic.v9i6.10370
Keywords: Hypertension Prediction, Stacking Ensemble, Machine Learning, Imbalanced Data Handling, Classification Metrics
Abstract
Hypertension is a leading cause of cardiovascular disease, chronic kidney failure, and stroke, affecting millions of people worldwide. Early detection and accurate risk prediction are crucial for effective management and prevention. This study evaluates a stacking ensemble approach to predicting hypertension risk and compares the performance of different algorithms. The model combines three gradient boosting algorithms (XGBoost, LightGBM, and CatBoost) as base learners, with Logistic Regression as the meta-learner. The dataset, sourced from Kaggle, contains 4,240 instances with demographic and clinical attributes relevant to hypertension. Preprocessing consisted of imputing missing values with the median, removing residual null entries, and addressing class imbalance with the SMOTE algorithm. The data were split into 80% for training and 20% for testing. The stacking ensemble achieved an overall accuracy of 92.65%, with precision, recall, and F1-scores consistently reaching 0.92 for both classes. The confusion matrix showed minimal misclassification, indicating that the model distinguishes well between low-risk and high-risk individuals. These results serve the primary goal of this research, which is to identify the algorithm that performs best for hypertension risk prediction. By evaluating and comparing different models, the study offers insight into choosing the most effective algorithm for clinical decision-making and early detection strategies.
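The accuracy, precision, recall, and F1-score reported above are all derived from the four cells of a binary confusion matrix. The following is a minimal sketch of that arithmetic in plain Python, not the authors' evaluation code. The class `BinaryConfusion`, the helper names, and the counts in `example` are hypothetical and chosen only for illustration; they are not the study's confusion matrix. The sketch assumes that "positive" means the high-risk class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryConfusion:
    """Cell counts of a 2x2 confusion matrix (positive = high risk, by assumption)."""
    tp: int  # high risk predicted as high risk
    fp: int  # low risk predicted as high risk
    fn: int  # high risk predicted as low risk
    tn: int  # low risk predicted as low risk

    def flipped(self) -> "BinaryConfusion":
        """Same matrix viewed with the other class treated as positive."""
        return BinaryConfusion(tp=self.tn, fp=self.fn, fn=self.fp, tn=self.tp)


def _ratio(numerator: int, denominator: int) -> float:
    # Return 0.0 instead of raising when a class is never predicted or never present.
    return numerator / denominator if denominator else 0.0


def accuracy(c: BinaryConfusion) -> float:
    return _ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn)


def precision(c: BinaryConfusion) -> float:
    return _ratio(c.tp, c.tp + c.fp)


def recall(c: BinaryConfusion) -> float:
    return _ratio(c.tp, c.tp + c.fn)


def f1_score(c: BinaryConfusion) -> float:
    # Harmonic mean of precision and recall, written directly in terms of counts.
    return _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn)


# Hypothetical counts for illustration only (NOT the study's results).
example = BinaryConfusion(tp=90, fp=10, fn=8, tn=92)

for label, view in (("high risk", example), ("low risk", example.flipped())):
    print(
        f"{label}: precision={precision(view):.3f} "
        f"recall={recall(view):.3f} f1={f1_score(view):.3f}"
    )
print(f"accuracy={accuracy(example):.3f}")
```

Calling `flipped()` gives the per-class view, which is how a single confusion matrix yields separate precision, recall, and F1 values "for both classes". Accuracy is the same from either view.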
License
Copyright (c) 2025 Abisakha Saif Alfath, Ajie Kusuma Wardhana, Rumini Rumini

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.