Hypertension Risk Prediction Using Stacking Ensemble of CatBoost, XGBoost, and LightGBM: A Machine Learning Approach

Authors

  • Abisakha Saif Alfath Universitas Amikom Yogyakarta
  • Ajie Kusuma Wardhana Universitas Amikom Yogyakarta
  • Rumini Rumini Universitas Amikom Yogyakarta

DOI:

https://doi.org/10.30871/jaic.v9i6.10370

Keywords:

Hypertension Prediction, Stacking Ensemble, Machine Learning, Imbalanced Data Handling, Classification Metrics

Abstract

Hypertension is a leading cause of cardiovascular disease, chronic kidney failure, and stroke, affecting millions worldwide. Early detection and accurate risk prediction are crucial for effective management and prevention. This study evaluates and compares the performance of different algorithms for predicting hypertension risk using a stacking ensemble approach. The model combines three gradient boosting algorithms (XGBoost, LightGBM, and CatBoost) as base learners, with Logistic Regression as the meta-learner. The dataset, sourced from Kaggle, contains 4,240 instances with demographic and clinical attributes relevant to hypertension. Preprocessing consisted of imputing missing values with the median, removing residual null entries, and addressing class imbalance with the SMOTE algorithm. The data were split into 80% for training and 20% for testing. In the evaluation, the stacking ensemble model achieved an overall accuracy of 92.65%, with precision, recall, and F1-scores consistently reaching 0.92 for both classes. The confusion matrix showed minimal misclassification, indicating the model’s strong ability to differentiate between low-risk and high-risk individuals. These results support the study's primary goal of identifying which algorithm provides the best performance for hypertension risk prediction. By evaluating and comparing different models, the study offers insights into choosing the most effective algorithm for clinical decision-making and early detection strategies.
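
The two main preprocessing steps named in the abstract, median imputation and SMOTE-style oversampling, can be sketched in plain Python. This is a minimal illustration and not the authors' code: the toy rows, the function names `impute_median` and `smote_like`, and the parameters `k` and `seed` are all assumptions made for the example. `smote_like` is a simplified stand-in for SMOTE that keeps only its core idea, interpolating between a minority sample and one of its nearest minority neighbours.

```python
import random
from statistics import median


def impute_median(rows):
    """Replace None in each column with the median of that column's observed values."""
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        medians.append(median(observed))
    return [[medians[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points (simplified SMOTE-style interpolation)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen sample, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: squared_distance(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy rows: [age, systolic blood pressure]; None marks a missing value.
features = [
    [39, 106.0], [46, 121.0], [48, None], [43, 130.0], [None, 110.0],  # label 0
    [61, 150.0], [63, 180.0], [52, None],                              # label 1
]
labels = [0, 0, 0, 0, 0, 1, 1, 1]

completed = impute_median(features)
minority = [x for x, y in zip(completed, labels) if y == 1]
n_new = labels.count(0) - labels.count(1)  # samples needed to balance the classes
synthetic = smote_like(minority, n_new)

balanced_X = completed + synthetic
balanced_y = labels + [1] * len(synthetic)
```

In a full pipeline it is common to compute the medians and generate synthetic samples from the training split only, so that no information from the test split influences the model. The abstract does not state the order the authors used.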

Published

2025-12-05

How to Cite

[1]
A. S. Alfath, A. K. Wardhana, and R. Rumini, “Hypertension Risk Prediction Using Stacking Ensemble of CatBoost, XGBoost, and LightGBM: A Machine Learning Approach”, JAIC, vol. 9, no. 6, pp. 3146–3156, Dec. 2025.
