Calibration and Applied Statistical Modeling Using Logistic Regression on the UCI Heart Disease Dataset

Authors

  • Andi Cahyono, Medical Informatics, Universitas Sains dan Teknologi Indonesia
  • Inkha Ameriza, Information Technology Education, Universitas Sains dan Teknologi Indonesia
  • Gunadi Gunadi, Informatics Engineering, Universitas Sains dan Teknologi Indonesia
  • Ervira Dwiaprini As Syifa, Medical Informatics, Universitas Sains dan Teknologi Indonesia
  • Mashal Kasem Alqudah, Faculty of Computer Information Science, Higher Colleges of Technology, Sharjah, United Arab Emirates

DOI:

https://doi.org/10.30871/jaic.v10i1.11853

Keywords:

Brier Score, Isotonic Regression, Logistic Regression, Platt Scaling, UCI Dataset

Abstract

Accurate and well-calibrated heart disease risk prediction is essential for supporting medical decision-making. This study analyzes Logistic Regression as an applied statistical model for heart disease prediction using the UCI Heart Disease dataset. Beyond discrimination metrics, we explicitly focus on probability reliability: we evaluate calibration through the Brier score, calibration slope, and calibration intercept, and we quantify the impact of post-hoc calibration (isotonic regression and Platt scaling) on both calibration and discrimination. Models were validated using stratified 5-fold cross-validation, with AUROC, AUPRC, accuracy, and F1-score as evaluation metrics. The results show that Logistic Regression achieved competitive performance (AUROC 0.903; AUPRC 0.911; accuracy 0.822; F1-score 0.835) with well-calibrated probability estimates relative to Random Forest and Gradient Boosting under the evaluated setting. Permutation feature importance identified chest pain type (cp), number of major vessels (ca), ST depression (oldpeak), and exercise-induced angina (exang) as key predictors, consistent with the clinical literature. These findings indicate that simple applied statistical modeling, when paired with rigorous calibration assessment, can provide interpretable risk estimates that are more suitable for threshold-based decision support in early heart disease screening.
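
The calibration quantities named in the abstract can be made concrete with a minimal pure-Python sketch. It assumes binary outcomes `y` (0/1) and out-of-fold predicted probabilities `p`; all function names are hypothetical, not the authors' code. The calibration slope is taken as the coefficient b in the logistic recalibration model logit P(y=1) = a + b * logit(p), fitted by Newton-Raphson. For the calibration intercept, the sketch assumes the calibration-in-the-large convention (slope fixed at 1 through an offset), because the abstract does not say which convention the authors used. The fitted (a, b) also define a Platt-style recalibration map; Platt's original method fits the sigmoid to raw classifier scores and uses smoothed targets, which this sketch omits.

```python
import math

def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def _logit(p, eps=1e-12):
    # Clip so that p = 0 or p = 1 does not produce an infinite logit.
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def brier_score(y, p):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

def calibration_in_the_large(y, p, iters=50, tol=1e-10):
    """Intercept a in logit P(y=1) = a + logit(p), i.e. slope fixed at 1 (offset)."""
    x = [_logit(pi) for pi in p]
    a = 0.0
    for _ in range(iters):
        mu = [_sigmoid(a + xi) for xi in x]
        grad = sum(yi - mi for yi, mi in zip(y, mu))   # d logL / da
        hess = sum(mi * (1.0 - mi) for mi in mu)       # -d2 logL / da2
        if hess <= 1e-12:
            break
        step = grad / hess
        a += step
        if abs(step) < tol:
            break
    return a

def recalibration_fit(y, p, iters=50, tol=1e-10):
    """Fit logit P(y=1) = a + b * logit(p) by Newton-Raphson; b is the calibration slope."""
    x = [_logit(pi) for pi in p]
    a, b = 0.0, 1.0
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for xi, yi in zip(x, y):
            mu = _sigmoid(a + b * xi)
            w = mu * (1.0 - mu)
            ga += yi - mu           # d logL / da
            gb += (yi - mu) * xi    # d logL / db
            haa += w                # entries of the information matrix
            hab += w * xi
            hbb += w * xi * xi
        det = haa * hbb - hab * hab
        if det <= 1e-12:            # degenerate, e.g. all p identical
            break
        # Newton step: solve (information matrix) * delta = gradient.
        da = (hbb * ga - hab * gb) / det
        db = (haa * gb - hab * ga) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b

def platt_recalibrate(p, a, b):
    """Map a raw probability through the fitted logistic recalibration curve."""
    return _sigmoid(a + b * _logit(p))
```

For example, with `p = [0.1] * 5 + [0.9] * 5` and observed event rates of 0.2 and 0.8 in the two groups, the fitted slope is ln 4 / ln 9 ≈ 0.63, the signature of overconfident predictions, while the calibration-in-the-large intercept is 0 because the average prediction matches the overall event rate. In scikit-learn, `brier_score_loss` and `CalibratedClassifierCV(method="sigmoid")` are library counterparts for the Brier score and Platt-style recalibration.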
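
Isotonic recalibration, the other post-hoc method named in the abstract, fits a non-decreasing step function from model scores to observed event rates. A minimal sketch using the pool-adjacent-violators (PAV) algorithm follows; it assumes the calibrator is fit on predictions that were not used to train the base model, and the function names are hypothetical. Predictions here are piecewise constant, whereas scikit-learn's `IsotonicRegression` interpolates linearly between fitted points, so the two can differ slightly between thresholds.

```python
from bisect import bisect_right
from itertools import groupby

def fit_isotonic(scores, y):
    """Fit a non-decreasing map from score to event rate with pool-adjacent-violators."""
    pairs = sorted(zip(scores, y))
    xs, sums, counts = [], [], []
    # Pool cases with identical scores into one weighted point.
    for s, grp in groupby(pairs, key=lambda t: t[0]):
        ys = [t[1] for t in grp]
        xs.append(s)
        sums.append(float(sum(ys)))
        counts.append(len(ys))
    # Each block holds [sum of outcomes, number of cases, number of distinct scores].
    blocks = []
    for s, c in zip(sums, counts):
        blocks.append([s, c, 1])
        # Merge while the previous block's mean exceeds the last block's mean
        # (compared by cross-multiplication to avoid division).
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s_last, c_last, n_last = blocks.pop()
            blocks[-1][0] += s_last
            blocks[-1][1] += c_last
            blocks[-1][2] += n_last
    # Expand each pooled block back to one fitted value per distinct score.
    fitted = []
    for s, c, n in blocks:
        fitted.extend([s / c] * n)
    return xs, fitted

def predict_isotonic(model, score):
    """Step-function prediction; scores outside the fitted range are clipped."""
    xs, fitted = model
    i = bisect_right(xs, score) - 1
    i = min(max(i, 0), len(xs) - 1)
    return fitted[i]
```

With small calibration sets, PAV tends to produce a few wide, flat steps and can overfit, which is why Platt-style scaling is often preferred when calibration data are scarce.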
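
The two discrimination metrics can be computed directly from out-of-fold scores. In this sketch (function names hypothetical), AUROC is the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half, which is equivalent to the normalized Mann-Whitney U statistic. AUPRC is computed as step-wise average precision, the same definition used by scikit-learn's `average_precision_score`; the abstract does not state which AUPRC estimator the authors used, and trapezoidal interpolation of the precision-recall curve can give noticeably different values.

```python
def auroc(y, scores):
    """P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for yi, s in zip(y, scores) if yi == 1]
    neg = [s for yi, s in zip(y, scores) if yi == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def average_precision(y, scores):
    """Step-wise area under the precision-recall curve: sum of (R_n - R_{n-1}) * P_n."""
    total_pos = sum(1 for yi in y if yi == 1)
    if total_pos == 0:
        raise ValueError("average precision needs at least one positive case")
    order = sorted(range(len(y)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    i = 0
    while i < len(order):
        # Treat all cases tied at this score as a single threshold.
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            if y[order[j]] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return ap
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a dataset of this size; a rank-based formula gives the same AUROC in O(n log n).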
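
Stratified 5-fold cross-validation keeps the proportion of disease and no-disease cases roughly constant across folds. One way to build such folds, sketched here under the assumption of a fixed random seed (function name hypothetical), is to shuffle the indices of each class separately and deal them round-robin into k folds. scikit-learn's `StratifiedKFold(n_splits=5, shuffle=True)` follows the same idea, though its exact fold assignment differs.

```python
import random

def stratified_kfold(y, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds preserve class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    counter = 0  # continues across classes so fold sizes differ by at most one
    for label in sorted(set(y)):
        idx = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(idx)
        # Deal this class's shuffled indices round-robin into the folds.
        for i in idx:
            folds[counter % k].append(i)
            counter += 1
    for f in range(k):
        test = sorted(folds[f])
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        yield train, test
```

In a typical workflow, each training split fits the model (and any post-hoc calibrator, ideally on an inner split), and each held-out split supplies the predictions on which AUROC, AUPRC, accuracy, F1-score, and the calibration metrics are computed.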
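
Permutation importance, used in the abstract to rank predictors such as cp, ca, oldpeak, and exang, measures how much a performance metric drops when one feature's values are randomly shuffled, which breaks that feature's link with the outcome while leaving its distribution intact. The sketch treats the fitted model and the metric as plain functions; the toy model, the `accuracy` metric, and the demo data are hypothetical, and the abstract does not say which metric the authors permuted against. scikit-learn's `sklearn.inspection.permutation_importance` is the library counterpart.

```python
import random

def permutation_importance(predict, X, y, metric, n_repeats=10, seed=0):
    """Mean drop in metric(y, predict(X)) after shuffling each feature column."""
    rng = random.Random(seed)
    base = metric(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            # Copy each row and replace only column j with the shuffled values.
            X_perm = []
            for row, v in zip(X, col):
                r = list(row)
                r[j] = v
                X_perm.append(r)
            drops.append(base - metric(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical toy example: the "model" uses only feature 0 and ignores feature 1.
X_demo = [[i - 5, i % 3] for i in range(10)]
y_demo = [1 if row[0] > 0 else 0 for row in X_demo]

def toy_predict(rows):
    return [1.0 if r[0] > 0 else 0.0 for r in rows]

def accuracy(y_true, probs):
    return sum((pi >= 0.5) == (yi == 1) for yi, pi in zip(y_true, probs)) / len(y_true)

importances = permutation_importance(toy_predict, X_demo, y_demo, accuracy)
```

Feature 1 is ignored by the toy model, so its importance is exactly zero, while shuffling feature 0 breaks the predictions and yields a positive drop. Computing importances on held-out data rather than on the training folds avoids rewarding features the model merely overfits to.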

Published

2026-02-04

How to Cite

A. Cahyono, I. Ameriza, G. Gunadi, E. D. As Syifa, and M. K. Alqudah, “Calibration and Applied Statistical Modeling Using Logistic Regression on the UCI Heart Disease Dataset”, JAIC, vol. 10, no. 1, pp. 327–335, Feb. 2026.
