Machine Learning Based Prediction of Osteoporosis Risk Using the Gradient Boosting Algorithm and Lifestyle Data

Authors

  • Edwin Ibrahim Salim Universitas Amikom Yogyakarta
  • Majid Rahardi Universitas Amikom Yogyakarta

DOI:

https://doi.org/10.30871/jaic.v9i6.10483

Keywords:

Osteoporosis Classification, Machine Learning, Lifestyle, Gradient Boosting

Abstract

Osteoporosis is a degenerative disease characterized by decreased bone mass and an increased risk of fractures, particularly among the elderly. Early detection is essential; however, standard diagnostic methods such as Dual-Energy X-ray Absorptiometry (DEXA) remain limited by availability and cost. This study aims to develop a machine learning model that predicts osteoporosis risk from lifestyle data using the Gradient Boosting algorithm. The secondary dataset, obtained from the Kaggle platform, consists of 1,958 samples covering lifestyle and clinical attributes such as age, gender, physical activity, smoking habits, calcium intake, vitamin D consumption, and family history. Preprocessing involved normalization and categorical feature encoding, along with a check of the class distribution, which indicated that the dataset was relatively balanced. The data were then split by stratified sampling into an 80% training set and a 20% testing set. Model performance was evaluated using accuracy, precision, recall, F1-score, and the area under the ROC curve (AUC). The Gradient Boosting algorithm achieved an accuracy of 91%, precision of 90.8%, recall of 90.2%, F1-score of 90.5%, and an AUC of 0.92, outperforming baseline methods such as Logistic Regression and Random Forest. These findings indicate that Gradient Boosting is effective as a decision-support tool for early osteoporosis screening based on lifestyle data and has the potential to be integrated into clinical decision-making systems to enhance early detection in healthcare services. Nevertheless, because this study relied on a secondary dataset from Kaggle, the results require further validation on real clinical data from Indonesia to ensure they are representative of the local population.
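The workflow described in the abstract (encoding and normalization, a stratified 80/20 split, Gradient Boosting, and the five reported metrics) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' code: the column names, the synthetic data generator, and the default hyperparameters are all assumptions made for the example, so the scores it prints will not match the figures reported above.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names; the real Kaggle dataset may name them differently.
NUMERIC_COLS = ["age"]
CATEGORICAL_COLS = ["gender", "smoking", "family_history"]


def make_synthetic_data(n_samples=1958, seed=0):
    """Generate stand-in data with an invented risk rule (illustration only)."""
    rng = np.random.default_rng(seed)
    X = pd.DataFrame({
        "age": rng.integers(18, 91, size=n_samples),
        "gender": rng.choice(["Female", "Male"], size=n_samples),
        "smoking": rng.choice(["Yes", "No"], size=n_samples),
        "family_history": rng.choice(["Yes", "No"], size=n_samples),
    })
    score = (0.08 * (X["age"] - 50)
             + 1.0 * (X["family_history"] == "Yes")
             + 0.5 * (X["smoking"] == "Yes")
             + rng.normal(0.0, 1.0, size=n_samples))
    # Thresholding at the median keeps the two classes roughly balanced.
    y = (score > score.median()).astype(int).rename("osteoporosis")
    return X, y


def run_demo():
    X, y = make_synthetic_data()

    # Stratified 80/20 split preserves the class ratio in both subsets.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)

    # Normalize numeric features and one-hot encode categorical ones.
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), NUMERIC_COLS),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL_COLS),
    ])
    model = Pipeline([
        ("preprocess", preprocess),
        ("gb", GradientBoostingClassifier(random_state=42)),  # default settings
    ])
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    return {
        "n_train": len(X_train),
        "n_test": len(X_test),
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "auc": roc_auc_score(y_test, y_prob),
    }


metrics = run_demo()
for name, value in metrics.items():
    print(f"{name}: {value}")
```

Wrapping the preprocessing and the classifier in a single pipeline means the scaler and encoder are fitted on the training split only, which avoids leaking test-set statistics into the model. Baselines such as Logistic Regression or Random Forest could be compared by swapping the final pipeline step.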

Published

2025-12-05

How to Cite

E. I. Salim and M. Rahardi, “Machine Learning Based Prediction of Osteoporosis Risk Using the Gradient Boosting Algorithm and Lifestyle Data,” JAIC, vol. 9, no. 6, pp. 3138–3145, Dec. 2025.
