Early Detection of Sleep Disorders Using Ensemble Boosting and Classical ML with Lifestyle Data

Authors

  • Sendi Novianto, Universitas Dian Nuswantoro
  • Muhammad Rayhan Ramadhani, Universitas Dian Nuswantoro

DOI:

https://doi.org/10.30871/jaic.v10i2.12317

Keywords:

Sleep Disorder, Machine Learning, Ensemble Boosting, Classical Machine Learning, Predictive Modelling, Lifestyle Data

Abstract

Sleep disorders are increasingly prevalent in modern society, significantly impacting quality of life, productivity, and physical and mental health. This study distinguishes itself by evaluating six machine learning algorithms—Support Vector Machine (SVM), Naïve Bayes, K-Nearest Neighbor (KNN), XGBoost, CatBoost, and LightGBM—using a comprehensive preprocessing pipeline that ensures balanced, normalized, and leakage-free training data, enabling a robust comparison of classical and ensemble boosting models for sleep disorder prediction. Data preprocessing included handling missing values, encoding categorical features, normalization, class imbalance correction via Random Oversampling, and stratified train-test splitting. Models were optimized through hyperparameter tuning with GridSearchCV and evaluated using accuracy, precision, recall, F1-score, and multiclass ROC-AUC metrics. Feature importance analysis revealed that Age, Occupation, Heart Rate, Sleep Duration, and Daily Steps were the most influential predictors, highlighting the interpretability of the models. LightGBM achieved the highest predictive performance with 70.0% accuracy and a ROC-AUC of 0.736, followed by Naïve Bayes (67.5% accuracy, ROC-AUC 0.784) and XGBoost (65.0% accuracy, ROC-AUC 0.728). McNemar’s test indicated no statistically significant differences among the models’ predictions, suggesting that their performance is statistically comparable. The models also differed in computational efficiency, with Naïve Bayes being the fastest and CatBoost the slowest, reflecting algorithmic complexity. These findings suggest that ensemble boosting algorithms, particularly LightGBM and XGBoost, alongside classical models like Naïve Bayes, provide effective, interpretable, and reliable tools for early detection of sleep disorder risk, with potential applications in wearable devices, digital health platforms, and clinical monitoring systems.

References

[1] World Health Organization, “Sleep disorders: a global public health problem,” WHO, 2021.

[2] F. P. Cappuccio, L. D’Elia, P. Strazzullo, and M. A. Miller, “Sleep duration and all-cause mortality: a systematic review and meta-analysis of prospective studies,” Sleep, vol. 33, no. 5, pp. 585–592, 2010.

[3] L. M. Senaratna et al., “Epidemiology of sleep disorders in adults: A systematic review,” Sleep Med., vol. 70, pp. 1–10, 2020.

[4] N. M. Punjabi, “The epidemiology of adult obstructive sleep apnea,” Proc. Am. Thorac. Soc., vol. 5, no. 2, pp. 136–143, 2008.

[5] Y. Li, X. Wang, Q. Liu, and J. Zhang, “Sleep apnea and cardiometabolic risk: recent epidemiological evidence,” J. Clin. Sleep Med., vol. 18, no. 1, pp. 45–46, 2022.

[6] A. Nurlita and S. Wijayani, “Hubungan penggunaan gadget dan pola aktivitas terhadap kualitas tidur remaja [The relationship of gadget use and activity patterns to adolescent sleep quality],” Jurnal Kesehatan Remaja, vol. 3, no. 2, pp. 101–110, 2023.

[7] Y. Zhang, L. Wei, C. Hao, and S. Mei, “Lifestyle factors and sleep quality among young adults: a cross-sectional study,” Int. J. Environ. Res. Public Health, vol. 18, no. 12, p. 6301, 2021.

[8] S. Mehta, “Sleep quality, cognitive function, and academic performance,” Front. Psychol., vol. 13, p. 1023, 2022.

[9] A. Putra and R. Hidayat, “Pengaruh pola aktivitas harian dan tekanan kerja terhadap risiko insomnia [The effect of daily activity patterns and work pressure on insomnia risk],” Jurnal Psikologi dan Kesehatan, vol. 7, no. 1, pp. 15–24, 2024.

[10] S. Kim, H. Lee, and M. Park, “Daily activity patterns and sleep quality among working adults,” Sleep Health, vol. 7, no. 4, pp. 345–352, 2021.

[11] A. Esteva et al., “A guide to deep learning in healthcare,” Nat. Med., vol. 25, no. 1, pp. 24–29, 2019.

[12] A. Rajkomar, J. Dean, and I. Kohane, “Machine learning in medicine,” N. Engl. J. Med., vol. 380, no. 14, pp. 1347–1358, 2019.

[13] R. Maulidah and D. Hidayati, “Machine learning approach for sleep quality prediction based on lifestyle data,” Jurnal Informatika, vol. 18, no. 2, pp. 55–64, 2024.

[14] H.-C. Chien and M.-T. Lee, “Predicting sleep disorders using machine learning models and lifestyle data,” Comput. Biol. Med., vol. 145, 2022.

[15] T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,” in Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, 2016, pp. 785–794.

[16] G. Ke et al., “LightGBM: A highly efficient gradient boosting decision tree,” in Advances in Neural Information Processing Systems 30 (NeurIPS 2017), 2017.

[17] L. Prokhorenkova, G. Gusev, A. Vorobev, A. V. Dorogush, and A. Gulin, “CatBoost: unbiased boosting with categorical features,” in Advances in Neural Information Processing Systems 31 (NeurIPS 2018), 2018.

[18] R. Liu, K. Zhang, and L. Chen, “Ensemble boosting for health data classification: recent advances,” Comput. Biol. Med., vol. 134, 2021.

[19] T. Tran and M. Nguyen, “Comparative analysis of boosting algorithms on tabular health datasets,” Expert Syst. Appl., vol. 198, 2022.

[20] Y. Wang, L. Zhao, and X. Chen, “Gradient boosting models for predicting health outcomes: a review,” J. Biomed. Inform., vol. 135, 2023.

[21] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.

[22] M. Alshammari, A. Aljohani, and K. Almutairi, “Applications of classical machine learning algorithms in healthcare,” J. Healthc. Eng., vol. 2021, pp. 1–14, 2021.

[23] H. Nguyen, A. Tran, and L. Le, “KNN and Naive Bayes for medical data classification,” Int. J. Data Sci. Anal., vol. 13, pp. 45–46, 2022.

[24] E. Bisong, Google Colaboratory for Machine Learning Applications. 2019.

[25] M. Chen, Hands-On Machine Learning on Google Colab. 2022.

[26] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, 3rd ed. 2012.

[27] A. Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd ed. 2019.

[28] M. Kuhn and K. Johnson, Applied Predictive Modeling. 2013.

[29] S. Raschka and V. Mirjalili, Python Machine Learning, 3rd ed. 2020.

[30] R. Kohavi, “A study of cross-validation and bootstrap for accuracy estimation and model selection,” in Proc. 14th Int. Joint Conf. Artificial Intelligence (IJCAI), 1995, pp. 1137–1145.

[31] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic minority over-sampling technique,” J. Artif. Intell. Res., vol. 16, pp. 321–359, 2002.

[32] M. Buda, A. Maki, and M. A. Mazurowski, “A systematic study of the class imbalance problem in convolutional neural networks,” Neural Netw., vol. 106, pp. 249–259, 2018.

[33] T. Fawcett, “An introduction to ROC analysis,” Pattern Recognit. Lett., vol. 27, no. 8, pp. 861–874, 2006.

[34] C. Molnar, Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. 2020.

[35] A. V. Dorogush, V. Ershov, and A. Gulin, “CatBoost: gradient boosting with categorical features support,” arXiv preprint, Oct. 2018.

[36] S. Raschka, Python Machine Learning, 2nd ed. 2018.

[37] T. Cover and P. Hart, “Nearest Neighbor Pattern Classification,” IEEE Trans. Inf. Theory, vol. 13, no. 1, pp. 21–27, 1967.

Published

2026-04-22

How to Cite

[1]
S. Novianto and M. R. Ramadhani, “Early Detection of Sleep Disorders Using Ensemble Boosting and Classical ML with Lifestyle Data”, JAIC, vol. 10, no. 2, pp. 1781–1787, Apr. 2026.

Section

Articles
