Comparative Analysis of Random Forest and XGBoost Models for Cervical Cancer Risk Prediction using SHAP-based Explainable AI

Authors

  • Muhammad Agung Reza Yudha, Universitas Amikom Yogyakarta
  • Majid Rahardi, Universitas Amikom Yogyakarta

DOI:

https://doi.org/10.30871/jaic.v9i6.10357

Keywords:

Cervical Cancer, XGBoost, Random Forest, SMOTE, SHAP

Abstract

Cervical cancer remains one of the leading causes of cancer-related death among women, particularly in developing countries such as Indonesia. This study aims to develop an accurate and interpretable model for predicting cervical cancer risk using the Random Forest (RF) and Extreme Gradient Boosting (XGBoost) algorithms. The data come from the Cervical Cancer (Risk Factors) dataset in the UCI Machine Learning Repository, which contains 858 patient records and 36 clinical and demographic features. Preprocessing comprises missing-value imputation, class balancing with the Synthetic Minority Oversampling Technique (SMOTE), and hyperparameter optimization through randomized search with cross-validation (Randomized Search CV). Experimental results show that both models performed well, with accuracy above 96% and AUC above 0.95, while the tuned XGBoost model trained on SMOTE-balanced data (XGBoost Tuned + SMOTE) slightly outperformed RF in detecting positive cases. Interpretability analysis using SHapley Additive exPlanations (SHAP) identified clinical features such as the Schiller test, the Hinselmann test, and the cytology result as the most influential factors in classification, consistent with established clinical evidence. The integration of XGBoost, SMOTE, and SHAP therefore yields a predictive framework that is both highly accurate and clinically explainable, supporting the development of decision-support systems for early cervical cancer detection.
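SMOTE is the class-balancing step named above. As a rough illustration of what it does, the sketch below implements SMOTE's core idea (Chawla et al., 2002) in NumPy: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours. This is not the authors' implementation. The helper names `smote` and `balance_with_smote`, the choice k = 5, and the toy data are illustrative assumptions, and features are assumed to be numeric and already imputed. In practice the `SMOTE` class from the imbalanced-learn library (via `fit_resample`) is the usual choice, and oversampling should be applied to training data only (for example inside each cross-validation fold) so that synthetic points do not leak into evaluation.

```python
import numpy as np

# Illustrative SMOTE sketch; helper names and parameters are hypothetical,
# not taken from the study's code.


def smote(X_min: np.ndarray, n_synthetic: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Create n_synthetic minority samples by SMOTE-style interpolation.

    X_min holds only minority-class rows, shape (n_min, n_features),
    with numeric, already-imputed features.
    """
    rng = np.random.default_rng(seed)
    n_min = X_min.shape[0]
    if n_min < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n_min - 1)  # cannot use more neighbours than other minority points

    # Pairwise Euclidean distances inside the minority class only.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]  # k nearest minority neighbours per row

    synthetic = np.empty((n_synthetic, X_min.shape[1]))
    for i in range(n_synthetic):
        base = rng.integers(n_min)              # pick a random minority sample
        nb = neighbours[base, rng.integers(k)]  # pick one of its k nearest neighbours
        gap = rng.random()                      # uniform in [0, 1)
        # New point lies on the segment between the sample and its neighbour.
        synthetic[i] = X_min[base] + gap * (X_min[nb] - X_min[base])
    return synthetic


def balance_with_smote(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have equal counts (binary case)."""
    X_min = X[y == minority_label]
    n_needed = int((y != minority_label).sum()) - X_min.shape[0]
    if n_needed <= 0:
        return X, y
    X_syn = smote(X_min, n_needed, k=k, seed=seed)
    X_bal = np.vstack([X, X_syn])
    y_bal = np.concatenate([y, np.full(n_needed, minority_label, dtype=y.dtype)])
    return X_bal, y_bal


# Toy demo on random data (NOT the UCI dataset): 10 positives vs 90 negatives.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = np.array([1] * 10 + [0] * 90)
X_bal, y_bal = balance_with_smote(X, y)
print(np.bincount(y_bal))  # [90 90]
```

Running the demo prints `[90 90]`: the 10 original positive rows plus 80 synthetic ones now match the 90 negative rows, and the original rows are kept unchanged at the top of `X_bal`. Because every synthetic point is an interpolation between two real minority samples, it stays inside the region spanned by the minority class rather than duplicating existing rows.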

Published

2025-12-06

How to Cite

[1] M. A. R. Yudha and M. Rahardi, “Comparative Analysis of Random Forest and XGBoost Models for Cervical Cancer Risk Prediction using SHAP-based Explainable AI,” JAIC, vol. 9, no. 6, pp. 3198–3211, Dec. 2025.