Analysis of Gradient Boosted Trees Algorithm in Breast Cancer Classification

Authors

  • Cantika Okzen Suryaputri, Universitas Amikom Yogyakarta
  • Majid Rahardi, Universitas Amikom Yogyakarta

DOI:

https://doi.org/10.30871/jaic.v10i1.11875

Keywords:

Breast Cancer Classification, Gradient Boosted Trees, LightGBM, Machine Learning, SHAP Explainability

Abstract

Early and accurate classification of breast cancer is essential to support clinical diagnosis and improve patient outcomes. This study proposes a machine learning pipeline based on Gradient Boosted Tree algorithms to classify breast tumors as benign or malignant. The pipeline integrates several preprocessing stages: outlier handling using the Local Outlier Factor (LOF), feature normalization with StandardScaler, class imbalance handling using SMOTE, and feature selection through ANOVA-based SelectKBest. Five ensemble learning models (XGBoost, LightGBM, CatBoost, HistGradientBoosting, and GradientBoosting) were trained and evaluated using accuracy, precision, recall, F1-score, and ROC-AUC. All models achieved strong and comparable classification performance. Among them, CatBoost obtained the highest ROC-AUC of 0.9960, along with an accuracy of 0.9649, precision of 0.9750, recall of 0.9286, and F1-score of 0.9512. The DeLong test indicated that the differences in ROC-AUC among the evaluated models were not statistically significant (p > 0.05), suggesting similar discriminative capability across models. To enhance interpretability, SHAP (SHapley Additive exPlanations) was applied to the CatBoost model as a representative classifier. The SHAP results show that features related to nuclear size and shape, such as radius, area, perimeter, and concavity, contributed most to malignant predictions. This study demonstrates that combining robust preprocessing, Gradient Boosted Tree models, and explainable machine learning provides an accurate and interpretable approach to breast cancer classification. However, the evaluation was conducted on a single public dataset without external validation, and further studies using independent, real-world datasets are required before clinical deployment.
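
The four threshold-based metrics reported for CatBoost follow from the standard confusion-matrix definitions. The sketch below is only a worked arithmetic check, not the authors' evaluation code. The counts (TP=39, FP=1, FN=3, TN=71) are hypothetical: the abstract reports neither a confusion matrix nor a test-set size, and these values were chosen only because they reproduce the reported figures to four decimals. Treating malignant as the positive class is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts; malignant is assumed to be the positive class."""
    tp: int  # malignant cases predicted malignant
    fp: int  # benign cases predicted malignant
    fn: int  # malignant cases predicted benign
    tn: int  # benign cases predicted benign


def classification_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = c.tp + c.fp + c.fn + c.tn
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)
    return {
        "accuracy": (c.tp + c.tn) / total,
        "precision": precision,
        "recall": recall,
        # F1 is the harmonic mean of precision and recall.
        "f1": 2 * precision * recall / (precision + recall),
    }


# HYPOTHETICAL counts, not taken from the paper: they are one set of integers
# that is consistent with the metrics reported in the abstract.
counts = ConfusionCounts(tp=39, fp=1, fn=3, tn=71)
metrics = classification_metrics(counts)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed counts, the gap between precision (0.9750) and recall (0.9286) would correspond to one benign case flagged as malignant against three malignant cases missed, which is the kind of trade-off that matters in a screening context.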

Published

2026-02-04

How to Cite

C. O. Suryaputri and M. Rahardi, “Analysis of Gradient Boosted Trees Algorithm in Breast Cancer Classification,” JAIC, vol. 10, no. 1, pp. 605–618, Feb. 2026.
