Breast Cancer Detection using Decision Tree and Random Forest
DOI:
https://doi.org/10.30871/jaic.v9i2.9073
Keywords:
Breast Cancer, Decision Tree, Random Forest, Machine Learning, Bootstrap Aggregating
Abstract
Cancer is among the most difficult diseases to cure and is a chronic condition that contributes significantly to global mortality. Advances in artificial intelligence (AI) allow AI-integrated systems to provide quick and accurate diagnoses from collected medical data. This study uses machine learning to compare the performance of two models, one built with the Decision Tree (DT) algorithm and one with the Random Forest (RF) algorithm, on routine blood test data. The research process comprises data preprocessing (handling missing values, detecting outliers, and feature selection), followed by the bootstrap aggregating technique to enhance model performance. Feature selection identifies the features that contribute most to cancer detection. Using the KBest feature selection technique, the study found that age, BMI, leptin, adiponectin, and MCP-1 had the highest correlation with the target variable. The resulting models were then evaluated to compare the two algorithms. RF outperformed DT, achieving an accuracy of 89.65% on the processed dataset with the bootstrap technique, compared with 80.17% for DT. RF also showed superior values on other metrics, including a precision of 91.66% and an F1-score of 87.12%. The study concludes that RF is more effective than DT for detecting cancer in limited datasets, especially when combined with the bootstrap technique. The findings are expected to support the development of decision support systems in healthcare services for more accurate early cancer detection.
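The bootstrap aggregating idea described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code: the function names are hypothetical, a depth-1 tree (a "stump") stands in for a full decision tree, and the toy data are synthetic rather than the blood test dataset. The sketch resamples the training rows with replacement, fits one stump per replicate, combines them by majority vote, and computes the accuracy, precision, recall, and F1-score used to compare models.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement: one bootstrap replicate."""
    return [rows[rng.randrange(len(rows))] for _ in rows]


def fit_stump(rows):
    """Fit a depth-1 tree on rows of (features, label), label in {0, 1}.

    Tries every feature, every observed threshold, and both orientations,
    and keeps the first combination with the fewest training errors.
    """
    best = None  # (errors, feature_index, threshold, label_when_above)
    for f in range(len(rows[0][0])):
        for t in sorted({x[f] for x, _ in rows}):
            for above in (0, 1):
                errors = sum(
                    (above if x[f] > t else 1 - above) != y for x, y in rows
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, above)
    return best[1:]


def predict_stump(stump, x):
    """Predict label_when_above if the feature exceeds the threshold."""
    f, t, above = stump
    return above if x[f] > t else 1 - above


def fit_bagged(rows, n_estimators, rng):
    """Bootstrap aggregating: one stump per bootstrap replicate."""
    return [fit_stump(bootstrap_sample(rows, rng)) for _ in range(n_estimators)]


def predict_bagged(stumps, x):
    """Majority vote over the ensemble (use an odd n_estimators to avoid ties)."""
    votes = Counter(predict_stump(s, x) for s in stumps)
    return votes.most_common(1)[0][0]


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Synthetic, one-feature toy data (an assumption for illustration only).
    data = [((float(v),), int(v >= 5)) for v in range(10)]
    ensemble = fit_bagged(data, n_estimators=25, rng=random.Random(42))
    predictions = [predict_bagged(ensemble, x) for x, _ in data]
    print(classification_metrics([y for _, y in data], predictions))
```

A Random Forest extends this scheme by growing deeper trees and restricting each split to a random subset of features. The abstract does not say which library the authors used; in practice one would likely reach for an established implementation, for example scikit-learn's `SelectKBest`, `DecisionTreeClassifier`, `BaggingClassifier`, and `RandomForestClassifier`, rather than hand-rolled code like the above.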
Copyright (c) 2025 Fergie Joanda Kaunang, Bhustomy Hakim, Fedelis Fraderic, Sherren Hartono, Andrew Kristanto Mulyanto

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.