Breast Cancer Detection using Decision Tree and Random Forest

Fergie Joanda Kaunang; Bhustomy Hakim; Fedelis Fraderic; Sherren Hartono; Andrew Kristanto Mulyanto

doi:10.30871/jaic.v9i2.9073

Authors

Fergie Joanda Kaunang, Informatics, Universitas Bunda Mulia
Bhustomy Hakim, Information System, Universitas Bunda Mulia
Fedelis Fraderic, Informatics, Universitas Bunda Mulia
Sherren Hartono, Informatics, Universitas Bunda Mulia
Andrew Kristanto Mulyanto, Informatics, Universitas Bunda Mulia

Keywords:

Breast Cancer, Decision Tree, Random Forest, Machine Learning, Bootstrap Aggregating

Abstract

Cancer is among the most difficult diseases to cure and a chronic condition that contributes significantly to global mortality. Advances in artificial intelligence (AI) make it possible to build systems that support quick and accurate diagnosis from collected medical data. This study compares the performance of two machine learning models, Decision Tree (DT) and Random Forest (RF), on routine blood test data. The research process comprises data preprocessing (handling missing values, detecting outliers, and feature selection), followed by bootstrap aggregating to improve model performance. Feature selection identifies the features that contribute most to cancer detection: using the KBest technique, the study found that age, BMI, leptin, adiponectin, and MCP-1 had the highest correlation with the target variable. Both models were then evaluated and compared. RF outperformed DT, achieving an accuracy of 89.65% on the processed dataset with the bootstrap technique, compared with 80.17% for DT. RF also showed better values on the other metrics, including a precision of 91.66% and an F1-score of 87.12%. The study concludes that RF is more effective than DT for detecting cancer in limited datasets, especially when combined with the bootstrap technique. The findings are expected to support the development of decision support systems in healthcare services for more accurate early cancer detection.
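The bootstrap aggregating step and the reported metrics (accuracy, precision, F1-score) can be illustrated with a minimal sketch. It is not the study's implementation. It uses synthetic data rather than the blood test dataset, and it uses one-level decision stumps as stand-ins for full decision trees. A Random Forest would also sample random feature subsets at each split, which is omitted here. All function names, sample sizes, and the number of estimators are assumptions chosen for illustration.

```python
import random
from collections import Counter

def make_data(n, rng):
    """Synthetic stand-in for a small tabular dataset (NOT the study's data).
    Feature 0 is informative, feature 1 is noise; about 10% of labels are flipped."""
    rows = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        y = 1 if x[0] > 0.5 else 0
        if rng.random() < 0.1:
            y = 1 - y
        rows.append((x, y))
    return rows

def stump_predict(stump, x):
    """Predict with a one-level tree: (feature index, threshold, side that means class 1)."""
    feature, threshold, above_is_positive = stump
    return 1 if (x[feature] > threshold) == above_is_positive else 0

def fit_stump(rows):
    """Exhaustively pick the single split with the fewest training errors."""
    best_stump, best_errors = None, len(rows) + 1
    for feature in range(len(rows[0][0])):
        for x, _ in rows:
            for above_is_positive in (True, False):
                stump = (feature, x[feature], above_is_positive)
                errors = sum(stump_predict(stump, xi) != yi for xi, yi in rows)
                if errors < best_errors:
                    best_stump, best_errors = stump, errors
    return best_stump

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one bootstrap replicate)."""
    return rng.choices(rows, k=len(rows))

def fit_bagged(rows, n_estimators, rng):
    """Bootstrap aggregating: fit one base learner per bootstrap replicate."""
    return [fit_stump(bootstrap_sample(rows, rng)) for _ in range(n_estimators)]

def bagged_predict(stumps, x):
    """Aggregate the base learners by majority vote."""
    votes = Counter(stump_predict(s, x) for s in stumps)
    return votes.most_common(1)[0][0]

def metrics(y_true, y_pred):
    """Accuracy, precision, and F1-score for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "f1": f1}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
train, test = make_data(120, rng), make_data(100, rng)
y_true = [y for _, y in test]

single = fit_stump(train)              # one base learner on the full training set
ensemble = fit_bagged(train, 25, rng)  # odd number of learners avoids tied votes

single_scores = metrics(y_true, [stump_predict(single, x) for x, _ in test])
bagged_scores = metrics(y_true, [bagged_predict(ensemble, x) for x, _ in test])
print("single stump :", single_scores)
print("bagged stumps:", bagged_scores)
```

On data this simple the two variants score similarly. The sketch only shows the mechanics of resampling, voting, and scoring, and says nothing about the accuracy gap reported in the study.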
