Analysis of Naive Bayes Algorithm for Lung Cancer Risk Prediction Based on Lifestyle Factors

Authors

  • Sheila Anggun Vabilla Universitas Amikom Yogyakarta
  • Majid Rahardi Universitas Amikom Yogyakarta

DOI:

https://doi.org/10.30871/jaic.v9i6.11463

Keywords:

Lung Cancer, Lifestyle, Gaussian Naive Bayes, SMOTE, Model Mutual Information

Abstract

Lung cancer has one of the highest mortality rates of any cancer worldwide and is often difficult to detect at an early stage because symptoms are minimal. This study aims to build a lung cancer risk prediction model based on lifestyle factors using the Gaussian Naive Bayes algorithm. Class imbalance is addressed using the Synthetic Minority Over-sampling Technique (SMOTE), and feature selection is carried out using Mutual Information. The dataset consists of 1,000 patient records with 24 features related to lifestyle and environmental factors. The model is validated using 5-fold stratified cross-validation and evaluated on accuracy, precision, recall, and the confusion matrix. The results show that applying SMOTE increases model accuracy to 91.00%, with high precision and recall in all risk classes (Low, Medium, High). The features "Passive Smoker" and "Coughing up Blood" are identified as the most influential factors in the prediction. These results indicate that combining Gaussian Naive Bayes with SMOTE and Mutual Information can produce an accurate prediction model.
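
The two core steps named in the abstract, SMOTE oversampling followed by Gaussian Naive Bayes classification, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the toy data, the two feature columns, and the helper names `smote`, `fit_gaussian_nb`, and `predict` are all hypothetical, and Mutual Information feature selection and 5-fold stratified cross-validation are omitted for brevity.

```python
import math
import random
from collections import defaultdict


def smote(X, y, k=2, seed=0):
    """SMOTE-style oversampling: grow every class to the majority count by
    interpolating between a sample and one of its k nearest same-class
    neighbours."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():
        for _ in range(target - len(rows)):
            base = rng.choice(rows)
            # k nearest same-class neighbours, excluding the base point itself
            neighbours = sorted((r for r in rows if r is not base),
                                key=lambda r: math.dist(r, base))[:k]
            other = rng.choice(neighbours) if neighbours else base
            gap = rng.random()  # position along the segment base -> other
            X_out.append([b + gap * (o - b) for b, o in zip(base, other)])
            y_out.append(label)
    return X_out, y_out


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate a log prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [sum((v - m) ** 2 for v in col) / n + var_floor
                     for col, m in zip(columns, means)]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log posterior under the
    feature-independence (naive) assumption."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances))
    return max(model, key=log_posterior)


# Hypothetical toy data (NOT the study's dataset): two ordinal lifestyle
# scores per patient, with deliberately imbalanced risk classes.
X = [[1, 1], [2, 1], [1, 2], [2, 2], [1, 3], [2, 3],   # Low
     [4, 4], [5, 4], [4, 5],                           # Medium
     [8, 7], [7, 8]]                                   # High
y = ["Low"] * 6 + ["Medium"] * 3 + ["High"] * 2

X_bal, y_bal = smote(X, y)
model = fit_gaussian_nb(X_bal, y_bal)
print(predict(model, [8, 8]))
```

In a cross-validated setting, oversampling would normally be applied only to the training portion of each fold, so that synthetic samples never leak into the data used for evaluation.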

Published

2025-12-06

How to Cite

S. A. Vabilla and M. Rahardi, “Analysis of Naive Bayes Algorithm for Lung Cancer Risk Prediction Based on Lifestyle Factors,” JAIC, vol. 9, no. 6, pp. 3363–3369, Dec. 2025.
