Analysis of Naive Bayes Algorithm for Lung Cancer Risk Prediction Based on Lifestyle Factors
DOI:
https://doi.org/10.30871/jaic.v9i6.11463
Keywords:
Lung Cancer, Lifestyle, Gaussian Naive Bayes, SMOTE, Model Mutual Information
Abstract
Lung cancer has one of the highest mortality rates of any cancer worldwide and is often difficult to detect in its early stages because symptoms are minimal. This study aims to build a lung cancer risk prediction model based on lifestyle factors using the Gaussian Naive Bayes algorithm. Class imbalance in the data is addressed using the Synthetic Minority Over-sampling Technique (SMOTE), and feature selection is carried out using Mutual Information. The dataset consists of 1,000 patient records with 24 features related to lifestyle and environmental factors. The model is validated using 5-fold Stratified Cross Validation and evaluated on accuracy, precision, recall, and the confusion matrix. The results show that applying SMOTE increases model accuracy to 91.00%, with high precision and recall across all risk classes (Low, Medium, High). The features "Passive Smoker" and "Coughing up Blood" are identified as the most influential in the prediction. These results indicate that combining Gaussian Naive Bayes with SMOTE and Mutual Information can produce an accurate prediction model.
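To make the two core techniques named in the abstract concrete, the following is a minimal, standard-library-only sketch of SMOTE-style oversampling followed by a Gaussian Naive Bayes fit. It is not the authors' code: the function names (`smote`, `fit_gnb`, `predict_gnb`), the two-feature toy data, and the two class labels are assumptions made for illustration, and the Mutual Information feature selection and 5-fold stratified validation steps are left out for brevity.

```python
import math
import random
from collections import defaultdict


def smote(samples, n_new, k=3, rng=None):
    """SMOTE-style oversampling: interpolate between a minority sample
    and one of its k nearest minority-class neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        neighbours = sorted(
            (s for s in samples if s is not base),
            key=lambda s: math.dist(base, s),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def fit_gnb(X, y, var_floor=1e-9):
    """Estimate a log prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict_gnb(model, row):
    """Return the class with the highest log posterior under the
    feature-independence (naive) assumption."""
    def log_posterior(params):
        log_prior, means, variances = params
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )
    return max(model, key=lambda label: log_posterior(model[label]))


# Hypothetical imbalanced toy data: 40 "Low" rows versus 8 "High" rows.
rng = random.Random(42)
majority = [[rng.gauss(2.0, 0.5), rng.gauss(2.0, 0.5)] for _ in range(40)]
minority = [[rng.gauss(6.0, 0.5), rng.gauss(6.0, 0.5)] for _ in range(8)]

# Oversample the minority class until both classes have the same size.
synthetic = smote(minority, len(majority) - len(minority), rng=rng)
minority_balanced = minority + synthetic

X = majority + minority_balanced
y = ["Low"] * len(majority) + ["High"] * len(minority_balanced)
model = fit_gnb(X, y)
```

In a cross-validated setting, oversampling is normally applied only to the training portion of each fold, so that synthetic rows never appear in the data used for evaluation. The abstract does not state how the study ordered these steps.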
License
Copyright (c) 2025 Sheila Anggun Vabilla, Majid Rahardi

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.








