Enhancing Interpretable Multiclass Lung Cancer Severity Classification using TabNet
DOI:
https://doi.org/10.30871/jaic.v9i6.11417
Keywords:
Cross Validation, Deep Learning, Lung Cancer, TabNet, Tabular Data
Abstract
Lung cancer is a major cause of mortality worldwide. Early clinical detection is hindered by non-specific symptoms, so accurate diagnosis depends on extracting subtle patterns from complex tabular medical data. Traditional machine learning approaches often fail to capture intricate patterns in such heterogeneous datasets, which limits effective clinical decision support. This study applies TabNet, an interpretable deep learning architecture, to multiclass lung cancer severity prediction (low, medium, high). Using the Kaggle Lung Cancer dataset, the method relies on TabNet's attention-based feature selection to process tabular data end to end, adaptively identifying key predictors while keeping the model interpretable. The model was trained with its default configuration and validated with stratified 5-fold cross-validation, achieving 98.50% accuracy, a 0.98 F1-score, and a 0.9996 macro-AUC-ROC on the test set. Learning curves were stable, and interpretability analysis identified 'Genetic Risk' and 'Shortness of Breath' as the dominant factors. These results indicate that TabNet is a reliable and inherently interpretable approach with the potential to improve the precision and transparency of lung cancer severity assessment in clinical practice.
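The evaluation protocol named in the abstract (stratified 5-fold cross-validation scored with accuracy and a macro-averaged F1) can be illustrated with a minimal, standard-library-only sketch. Everything below is an assumption made for illustration: the data are synthetic, the single numeric feature is invented, and the nearest-centroid classifier is a deliberately simple stand-in, not TabNet and not the authors' pipeline. Only the three severity labels and the fold count come from the abstract.

```python
import random
from collections import Counter, defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose folds keep class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal each class's samples round-robin so every fold gets its share.
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def fit_centroids(xs, ys):
    """Stand-in model (NOT TabNet): one mean feature value per class."""
    sums, counts = defaultdict(float), Counter()
    for x, y in zip(xs, ys):
        sums[y] += x
        counts[y] += 1
    return {y: sums[y] / counts[y] for y in counts}


def predict(centroids, xs):
    # Assign each sample to the class whose centroid is closest.
    return [min(centroids, key=lambda c: abs(x - centroids[c])) for x in xs]


# Synthetic, made-up data: 100 samples per severity class, one numeric feature.
rng = random.Random(42)
classes = ["low", "medium", "high"]
y = [c for c in classes for _ in range(100)]
X = [3.0 * classes.index(c) + rng.gauss(0.0, 1.0) for c in y]

fold_scores = []
for train_idx, test_idx in stratified_kfold(y, k=5, seed=0):
    model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
    y_test = [y[i] for i in test_idx]
    y_hat = predict(model, [X[i] for i in test_idx])
    fold_scores.append((accuracy(y_test, y_hat), macro_f1(y_test, y_hat)))

mean_acc = sum(a for a, _ in fold_scores) / len(fold_scores)
mean_f1 = sum(f for _, f in fold_scores) / len(fold_scores)
print(f"mean accuracy={mean_acc:.3f}, mean macro-F1={mean_f1:.3f}")
```

Stratification matters here because each fold then contains the same share of low, medium, and high cases, so per-fold macro scores are not distorted by a fold that happens to lack one class. In practice the stand-in model would be replaced by an actual TabNet classifier, and the splitter and metrics would more likely come from an established library than be hand-written as above.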
Copyright (c) 2025 Maria Bernadette Chayeenee Norman, Ika Novita Dewi, Abu Salam, Danang Wahyu Utomo, Sindhu Rakasiwi

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.