Enhancing Interpretable Multiclass Lung Cancer Severity Classification using TabNet

Authors

  • Maria Bernadette Chayeenee Norman Universitas Dian Nuswantoro
  • Ika Novita Dewi Department of Information System, Faculty of Computer Science, Universitas Dian Nuswantoro, Semarang
  • Abu Salam Research Center for Intelligent Distributed Surveillance and Security (IDSS), Universitas Dian Nuswantoro, Semarang
  • Danang Wahyu Utomo Research Center for Intelligent Distributed Surveillance and Security (IDSS), Universitas Dian Nuswantoro, Semarang
  • Sindhu Rakasiwi Research Center for Intelligent Distributed Surveillance and Security (IDSS), Universitas Dian Nuswantoro, Semarang

DOI:

https://doi.org/10.30871/jaic.v9i6.11417

Keywords:

Cross Validation, Deep Learning, Lung Cancer, TabNet, Tabular Data

Abstract

Lung cancer poses a significant global mortality challenge. Its early clinical detection is hindered by non-specific symptoms, so accurate diagnosis depends on extracting subtle patterns from often complex medical tabular data. Traditional machine learning approaches often fall short in capturing intricate patterns within such heterogeneous datasets, which limits effective clinical decision support. This research applies TabNet, an interpretable deep learning architecture, to multiclass lung cancer severity prediction (low, medium, high). Using the Kaggle Lung Cancer dataset, our methodology leverages TabNet's attention-based feature selection for end-to-end processing of tabular data, enabling adaptive identification of key predictors and providing built-in model interpretability. To assess its predictive capability and robustness, the model was trained with default configurations and validated through stratified 5-fold cross-validation, achieving 98.50% accuracy, an F1-score of 0.98, and a macro-averaged AUC-ROC of 0.9996 on the test set. Stable learning curves confirmed the model's robustness, and interpretability analysis highlighted 'Genetic Risk' and 'Shortness of Breath' as the dominant predictors. Our results underscore TabNet's efficacy as a reliable, robust, and inherently interpretable solution, with significant potential to improve the precision and transparency of lung cancer severity assessment in clinical practice.
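
The evaluation protocol summarised in the abstract (stratified 5-fold cross-validation scored with F1 and macro AUC-ROC) can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' code: the function names, the round-robin fold assignment, the fixed seed, the assumption that the F1-score is macro-averaged, and the one-vs-rest form of the macro AUC are all illustrative assumptions, and the TabNet model itself is omitted. In practice, scikit-learn's `StratifiedKFold`, `f1_score(average="macro")` and `roc_auc_score(multi_class="ovr", average="macro")` would normally be used instead.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=42):
    """Yield (train_idx, test_idx) pairs whose test folds keep roughly the
    class proportions of the full label list (illustrative helper)."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for idx in by_class.values():
        rng.shuffle(idx)
        # Deal this class's indices round-robin across folds; carrying the
        # offset over between classes also keeps total fold sizes balanced.
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)
        offset += len(idx)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def binary_auc(pos_scores, neg_scores):
    """AUC as P(positive scored above negative); ties count one half.
    Assumes both lists are non-empty."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def macro_auc_ovr(y_true, proba, classes):
    """One-vs-rest AUC per class, averaged without class weights.
    proba[i][j] is the predicted probability of classes[j] for sample i."""
    aucs = []
    for j, c in enumerate(classes):
        pos = [row[j] for row, t in zip(proba, y_true) if t == c]
        neg = [row[j] for row, t in zip(proba, y_true) if t != c]
        aucs.append(binary_auc(pos, neg))
    return sum(aucs) / len(aucs)


if __name__ == "__main__":
    # Toy, balanced labels (not the study's data): 10 per severity level.
    y = ["low"] * 10 + ["medium"] * 10 + ["high"] * 10
    for train, test in stratified_kfold(y, k=5):
        print(len(train), len(test), sorted(y[i] for i in test))
```

On the toy labels, each of the five folds holds out six samples, two from each severity class, so every validation fold mirrors the overall class balance. Macro averaging gives each class equal weight, which matters when severity levels are unevenly represented.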

Published

2025-12-07

How to Cite

M. B. C. Norman, I. N. Dewi, A. Salam, D. W. Utomo, and S. Rakasiwi, “Enhancing Interpretable Multiclass Lung Cancer Severity Classification using TabNet,” JAIC, vol. 9, no. 6, pp. 3454–3463, Dec. 2025.