Vision Transformer for Pneumonia Classification with Grad-CAM Explainability
DOI: https://doi.org/10.30871/jaic.v9i6.11532

Keywords: Chest X-Ray, Grad-CAM, Pneumonia Classification, Vision Transformer

Abstract
Pneumonia remains one of the leading causes of death worldwide, particularly among children and the elderly, so early and accurate diagnosis is essential to reducing mortality. Chest X-ray (CXR) imaging is widely used for this purpose, but manual reading of CXR images is time-consuming and prone to inter-observer variability. To address this problem, this study presents a pneumonia classification model based on the Vision Transformer (ViT) architecture combined with Gradient-weighted Class Activation Mapping (Grad-CAM) to make the model's decisions more interpretable. The model was trained on a publicly available CXR dataset of 5,863 images labeled Normal or Pneumonia, split 70:15:15 into training, validation, and test sets. The ViT model achieves 96.41% accuracy on the test set and high recall for pneumonia cases, while a class-weighted loss helps maintain more balanced predictions between the two classes. An Area Under the Curve (AUC) of 0.975 indicates strong discrimination between pneumonia-positive and normal samples. Grad-CAM visualizations, supported by a randomization test and occlusion analysis, provide an initial qualitative view of the lung regions that influence the model's predictions and often overlap with radiologically plausible areas. However, the heatmaps have not been formally evaluated by radiologists, and the correspondence between highlighted regions and pneumonia consolidation patterns has not yet been quantitatively validated. The proposed ViT Grad-CAM framework should therefore be regarded as an exploratory step toward explainable pneumonia classification on chest X-rays rather than a system ready for clinical deployment.
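The class-weighted loss mentioned above can be illustrated with a minimal sketch. This is not the authors' code; it assumes the common inverse-frequency weighting heuristic (weight for class c = n_samples / (n_classes × count_c)), which scales each sample's cross-entropy term by its class weight so that the minority class (e.g., Normal, if pneumonia images dominate) contributes more per sample:

```python
import math
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}

def weighted_cross_entropy(probs, labels, weights):
    """Mean cross-entropy with each sample scaled by its class weight.

    probs[i][c] is the predicted probability of class c for sample i.
    """
    eps = 1e-12  # guard against log(0)
    total = sum(weights[y] * -math.log(p[y] + eps)
                for p, y in zip(probs, labels))
    return total / len(labels)

# Toy imbalance: 4 Normal (class 0) vs. 8 Pneumonia (class 1) samples.
labels = [0] * 4 + [1] * 8
w = class_weights(labels)  # the minority class receives the larger weight
```

With these weights the two classes contribute equally in aggregate, which is the balancing effect the abstract describes; in a real training loop the same weights would typically be passed to the framework's cross-entropy loss rather than computed by hand.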
Copyright (c) 2025 Immanuel Julius Darmawan, Catur Supriyanto

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.