Vision Transformer for Pneumonia Classification with Grad-CAM Explainability
DOI: https://doi.org/10.30871/jaic.v9i6.11532

Keywords: Chest X-Ray, Grad-CAM, Pneumonia Classification, Vision Transformer

Abstract
Pneumonia remains one of the leading causes of death worldwide, particularly among children and the elderly, so early and accurate diagnosis is essential to reducing mortality. Chest X-ray (CXR) imaging is widely used for this purpose, but manual reading of CXR images is time-consuming and prone to inter-observer variability. To address this problem, this study presents a pneumonia classification model based on the Vision Transformer (ViT) architecture combined with Gradient-weighted Class Activation Mapping (Grad-CAM) to make the model's decisions more interpretable. The model was trained on a publicly available CXR dataset of 5,863 images labeled Normal or Pneumonia, split 70:15:15 into training, validation, and test sets. The ViT model achieves 96.41% accuracy on the test set and high recall for pneumonia cases, while a class-weighted loss helps maintain more balanced predictions between the two classes. An Area Under the Curve (AUC) of 0.975 indicates strong discrimination between pneumonia-positive and normal samples. Grad-CAM visualizations, supported by a randomization test and occlusion analysis, provide an initial qualitative view of the lung regions that influence the model's predictions and often overlap with radiologically plausible areas. However, the heatmaps have not been formally evaluated by radiologists, and the correspondence between highlighted regions and pneumonia consolidation patterns has not yet been quantitatively validated. The proposed ViT Grad-CAM framework should therefore be regarded as an exploratory step toward explainable pneumonia classification on chest X-rays rather than a system ready for clinical deployment.
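The class-weighted loss mentioned above can be illustrated with a minimal sketch. This is not the authors' code; it assumes the common inverse-frequency weighting heuristic (weight for class c = n_samples / (n_classes × count_c)), which scales each sample's cross-entropy term by its class weight so that the minority class (e.g., Normal, if pneumonia images dominate) contributes more per sample:

```python
import math
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}

def weighted_cross_entropy(probs, labels, weights):
    """Mean cross-entropy with each sample scaled by its class weight.

    probs[i][c] is the predicted probability of class c for sample i.
    """
    eps = 1e-12  # guard against log(0)
    total = sum(weights[y] * -math.log(p[y] + eps)
                for p, y in zip(probs, labels))
    return total / len(labels)

# Toy imbalance: 4 Normal (class 0) vs. 8 Pneumonia (class 1) samples.
labels = [0] * 4 + [1] * 8
w = class_weights(labels)  # the minority class receives the larger weight
```

With these weights the two classes contribute equally in aggregate, which is the balancing effect the abstract describes; in a real training loop the same weights would typically be passed to the framework's cross-entropy loss rather than computed by hand.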
Copyright (c) 2025 Immanuel Julius Darmawan, Catur Supriyanto

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.