Comparative Analysis of CNN, ResNet50, and Vision Transformer Architectures for Brain Tumor Classification from MRI Images
DOI: https://doi.org/10.30871/jaic.v10i3.12699
Keywords: Medical image classification, Brain MRI, CNN, ResNet50, Vision Transformer, Deep Learning, Brain tumors
Abstract
The classification of brain tumors from Magnetic Resonance Imaging (MRI) is a crucial task in computer-aided medical diagnosis. Recent advances in deep learning have significantly improved performance in this domain.
This work compares three architectures: a Convolutional Neural Network (CNN) trained from scratch, a transfer learning-based ResNet50 model, and a Vision Transformer (ViT). The models are evaluated on a multi-class dataset with four categories: glioma, meningioma, pituitary tumor, and no tumor.
Experimental results show that the CNN achieves limited performance and generalizes only moderately. The ResNet50 model reaches high accuracy during training but overfits severely, so its performance drops significantly on the test set. In contrast, the Vision Transformer achieves the best overall performance, with a test accuracy of 0.76 and a good balance between precision and recall.
These results highlight the effectiveness of Transformer-based architectures for complex medical image classification tasks.
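The abstract reports test accuracy together with precision and recall over four classes. As a minimal sketch of how such metrics could be computed from predicted and true labels, the Python below uses only the standard library. The function name `classification_report`, the class-label strings, the macro averaging, and the toy labels are all illustrative assumptions; the page does not say which averaging scheme or tooling the authors used, and the numbers produced here are not the paper's results.

```python
from collections import Counter

# Hypothetical label strings for the four categories named in the abstract.
CLASSES = ["glioma", "meningioma", "pituitary", "no_tumor"]


def classification_report(y_true, y_pred, classes=CLASSES):
    """Return accuracy plus per-class and macro-averaged precision and recall."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Count each (true label, predicted label) pair: a sparse confusion matrix.
    pairs = Counter(zip(y_true, y_pred))

    per_class = {}
    for c in classes:
        tp = pairs[(c, c)]
        predicted = sum(n for (_, p), n in pairs.items() if p == c)
        actual = sum(n for (t, _), n in pairs.items() if t == c)
        per_class[c] = {
            # Precision: of the samples predicted as c, the fraction that are c.
            "precision": tp / predicted if predicted else 0.0,
            # Recall: of the samples that are c, the fraction predicted as c.
            "recall": tp / actual if actual else 0.0,
        }

    accuracy = sum(pairs[(c, c)] for c in classes) / len(y_true)
    # Macro averaging (an assumption here) weights every class equally.
    macro_precision = sum(m["precision"] for m in per_class.values()) / len(classes)
    macro_recall = sum(m["recall"] for m in per_class.values()) / len(classes)
    return {
        "accuracy": accuracy,
        "macro_precision": macro_precision,
        "macro_recall": macro_recall,
        "per_class": per_class,
    }


# Toy labels for illustration only; these are not data from the study.
y_true = ["glioma", "glioma", "meningioma", "meningioma",
          "pituitary", "pituitary", "no_tumor", "no_tumor"]
y_pred = ["glioma", "meningioma", "meningioma", "meningioma",
          "pituitary", "pituitary", "no_tumor", "glioma"]

report = classification_report(y_true, y_pred)
print(round(report["accuracy"], 2))
print(round(report["macro_precision"], 2), round(report["macro_recall"], 2))
```

Macro averaging is shown because it treats the four classes equally regardless of how many images each contains; a weighted average would be a reasonable alternative if the classes are imbalanced.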
License
Copyright (c) 2026 Matthieu Kayembe, Franklin Mwamba, Pierre Kafunda, Fiston Oshasha, John Poma

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.