Comparative Analysis of CNN, ResNet50, and Vision Transformer Architectures for Brain Tumor Classification from MRI Images

Authors

  • Matthieu Kayembe, Université de Kinshasa
  • Franklin Mwamba, Institut de Recherche en Sciences de la Santé
  • Pierre Kafunda, Université de Kinshasa
  • Fiston Oshasha, General Commissariat for Atomic Energy, Regional Center for Nuclear Studies of Kinshasa
  • John Poma, Université de Kinshasa

DOI:

https://doi.org/10.30871/jaic.v10i3.12699

Keywords:

Medical image classification, Brain MRI, CNN, ResNet50, Vision Transformer, Deep Learning, Brain tumors

Abstract

The classification of brain tumors from Magnetic Resonance Imaging (MRI) is a crucial task in computer-aided medical diagnosis. Recent advances in deep learning have significantly improved performance in this domain.

In this work, a comparative analysis of three architectures is conducted: a Convolutional Neural Network (CNN) trained from scratch, a transfer learning-based model using ResNet50, and a Vision Transformer (ViT). The models are evaluated on a multi-class dataset containing four categories: glioma, meningioma, pituitary tumor, and no tumor.

Experimental results show that the CNN trained from scratch reaches only limited accuracy and generalizes moderately. The ResNet50 model reaches high accuracy during training but overfits severely, so its accuracy drops sharply on the test set. In contrast, the Vision Transformer achieves the best overall performance, with a test accuracy of 0.76 and a good balance between precision and recall.

These results suggest that Transformer-based architectures are well suited to complex medical image classification tasks such as four-class brain tumor classification.
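The abstract reports test accuracy, precision, and recall over four classes, and describes overfitting as a gap between training and test accuracy. The following is a minimal sketch of how such metrics could be computed, not the authors' evaluation code. The class label strings, the function names `classification_report` and `generalization_gap`, the use of macro averaging, and the toy labels are all assumptions made for illustration; none of the numbers come from the paper.

```python
from collections import Counter

# Hypothetical label names for the four categories named in the abstract.
CLASSES = ["glioma", "meningioma", "pituitary", "no_tumor"]


def classification_report(y_true, y_pred, classes=CLASSES):
    """Return accuracy plus macro-averaged precision and recall."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    # Count each (true label, predicted label) pair: a sparse confusion matrix.
    pairs = Counter(zip(y_true, y_pred))

    accuracy = sum(pairs[(c, c)] for c in classes) / len(y_true)

    precisions, recalls = [], []
    for c in classes:
        true_positives = pairs[(c, c)]
        predicted_as_c = sum(pairs[(t, c)] for t in classes)
        actually_c = sum(pairs[(c, p)] for p in classes)
        # Define the metric as 0.0 when its denominator is empty.
        precisions.append(true_positives / predicted_as_c if predicted_as_c else 0.0)
        recalls.append(true_positives / actually_c if actually_c else 0.0)

    return {
        "accuracy": accuracy,
        "macro_precision": sum(precisions) / len(classes),
        "macro_recall": sum(recalls) / len(classes),
    }


def generalization_gap(train_accuracy, test_accuracy):
    """Training minus test accuracy; a large positive gap suggests overfitting."""
    return train_accuracy - test_accuracy


# Toy labels, invented for illustration only (not the paper's data).
y_true = ["glioma", "glioma", "meningioma", "meningioma",
          "pituitary", "pituitary", "no_tumor", "no_tumor"]
y_pred = ["glioma", "meningioma", "meningioma", "meningioma",
          "pituitary", "pituitary", "no_tumor", "glioma"]

report = classification_report(y_true, y_pred)
print(report)
```

Macro averaging weights each tumor class equally regardless of how many images it has, which is one common choice when class sizes differ. The abstract does not state which averaging the authors used.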

Published

2026-06-10

How to Cite

[1] M. Kayembe, F. Mwamba, P. Kafunda, F. Oshasha, and J. Poma, “Comparative Analysis of CNN, ResNet50, and Vision Transformer Architectures for Brain Tumor Classification from MRI Images”, JAIC, vol. 10, no. 3, pp. 2327–2335, Jun. 2026.