Transformer-Based Deep Learning Model for Coffee Bean Classification
DOI: https://doi.org/10.30871/jaic.v9i5.10301
Keywords: Coffee Bean Classification, Deep Learning, Transformer, Vision Transformer, Swin Transformer
Abstract
Coffee is one of the most popular beverage commodities consumed worldwide. The selection of high-quality coffee beans plays a vital role in ensuring that the resulting coffee has superior taste and aroma. Over the years, various deep learning models based on Convolutional Neural Networks (CNNs) have been developed to classify coffee bean images with impressive accuracy and performance. However, recent advances in deep learning have introduced transformer-based architectures that show great promise for image classification tasks. By incorporating a self-attention module, transformer models excel at generating global context features within images, which often yields improved and more consistent performance compared to CNN-based models. This study focuses on training and evaluating transformer-based deep learning models for the classification of coffee bean images. Experimental results demonstrate that transformer models, namely the Vision Transformer (ViT) and the Swin Transformer, outperform traditional CNN-based models. The Swin Transformer achieves excellent performance on the coffee bean image classification task, with 95.13% accuracy and a 90.21% F1-score, while ViT achieves 94.47% accuracy and an 88.93% F1-score. These results indicate a strong capability to accurately identify and classify different types of coffee beans and suggest that transformer-based approaches could be a better alternative for coffee bean image classification tasks in the future.
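The workflow described in the abstract, fine-tuning pretrained transformer backbones (ViT and Swin Transformer) on coffee bean images and reporting accuracy and F1-score, can be sketched as follows. The listing below is a minimal illustration in PyTorch with the timm model library, not the authors' exact configuration: the dataset folder layout, the Swin-Tiny variant, and all hyperparameters are assumptions.

    # Minimal sketch (not the authors' exact setup): fine-tuning a pretrained
    # Swin Transformer on a coffee bean image folder with PyTorch and timm.
    # Dataset path, class layout, and hyperparameters are assumptions.
    import timm
    import torch
    from torch import nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Assumed folder layout: coffee_beans/{train,val}/<class_name>/*.jpg
    tfm = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    train_ds = datasets.ImageFolder("coffee_beans/train", transform=tfm)
    val_ds = datasets.ImageFolder("coffee_beans/val", transform=tfm)
    train_dl = DataLoader(train_ds, batch_size=32, shuffle=True)
    val_dl = DataLoader(val_ds, batch_size=32)

    # Swin-Tiny pretrained on ImageNet; swap in "vit_base_patch16_224"
    # to run the ViT counterpart of the same pipeline.
    model = timm.create_model("swin_tiny_patch4_window7_224",
                              pretrained=True,
                              num_classes=len(train_ds.classes)).to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for epoch in range(10):                      # assumed epoch count
        model.train()
        for images, labels in train_dl:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()

        # Validation accuracy; the paper also reports F1-score, which can be
        # computed from the collected predictions (e.g. sklearn's f1_score).
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for images, labels in val_dl:
                preds = model(images.to(device)).argmax(dim=1).cpu()
                correct += (preds == labels).sum().item()
                total += labels.size(0)
        print(f"epoch {epoch + 1}: val accuracy = {correct / total:.4f}")

Swapping the model name as noted in the comment yields the ViT variant of the same pipeline, so both architectures reported in the study can be compared under identical training and evaluation code.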
License
Copyright (c) 2025 Imam Ekowicaksono, I Wayan Wiprayoga Wisesa, Vita Fitriani

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.