Benchmarking YOLOv12 Variants for Indonesian Traditional Cuisine Detection

Fauzan Firdaus; Lidya Ningsih; Aminah Indahsari Marsuki; Angel Metanosa Afinda

doi:10.30871/jaic.v10i3.12625

Authors

Fauzan Firdaus, Telkom University
Lidya Ningsih, Telkom University
Aminah Indahsari Marsuki, Telkom University
Angel Metanosa Afinda, Telkom University

Keywords:

YOLOv12, Object Detection, Indonesian Cuisine, Food Detection, Benchmark Analysis

Abstract

YOLOv12 is among the most recent versions of YOLO, and several studies report that it outperforms earlier versions. It is released in five variants of increasing architectural complexity: nano, small, medium, large, and extra-large. This study evaluates the YOLOv12 variants (n, s, m, l, x) on the detection of traditional Indonesian cuisine using a domain-specific object detection dataset. The dataset contains 718 images with 720 annotated bounding-box instances across 20 culinary classes, split into 418/150/150 images for training/validation/testing. Preprocessing was performed in Roboflow with automatic orientation correction and resizing by stretching to 640×640, and the training split was augmented with horizontal and vertical flips to increase sample diversity. All variants were trained in the same environment with the same configuration: 50 epochs in the Ultralytics framework with default hyperparameters on an NVIDIA A100-SXM4 80GB GPU. On the validation set, all variants achieved high detection accuracy (mAP@0.5 = 0.985–0.991), while differences emerged under the more stringent localization criterion mAP@0.5:0.95. YOLOv12-L achieved the best overall localization performance (mAP@0.5:0.95 = 0.874), while YOLOv12-N provided the fastest inference (0.8 ms/image) with competitive accuracy (mAP@0.5:0.95 = 0.822). These findings provide preliminary guidance for selecting a YOLOv12 variant according to the trade-off between accuracy and speed.
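
The gap between mAP@0.5 and mAP@0.5:0.95 comes from how strictly a predicted box must overlap its ground-truth box before it counts as a match. The following is a minimal sketch of that matching criterion only, assuming the COCO-style threshold grid from 0.50 to 0.95 in steps of 0.05; the boxes are hypothetical and are not taken from the dataset, and a full mAP computation would additionally rank detections by confidence and integrate precision over recall.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Overlap is zero when the boxes do not intersect.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed COCO-style grid: 0.50, 0.55, ..., 0.95 (ten thresholds).
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred, truth):
    """Return the IoU thresholds at which pred counts as a match for truth."""
    score = iou(pred, truth)
    return [t for t in IOU_THRESHOLDS if score >= t]


# Hypothetical boxes on a 640x640 image (illustrative values only).
truth = (100, 100, 300, 300)
loose = (120, 120, 320, 320)  # shifted by 20 px in each axis
tight = (104, 104, 304, 304)  # shifted by 4 px in each axis

print(len(thresholds_passed(loose, truth)))  # matches at 4 of 10 thresholds
print(len(thresholds_passed(tight, truth)))  # matches at 9 of 10 thresholds
```

Both hypothetical predictions count as correct at IoU 0.5, so they contribute equally to mAP@0.5, but the loosely placed box stops matching above 0.65. This is why variants that are nearly tied on mAP@0.5 can still separate on mAP@0.5:0.95.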
