OCR and NLP for Consumer Product Label Analysis: A Systematic Literature Review
DOI: https://doi.org/10.30871/jaic.v10i3.12861
Keywords: BERT-based Natural Language Processing (NLP), Diabetes Risk, Food Composition Label, Optical Character Recognition, Ultra-Processed Food
Abstract
Composition labels on consumer products are an essential source of information, both for consumers making purchasing decisions and for producers ensuring regulatory compliance. However, reading labels is error-prone: small font sizes and, in photographed labels, blur, low resolution, and perspective distortion all reduce the accuracy of information recognition. The problem is critical because many ingredient lists contain components associated with metabolic diseases, such as added sugar, artificial sweeteners, and carbohydrates, whose risks are well documented in the epidemiological literature. While numerous systematic reviews have addressed optical character recognition (OCR) and natural language processing (NLP) in document intelligence, no prior review has systematically mapped how OCR robustness, BERT-based semantic extraction, and food-label-specific challenges fit together in the context of multilingual Indonesian consumer products. To address this gap, this study conducts a Systematic Literature Review (SLR) following the PRISMA guidelines, synthesizing recent advances in Transformer-based OCR and BERT-based NLP for consumer product label analysis. The review aims to map OCR performance under real-world conditions, identify best practices in image preprocessing and OCR post-correction, and evaluate end-to-end integration with BERT for semantic understanding of ingredient lists. The expected outcome is a theoretical contribution in the form of a proposed integrative OCR–BERT framework, along with practical design recommendations for future systems that support nutritional literacy and informed consumer decision-making. Empirical validation of the proposed framework is left to future research.
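The post-correction and ingredient-flagging steps that the review surveys can be illustrated with a deliberately simple sketch. It is not the proposed OCR–BERT framework: it swaps BERT-based extraction for plain fuzzy string matching with Python's standard `difflib.get_close_matches`, and it matches against a small, hypothetical Indonesian ingredient lexicon whose risk categories are illustrative assumptions. The names `INGREDIENT_LEXICON`, `parse_ingredients`, `correct_item`, and `flag_risk_ingredients`, as well as the `cutoff` value, are all invented for this example. It shows how typical OCR confusions such as `l` read as `1` or `m` read as `rn` can be snapped back to a known ingredient before that ingredient is checked for added sugar, artificial sweeteners, or carbohydrates.

```python
import difflib
import re

# Hypothetical mini-lexicon for illustration only: canonical Indonesian
# ingredient name -> illustrative risk category (None = not flagged).
INGREDIENT_LEXICON = {
    "gula": "added sugar",
    "sukrosa": "added sugar",
    "sirup glukosa": "added sugar",
    "fruktosa": "added sugar",
    "aspartam": "artificial sweetener",
    "sukralosa": "artificial sweetener",
    "siklamat": "artificial sweetener",
    "tepung terigu": "carbohydrate",
    "maltodekstrin": "carbohydrate",
    "garam": None,
    "air": None,
    "minyak nabati": None,
}


def parse_ingredients(ocr_text):
    """Split raw OCR text of an ingredient list into lowercase items."""
    text = ocr_text.lower()
    # Drop a leading "Komposisi:" / "Ingredients:" header if present.
    text = re.sub(r"^\s*(komposisi|ingredients)\s*:\s*", "", text)
    items = (part.strip(" .\n") for part in re.split(r"[,;]", text))
    return [item for item in items if item]


def correct_item(item, lexicon, cutoff):
    """Snap an OCR-garbled item to its closest lexicon entry, if close enough."""
    matches = difflib.get_close_matches(item, list(lexicon), n=1, cutoff=cutoff)
    return matches[0] if matches else item


def flag_risk_ingredients(ocr_text, lexicon=INGREDIENT_LEXICON, cutoff=0.75):
    """Return (raw, corrected, category) triples for each ingredient item."""
    results = []
    for raw in parse_ingredients(ocr_text):
        corrected = correct_item(raw, lexicon, cutoff)
        # Unmatched items keep their raw text and get no category.
        results.append((raw, corrected, lexicon.get(corrected)))
    return results


if __name__ == "__main__":
    sample = "Komposisi: Tepung terlgu, gu1a, sirup g1ukosa, garam, aspartarn."
    for raw, corrected, category in flag_risk_ingredients(sample):
        print(f"{raw!r:18} -> {corrected!r:18} {category}")
```

On the sample string, the matcher maps `gu1a` to `gula` and `aspartarn` to `aspartam`. Short items are the weak point: `gu1a` scores only 0.75 against `gula` under `difflib`'s similarity ratio, so the assumed `cutoff` trades recall of such corrections against the risk of snapping unrelated words onto lexicon entries. This is one reason context-aware models such as BERT are attractive for the post-correction step.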
License
Copyright (c) 2026 Musthofa Dzikry Pamungkas, Noor Falih, Muhammad Panji Muslim, Nanang Nasrulloh

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License (Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).