OCR and NLP for Consumer Product Label Analysis: A Systematic Literature Review

Authors

  • Musthofa Dzikry Pamungkas, Universitas Pembangunan Nasional Veteran Jakarta
  • Noor Falih, Universitas Pembangunan Nasional Veteran Jakarta
  • Muhammad Panji Muslim, Universitas Pembangunan Nasional Veteran Jakarta
  • Nanang Nasrulloh, Universitas Pembangunan Nasional Veteran Jakarta

DOI:

https://doi.org/10.30871/jaic.v10i3.12861

Keywords:

BERT-based Natural Language Processing (NLP), Diabetes Risk, Food Composition Label, Optical Character Recognition, Ultra-Processed Food

Abstract

Composition labels on consumer products are an essential source of information, both for consumers making purchasing decisions and for producers ensuring regulatory compliance. However, reading labels manually is error-prone: small font sizes, blurred images, low resolution, and perspective distortion all reduce the accuracy of information recognition. The problem is increasingly critical because many products list ingredients associated with metabolic diseases, such as added sugar, artificial sweeteners, and carbohydrates, whose risks are well documented in the epidemiological literature. Although numerous systematic reviews have addressed OCR and NLP in document intelligence, no prior review has systematically mapped the integration of OCR robustness, BERT-based semantic extraction, and food-label-specific challenges in the context of multilingual Indonesian consumer products. To address this gap, this study conducts a Systematic Literature Review (SLR) following the PRISMA guidelines, synthesizing recent advances in Transformer-based OCR and BERT-based NLP for consumer product label analysis. The review has three aims: to map OCR performance under real-world conditions, to identify best practices in preprocessing and post-correction, and to evaluate end-to-end integration of OCR with BERT for semantic understanding of label compositions. The expected outcome is a theoretical contribution in the form of a proposed integrative OCR–BERT framework, together with practical design recommendations for future systems that support nutritional literacy and informed consumer decision-making. Empirical validation of the proposed framework is left for future research.

Published

2026-06-14

How to Cite

[1]
M. D. Pamungkas, N. Falih, M. P. Muslim, and N. Nasrulloh, “OCR and NLP for Consumer Product Label Analysis: A Systematic Literature Review”, JAIC, vol. 10, no. 3, pp. 2588–2597, Jun. 2026.
