Multilabel Machine Learning-Based Detection of Allergens in Food Recipes

Authors

  • Ratih Anggraini, Department of Informatics, Institut Teknologi Sepuluh Nopember
  • Ahmad Hafizh Assa’ad, Department of Informatics, Institut Teknologi Sepuluh Nopember
  • Shintami Chusnul Hidayati, Department of Informatics, Institut Teknologi Sepuluh Nopember

DOI:

https://doi.org/10.30871/jaic.v10i2.12506

Keywords:

Food Allergens, KNN, Machine Learning, MLP, SVM

Abstract

Food allergens are substances that can trigger allergic reactions or intolerances in some individuals. According to recent data, the prevalence of food allergies worldwide ranges from 10% to 40%. In Indonesia, around 20% of children experience reactions to the foods given to them in their first year of life. This research develops a machine learning model to detect allergens in food recipes, using K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Multi-Layer Perceptron (MLP) methods with a multilabel classification approach. The primary challenge is that allergens can be hidden among the diverse ingredients of a recipe, where they pose a risk to individuals with food allergies. This study uses 15,823 data points from a food recipe dataset, labeled both manually and automatically with five main types of allergens. After data preprocessing and feature extraction using TF-IDF, the models were trained and tested on an 80:20 train-test split. Results indicate that the SVM with hyperparameter tuning on the manually labeled dataset performed best across all allergen types, achieving an average F1-score of 0.9776.
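The pipeline summarized in the abstract (TF-IDF features, a multilabel classifier, an 80:20 split, and an F1-score averaged over allergen types) can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the five allergen names, the whitespace tokenizer, the toy recipes, `k = 3`, and every function name are hypothetical. KNN is shown only because it is the shortest of the three methods to write without a library, and the per-allergen F1-scores are macro-averaged because the abstract does not say which averaging was used.

```python
import math
import random
from collections import Counter

# Hypothetical label names: the abstract mentions five main allergen types but does not list them.
ALLERGENS = ["milk", "egg", "peanut", "seafood", "wheat"]


def tokenize(text):
    """Lower-case whitespace tokenizer (a stand-in for the paper's unspecified preprocessing)."""
    return text.lower().replace(",", " ").split()


def fit_idf(docs):
    """Smoothed IDF, ln((1 + n) / (1 + df)) + 1, the same form scikit-learn uses by default."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def tfidf(doc, idf):
    """Sparse, L2-normalised TF-IDF vector; terms unseen during fitting are dropped."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else vec


def cosine(a, b):
    """Both vectors are L2-normalised, so their dot product is the cosine similarity."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def knn_predict(x, train_vecs, train_labels, k=3):
    """Multilabel KNN: predict every allergen carried by a strict majority of the k nearest recipes."""
    ranked = sorted(range(len(train_vecs)), key=lambda i: cosine(x, train_vecs[i]), reverse=True)
    nearest = ranked[:k]
    return {lab for lab in ALLERGENS if 2 * sum(lab in train_labels[i] for i in nearest) > k}


def label_f1(y_true, y_pred, label):
    """Binary F1 for one allergen, 2TP / (2TP + FP + FN); returns 0.0 when TP is 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(label in t and label in p for t, p in pairs)
    fp = sum(label not in t and label in p for t, p in pairs)
    fn = sum(label in t and label not in p for t, p in pairs)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def split_80_20(items, seed=42):
    """Shuffle once with a fixed seed, then keep 80% for training and 20% for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def evaluate(train, test, k=3):
    """Fit TF-IDF on the training recipes only, predict the test recipes, return (preds, macro F1)."""
    idf = fit_idf([text for text, _ in train])
    train_vecs = [tfidf(text, idf) for text, _ in train]
    train_labels = [labels for _, labels in train]
    preds = [knn_predict(tfidf(text, idf), train_vecs, train_labels, k) for text, _ in test]
    truth = [labels for _, labels in test]
    macro = sum(label_f1(truth, preds, lab) for lab in ALLERGENS) / len(ALLERGENS)
    return preds, macro


# Toy ingredient lists, invented for illustration only (not from the paper's dataset).
TRAIN = [
    ("milk flour butter cake", {"milk", "wheat"}),
    ("milk flour sugar bread", {"milk", "wheat"}),
    ("cheese milk cream", {"milk"}),
    ("egg sugar custard", {"egg"}),
    ("boiled egg salt", {"egg"}),
    ("peanut sauce rice", {"peanut"}),
    ("roasted peanut sugar", {"peanut"}),
    ("shrimp garlic rice", {"seafood"}),
    ("squid shrimp chili", {"seafood"}),
    ("wheat noodle soy sauce", {"wheat"}),
]
TEST = [
    ("shrimp squid garlic", {"seafood"}),
    ("milk flour cookie", {"milk", "wheat"}),
    ("roasted peanut brittle", {"peanut"}),
    ("egg custard tart", {"egg"}),
    ("peanut noodle", {"peanut", "wheat"}),
]

if __name__ == "__main__":
    preds, macro = evaluate(TRAIN, TEST)
    for (text, _), pred in zip(TEST, preds):
        print(f"{text!r}: {sorted(pred)}")
    print(f"macro F1 = {macro:.4f}")
```

On the toy data the sketch reports a macro F1 of 0.9333: the last recipe, `peanut noodle`, misses its hypothetical `wheat` label because only one of its three nearest neighbours carries it. An SVM variant would replace `knn_predict` with one binary classifier per allergen (binary relevance); in scikit-learn, a comparable structure is `TfidfVectorizer` feeding `OneVsRestClassifier(LinearSVC())`, with `MultiLabelBinarizer` encoding the label sets, although the abstract does not state which SVM configuration the authors tuned.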

Published

2026-04-16

How to Cite

R. Anggraini, A. H. Assa’ad, and S. C. Hidayati, “Multilabel Machine Learning-Based Detection of Allergens in Food Recipes”, JAIC, vol. 10, no. 2, pp. 1136–1141, Apr. 2026.