Multilabel Machine Learning-Based Detection of Allergens in Food Recipes
DOI:
https://doi.org/10.30871/jaic.v10i2.12506
Keywords:
Food Allergens, KNN, Machine Learning, MLP, SVM
Abstract
Food allergens are substances that can trigger allergic reactions or intolerances in some individuals. According to recent data, the worldwide prevalence of food allergies ranges from 10% to 40%. In Indonesia, around 20% of children experience reactions to the foods given to them in their first year of life. This research develops machine learning models to detect allergens in food recipes using K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Multi-Layer Perceptron (MLP) methods with a multilabel classification approach. The primary challenge is identifying hidden allergens among the diverse ingredients of a recipe, which can be harmful to individuals with food allergies. The study uses 15,823 data points from a food recipe dataset, labeled both manually and automatically with five main types of allergens. After data preprocessing and feature extraction using TF-IDF, the models were trained and tested with an 80:20 split. Results indicate that the SVM with hyperparameter tuning on the manually labeled dataset performed best across all allergen types, achieving an average F1-score of 0.9776.
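The pipeline outlined in the abstract (TF-IDF features, an 80:20 split, multilabel prediction, and an averaged F1-score) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the toy recipes, the five allergen names, the choice of k = 3, and the majority-vote rule for assigning labels are all assumptions made for illustration, and only the KNN variant is shown. A real experiment would more likely rely on an established machine learning library.

```python
import math
from collections import Counter

# Hypothetical toy data: (ingredient text, set of allergen labels).
# The five allergen names are assumptions; the abstract does not list them.
RECIPES = [
    ("milk butter flour sugar", {"milk", "wheat"}),
    ("egg flour sugar vanilla", {"egg", "wheat"}),
    ("peanut sauce rice chicken", {"peanut"}),
    ("shrimp garlic rice chili", {"shellfish"}),
    ("milk egg sugar vanilla", {"milk", "egg"}),
    ("flour yeast water salt", {"wheat"}),
    ("peanut chocolate milk sugar", {"peanut", "milk"}),
    ("shrimp egg noodle flour", {"shellfish", "egg", "wheat"}),
    ("milk flour egg butter", {"milk", "egg", "wheat"}),
    ("peanut rice chili garlic", {"peanut"}),
]

def tokenize(text):
    """Lowercase an ingredient string and split it on whitespace."""
    return text.lower().split()

def fit_idf(token_lists):
    """Smoothed inverse document frequency, learned from training documents only."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}

def tfidf(tokens, idf):
    """L2-normalised sparse TF-IDF vector; terms unseen in training are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def cosine(a, b):
    """Dot product of two unit-length sparse vectors, i.e. their cosine similarity."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def knn_predict(query, train_vecs, train_labels, k=3):
    """Multilabel KNN: keep a label if more than half of the k nearest neighbours carry it."""
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(query, train_vecs[i]), reverse=True)
    votes = Counter(label for i in ranked[:k] for label in train_labels[i])
    return {label for label, n in votes.items() if n > k / 2}

def f1_for_label(y_true, y_pred, label):
    """Binary F1 for one label: 2*TP / (2*TP + FP + FN), or 0.0 when undefined."""
    tp = sum(label in t and label in p for t, p in zip(y_true, y_pred))
    fp = sum(label not in t and label in p for t, p in zip(y_true, y_pred))
    fn = sum(label in t and label not in p for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# 80:20 split (no shuffling here, to keep the sketch deterministic).
split = int(0.8 * len(RECIPES))
train, test = RECIPES[:split], RECIPES[split:]

train_tokens = [tokenize(text) for text, _ in train]
idf = fit_idf(train_tokens)
train_vecs = [tfidf(tokens, idf) for tokens in train_tokens]
train_labels = [labels for _, labels in train]

y_true = [labels for _, labels in test]
y_pred = [knn_predict(tfidf(tokenize(text), idf), train_vecs, train_labels)
          for text, _ in test]

# Average F1 over the allergen labels that occur in the test split.
labels = sorted(set().union(*y_true))
average_f1 = (sum(f1_for_label(y_true, y_pred, l) for l in labels) / len(labels)
              if labels else 0.0)
print(f"average F1 over {len(labels)} labels: {average_f1:.4f}")
```

Fitting the IDF weights on the training split only, as done here, keeps information from the test recipes out of the features. With ten toy recipes the printed score says nothing about real performance; the sketch only shows how the pieces fit together.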
License
Copyright (c) 2026 Ratih Anggraini, Ahmad Hafizh Assa’ad, Shintami Chusnul Hidayati

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License (Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).








