Few-Shot Learning for Classifying Genuine and Bot Comments on YouTube Using Transformer Models
DOI:
https://doi.org/10.30871/jaic.v9i4.10023
Keywords:
Few-Shot Learning (FSL), Transformers, DistilBERT, Comment Bot, Text Classification, YouTube, Natural Language Processing, Web Applications
Abstract
This study develops a comment classification system for the YouTube platform that distinguishes comments posted by real accounts from those posted by bot accounts, addressing the challenge of limited labeled data through a few-shot learning approach. Bot accounts masquerading as real users in comment sections are increasingly prevalent and can spread spam and misinformation and influence public opinion. The study uses DistilBERT, a Transformer-based model known for its efficiency in understanding natural language context. The model is trained in few-shot scenarios (N5 to N50) using a very limited amount of training data. Test results show that the model maintains high and stable performance even with minimal data (N5), achieving an F1-score above 0.90. The system is also implemented as a web application using Flask to enable direct, interactive comment detection. The main contribution of this research is a demonstration that combining few-shot learning with the DistilBERT model provides a practical and efficient solution for classifying YouTube bot account comments under limited-data conditions, together with a replicable approach for similar problems on other digital platforms.
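The few-shot setup and the reported metric can be illustrated with a minimal sketch. It assumes that "N5" means five labeled comments per class, which the abstract does not state explicitly, and it uses hypothetical label names ("bot", "genuine") and helper names (`sample_few_shot`, `f1_score`). It is not the authors' pipeline: it shows only how a balanced N-shot training subset might be drawn and how an F1-score is computed from predictions. Fine-tuning DistilBERT itself is left out.

```python
import random
from collections import defaultdict


def sample_few_shot(examples, n_per_class, seed=0):
    """Draw n_per_class (text, label) pairs for every label.

    Assumption: N in "N5 to N50" counts examples per class.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    subset = []
    for label in sorted(by_label):
        pool = by_label[label]
        if len(pool) < n_per_class:
            raise ValueError(f"not enough examples for label {label!r}")
        # Sample without replacement so no comment is repeated.
        subset.extend(rng.sample(pool, n_per_class))

    rng.shuffle(subset)  # avoid feeding the model one class at a time
    return subset


def f1_score(y_true, y_pred, positive="bot"):
    """Binary F1 for the positive class: 2PR / (P + R)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, calling `sample_few_shot(labeled_comments, 5)` would yield a ten-comment training set for the N5 scenario, and `f1_score` would then be applied to the model's predictions on a held-out test set.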
License
Copyright (c) 2025 Nahdah Fikriah Nst, Defry Hamdhana, Mukti Qamal

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).








