Few-Shot Learning for Classifying Genuine and Bot Comments on YouTube Using Transformer Models

Nahdah Fikriah Nst; Defry Hamdhana; Mukti Qamal

doi:10.30871/jaic.v9i4.10023

Authors

Nahdah Fikriah Nst, Universitas Malikussaleh
Defry Hamdhana, Universitas Malikussaleh
Mukti Qamal, Universitas Malikussaleh

DOI:

https://doi.org/10.30871/jaic.v9i4.10023

Keywords:

Few-Shot Learning (FSL), Transformers, DistilBERT, Bot Comments, Text Classification, YouTube, Natural Language Processing, Web Applications

Abstract

This study develops a system that classifies comments on the YouTube platform as coming from genuine accounts or from bot accounts, addressing the scarcity of labeled data through a few-shot learning approach. Bot accounts that masquerade as real users in comment sections are increasingly prevalent and can spread spam and misinformation and sway public opinion. The study uses DistilBERT, a Transformer-based model known for its efficiency in modeling natural-language context. The model is trained in few-shot scenarios ranging from N5 to N50, that is, with very small amounts of labeled training data. Test results show that the model maintains high and stable performance even in the smallest setting (N5), achieving an F1-score above 0.90. The system is also deployed as a Flask web application that supports direct, interactive comment detection. The main contribution of this research is demonstrating that combining few-shot learning with DistilBERT offers a practical and efficient solution for classifying bot-account comments on YouTube under limited-data conditions, and that the approach can be replicated for similar problems on other digital platforms.
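The few-shot protocol summarized in the abstract can be illustrated with a minimal, framework-free sketch. It assumes (this is an assumption, not stated in the abstract) that N in N5 to N50 counts labeled examples per class; the helpers `sample_n_shot` and `f1_score` are hypothetical names, "bot" is treated as the positive class, and the toy data is illustrative only. The DistilBERT fine-tuning and Flask serving steps are deliberately omitted.

```python
import random
from collections import defaultdict


def sample_n_shot(data, n, seed=0):
    """Draw n labeled examples per class from (text, label) pairs.

    Hypothetical helper: assumes 'N5'..'N50' means n examples per class.
    """
    by_label = defaultdict(list)
    for text, label in data:
        by_label[label].append((text, label))
    rng = random.Random(seed)  # fixed seed so each N-shot subset is reproducible
    subset = []
    for label in sorted(by_label):
        examples = by_label[label]
        if len(examples) < n:
            raise ValueError(f"class {label!r} has only {len(examples)} examples")
        subset.extend(rng.sample(examples, n))
    rng.shuffle(subset)
    return subset


def f1_score(y_true, y_pred, positive="bot"):
    """Binary F1 = 2PR / (P + R), with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # also avoids division by zero when nothing is predicted positive
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy pool of labeled comments (illustrative, not the study's dataset).
pool = [(f"genuine comment {i}", "genuine") for i in range(60)] + [
    (f"bot comment {i}", "bot") for i in range(60)
]

for n in (5, 10, 20, 50):
    train = sample_n_shot(pool, n)
    print(f"N{n}: {len(train)} training examples")

print(f1_score(["bot", "bot", "genuine", "genuine"],
               ["bot", "genuine", "genuine", "bot"]))  # 0.5
```

Sampling a fixed number of examples per class keeps the two labels balanced at every N, so F1 differences across N5 to N50 reflect training-set size rather than class skew; whether the study balanced its subsets this way is not stated in the abstract.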


