Comparative Analysis of CNN, Transformers, and Traditional ML for Classifying Online Gambling Spam Comments in Indonesian

Martin Clinton Tosima Manullang; Arkham Zahri Rakhman; Hartanto Tantriawan; Andika Setiawan

doi:10.30871/jaic.v9i3.9468

Authors

Martin Clinton Tosima Manullang, Teknik Informatika, Fakultas Teknologi Industri, Institut Teknologi Sumatera
Arkham Zahri Rakhman, Teknik Informatika, Fakultas Teknologi Industri, Institut Teknologi Sumatera
Hartanto Tantriawan, Teknik Informatika, Fakultas Teknologi Industri, Institut Teknologi Sumatera
Andika Setiawan, Teknik Informatika, Fakultas Teknologi Industri, Institut Teknologi Sumatera

DOI:

https://doi.org/10.30871/jaic.v9i3.9468

Keywords:

Indonesian language, deep learning, spam detection, transformer, Wordformer

Abstract

The rise of user-generated content on social media and live-streaming platforms has intensified the spread of spam, particularly promotions for online gambling (judi online), which remain prevalent in Indonesian comment sections. This study evaluates machine learning (ML) and deep learning (DL) approaches for classifying such spam in Bahasa Indonesia. We compare five models: Support Vector Machine (SVM), Random Forest (RF), a CNN-based model, IndoBERT, and Wordformer, a custom lightweight transformer. IndoBERT achieves the highest performance across all metrics but has high computational demands. Wordformer balances accuracy and efficiency: it reaches 0.9975 on both accuracy and macro F1-score, surpassing SVM (0.9578) and Random Forest (0.9729), with a much smaller model size and fewer multiply-add operations than IndoBERT. An ablation study examines the architectural and training design choices that influence Wordformer’s performance. The findings suggest that lightweight transformers can offer practical, scalable spam detection in low-resource language settings without large pretrained backbones.
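The abstract reports accuracy and macro F1-score. As a minimal sketch of how those two metrics are defined, the following computes both in plain Python. The labels are made-up toy data (1 = gambling spam, 0 = not spam), and the function names are illustrative assumptions; this is not the authors' evaluation code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, invented for illustration only (1 = spam, 0 = not spam).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
```

Because macro F1 averages the per-class scores without weighting by class frequency, errors on the minority class affect it as much as errors on the majority class, which is one reason it is often reported alongside accuracy for spam detection.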


