Addressing Extreme Class Imbalance in Multilingual Complaint Classification Using XLM-RoBERTa

Authors

  • Muhammad Ariyanto, Information Systems, Faculty of Computer Science, Universitas Dian Nuswantoro
  • Farrikh Alzami, Information Systems, Faculty of Computer Science, Universitas Dian Nuswantoro
  • Ramadhan Rakhmat Sani, Information Systems, Faculty of Computer Science, Universitas Dian Nuswantoro
  • Indra Gamayanto, Information Systems, Faculty of Computer Science, Universitas Dian Nuswantoro
  • Muhammad Naufal, Informatics Engineering, Faculty of Computer Science, Universitas Dian Nuswantoro
  • Sri Winarno, Informatics Engineering, Faculty of Computer Science, Universitas Dian Nuswantoro
  • Iswahyudi, Office of Communication and Informatics, Central Java Province

DOI:

https://doi.org/10.30871/jaic.v10i1.11606

Keywords:

Class Imbalance, E-Government, Indonesian NLP, Text Classification, XLM-RoBERTa

Abstract

Government complaint management systems often suffer from extreme class imbalance, where a few public service categories accumulate most reports while many others remain under-represented. This research examines whether simple class weighting can improve fairness in multilingual transformer models for the automatic routing of Indonesian citizen complaints on the LaporGub Central Java e-governance platform. The dataset comprises 53,877 Indonesian-language complaints spanning 18 service categories, with an imbalance ratio of about 227:1 between the largest and smallest classes. After cleaning and deduplication, we stratify the data into training, validation, and test sets. We compare three approaches: (i) a linear support vector machine (SVM) trained on term frequency-inverse document frequency (TF-IDF) unigram and bigram features with class-balanced weights, (ii) a cross-lingual RoBERTa (XLM-RoBERTa-base) model without class weighting, and (iii) an XLM-RoBERTa-base model with a class-weighted cross-entropy loss. Fairness is operationalised as equal importance for all categories and quantified primarily by the macro-averaged F1-score (Macro-F1), complemented by per-class F1, weighted F1, and accuracy. The unweighted XLM-RoBERTa model outperforms the SVM baseline in Macro-F1 (0.610 vs 0.561). The class-weighted variant attains a similar Macro-F1 (0.608) while redistributing performance towards minority categories. Analysis shows that class weighting is most beneficial for categories with a few hundred to several thousand samples, whereas extremely rare categories with fewer than 200 complaints remain difficult for all models and require additional data-centric interventions. These findings demonstrate that multilingual transformer architectures combined with simple class weighting can provide a more balanced backbone for automated complaint routing in Indonesian e-government, particularly for low- and medium-frequency service categories.
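For readers who want a concrete reference point, the TF-IDF plus linear SVM baseline described above can be outlined in a few lines of scikit-learn. The snippet below is a minimal sketch, not the authors' exact pipeline: the file name, column names, and split ratio are assumptions, while the unigram and bigram features and balanced class weights follow the description in the abstract.

```python
# Minimal sketch of the TF-IDF + linear SVM baseline (not the authors' exact code).
# Assumption: complaints live in a CSV with 'text' and 'category' columns.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report, f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

df = pd.read_csv("laporgub_complaints.csv")  # hypothetical file name

# Stratified split keeps the 18-category distribution in every partition.
# The paper uses train/validation/test; a single held-out set keeps this short.
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["category"], test_size=0.2,
    stratify=df["category"], random_state=42,
)

# Unigram + bigram TF-IDF features, as in the baseline above.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True)
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# class_weight="balanced" reweights each class inversely to its frequency.
svm = LinearSVC(class_weight="balanced")
svm.fit(X_train_vec, y_train)

pred = svm.predict(X_test_vec)
print("Macro-F1:", f1_score(y_test, pred, average="macro"))
print(classification_report(y_test, pred))  # per-class F1 for minority analysis
```

In scikit-learn, class_weight="balanced" sets each class's weight to n_samples / (n_classes * n_samples_c), the same inverse-frequency idea as the weighted transformer loss sketched next.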
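The class-weighted transformer variant changes only the loss function of standard fine-tuning: cross-entropy receives per-class weights that grow as class frequency shrinks. Below is an illustrative sketch using the Hugging Face Transformers Trainer; the inverse-frequency weighting scheme and the assumed variable train_labels are placeholders, not the authors' released code.

```python
# Sketch: class-weighted cross-entropy fine-tuning for XLM-RoBERTa-base.
# Assumption: `train_labels` is a LongTensor of category ids (0..17) for the
# training split; the paper's exact weighting scheme may differ.
import torch
from torch import nn
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer)

NUM_LABELS = 18  # LaporGub service categories


def inverse_frequency_weights(labels: torch.Tensor, num_labels: int) -> torch.Tensor:
    """Balanced heuristic: weight_c = N / (num_labels * count_c)."""
    counts = torch.bincount(labels, minlength=num_labels).float().clamp(min=1)
    return labels.numel() / (num_labels * counts)


class WeightedTrainer(Trainer):
    """Trainer whose training loss is cross-entropy with per-class weights."""

    def __init__(self, class_weights: torch.Tensor, **kwargs):
        super().__init__(**kwargs)
        self.class_weights = class_weights

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        loss_fct = nn.CrossEntropyLoss(
            weight=self.class_weights.to(outputs.logits.device))
        loss = loss_fct(outputs.logits.view(-1, NUM_LABELS), labels.view(-1))
        return (loss, outputs) if return_outputs else loss


tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=NUM_LABELS)
# weights = inverse_frequency_weights(train_labels, NUM_LABELS)
# trainer = WeightedTrainer(class_weights=weights, model=model,
#                           args=..., train_dataset=..., eval_dataset=...)
```

The unweighted variant in the comparison corresponds to the plain Trainer with its default cross-entropy loss.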
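The fairness criterion itself has a compact form: Macro-F1 averages per-class F1 scores with equal weight, so each of the 18 categories counts the same regardless of how many complaints it contains. With C = 18 categories and P_c, R_c the precision and recall of category c:

```latex
\mathrm{Macro\text{-}F1} = \frac{1}{C} \sum_{c=1}^{C} \mathrm{F1}_c,
\qquad
\mathrm{F1}_c = \frac{2\, P_c R_c}{P_c + R_c}
```

Weighted F1, by contrast, scales each F1_c by class frequency, which is why a model can score well on weighted F1 while largely ignoring rare categories; reporting both exposes that gap.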

Published

2026-02-04

How to Cite

[1] M. Ariyanto et al., “Addressing Extreme Class Imbalance in Multilingual Complaint Classification Using XLM-RoBERTa”, JAIC, vol. 10, no. 1, pp. 13–22, Feb. 2026.
