Addressing Extreme Class Imbalance in Multilingual Complaint Classification Using XLM-RoBERTa
DOI: https://doi.org/10.30871/jaic.v10i1.11606

Keywords: Class Imbalance, E-Government, Indonesian NLP, Text Classification, XLM-RoBERTa

Abstract
Government complaint management systems often suffer from extreme class imbalance: a few public service categories accumulate most reports while many others remain under-represented. This research examines whether simple class weighting can improve fairness in multilingual transformer models for the automatic routing of Indonesian citizen complaints on the LaporGub Central Java e-governance platform. The dataset comprises 53,877 Indonesian-language complaints spanning 18 service categories, with an imbalance ratio of about 227:1 between the largest and smallest classes. After cleaning and deduplication, we stratify the data into training, validation, and test sets. We compare three approaches: (i) a linear support vector machine (SVM) with term frequency–inverse document frequency (TF-IDF) unigram and bigram features and class-balanced weights, (ii) a cross-lingual RoBERTa (XLM-RoBERTa-base) model without class weighting, and (iii) an XLM-RoBERTa-base model with a class-weighted cross-entropy loss. Fairness is operationalised as equal importance across categories and quantified primarily with the macro-averaged F1-score (Macro-F1), complemented by per-class F1, weighted F1, and accuracy. The unweighted XLM-RoBERTa model outperforms the SVM baseline in Macro-F1 (0.610 vs 0.561). The class-weighted variant attains a similar Macro-F1 (0.608) while redistributing performance towards minority categories. Analysis shows that class weighting is most beneficial for categories with a few hundred to several thousand samples, whereas extremely rare categories with fewer than 200 complaints remain difficult for all models and require additional data-centric interventions. These findings demonstrate that multilingual transformer architectures combined with simple class weighting can provide a more balanced backbone for automated complaint routing in Indonesian e-government, particularly for low- and medium-frequency service categories.
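To illustrate the class-weighting technique the abstract describes, the sketch below shows one way to fine-tune XLM-RoBERTa-base with a class-weighted cross-entropy loss, assuming PyTorch and Hugging Face Transformers. The inverse-frequency ("balanced") weighting scheme and the `WeightedTrainer` helper are illustrative assumptions, not the authors' published implementation, which may differ in its exact weights and training setup.

```python
# Minimal sketch: class-weighted cross-entropy for XLM-RoBERTa fine-tuning.
# Assumes PyTorch + Hugging Face Transformers. The inverse-frequency
# ("balanced") weighting shown here is one common choice; the paper does not
# publish its exact scheme.
import torch
from torch.nn import CrossEntropyLoss
from transformers import AutoModelForSequenceClassification, AutoTokenizer, Trainer

NUM_CLASSES = 18  # service categories in the LaporGub dataset


def make_class_weights(train_labels: list[int]) -> torch.Tensor:
    """Inverse-frequency weights: N / (C * n_c) for each class c."""
    counts = torch.bincount(torch.tensor(train_labels), minlength=NUM_CLASSES).float()
    return counts.sum() / (NUM_CLASSES * counts.clamp(min=1))


class WeightedTrainer(Trainer):
    """Trainer whose loss replaces the default unweighted cross-entropy."""

    def __init__(self, class_weights: torch.Tensor, **kwargs):
        super().__init__(**kwargs)
        self.class_weights = class_weights

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        loss_fct = CrossEntropyLoss(weight=self.class_weights.to(outputs.logits.device))
        loss = loss_fct(outputs.logits.view(-1, NUM_CLASSES), labels.view(-1))
        return (loss, outputs) if return_outputs else loss


tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=NUM_CLASSES
)
```

Weighting the loss rather than resampling the data keeps each epoch's size and class mix unchanged; only the gradient contribution of minority-class examples is scaled up, which matches the abstract's observation that the weighted model trades a little Macro-F1 for better minority-class performance.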
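The SVM baseline and the headline Macro-F1 metric can likewise be sketched with scikit-learn. The vectoriser settings (e.g. `min_df`) are illustrative assumptions rather than the paper's exact values, and `train_texts`/`train_labels`/`test_texts`/`test_labels` are hypothetical placeholders for the stratified splits described in the abstract.

```python
# Minimal sketch of the baseline: TF-IDF unigrams + bigrams, a linear SVM with
# balanced class weights, and Macro-F1 as the fairness-oriented metric.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),  # unigram + bigram features
    LinearSVC(class_weight="balanced"),             # inverse-frequency class weights
)
baseline.fit(train_texts, train_labels)   # placeholders for the stratified splits
pred = baseline.predict(test_texts)

# Macro-F1 averages per-class F1 with equal weight per category, so the 227:1
# majority class cannot dominate the score.
print("Macro-F1:", f1_score(test_labels, pred, average="macro"))
```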
License
Copyright (c) 2026 Muhammad Ariyanto, Farrikh Alzami, Ramadhan Rakhmat Sani, Indra Gamayanto, Muhammad Naufal, Sri Winarno, Iswahyudi

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).