Evaluating Machine Translation Models and LLMs for Indonesian–Javanese Translation Across Speech Levels
DOI: https://doi.org/10.30871/jaic.v10i2.12326

Keywords: Javanese Translation, NLLB-200, M2M100, Gemini, Low-Resource Language

Abstract
Despite being one of the most widely spoken regional languages in Indonesia, Javanese remains underrepresented in modern machine translation systems, particularly with respect to its hierarchical speech-level system. This study presents a comprehensive benchmark of machine translation approaches for low-resource Indonesian-to-Javanese translation with explicit consideration of the Javanese speech-level registers Ngoko, Krama, and Krama Alus. We evaluate two multilingual neural machine translation models, NLLB-200 and M2M100, under both zero-shot and supervised fine-tuning settings, using a parallel corpus of approximately 4,000 sentence pairs from the Unggah-Ungguh dataset. Translation quality is assessed with BLEU, chrF++, METEOR, and BERTScore on both register-specific and overall test sets constructed from a balanced evaluation set of 1,500 sentence pairs (500 per register). Experimental results show that supervised fine-tuning substantially improves translation quality, with fine-tuned M2M100 achieving the strongest results among the neural machine translation models. In addition, instruction-based translation with the Gemini large language model delivers the best overall performance, particularly on semantic-oriented metrics, although this finding holds within the controlled, instruction-based conditions of our experimental configuration. Overall, this study provides a reproducible and extensible evaluation framework for sociolinguistically informed machine translation of regional languages.
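For readers who want to reproduce the zero-shot baselines, the sketch below illustrates Indonesian-to-Javanese translation with a publicly available NLLB-200 checkpoint from Hugging Face. The checkpoint size (distilled-600M), the NLLB language codes (ind_Latn, jav_Latn), and the decoding settings are illustrative assumptions, not necessarily the exact configuration used in the study.

```python
# Minimal zero-shot Indonesian -> Javanese sketch with NLLB-200.
# Checkpoint and decoding settings are assumptions, not the paper's exact setup.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_NAME = "facebook/nllb-200-distilled-600M"  # assumed checkpoint size

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, src_lang="ind_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def translate_id_to_jv(sentences, max_length=128):
    """Translate Indonesian sentences into Javanese (jav_Latn)."""
    batch = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
    generated = model.generate(
        **batch,
        # Force the decoder to start with the Javanese language token.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("jav_Latn"),
        max_length=max_length,
        num_beams=5,  # beam size is an assumption
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)

print(translate_id_to_jv(["Saya sedang belajar bahasa Jawa."]))
```

An analogous M2M100 run would use a checkpoint such as facebook/m2m100_418M with src_lang="id", tgt_lang="jv", and forced_bos_token_id=tokenizer.get_lang_id("jv"); again, the checkpoint size is an assumption. The surface and semantic metrics reported in the abstract can likewise be computed with standard open-source implementations. The settings below (chrF++ via sacrebleu's word_order=2, and a multilingual BERT backbone for BERTScore, since Javanese has no dedicated default model) are assumptions rather than the paper's documented configuration; METEOR can be added via nltk.

```python
# Hedged sketch of corpus-level BLEU, chrF++, and BERTScore on one register split.
import sacrebleu
from bert_score import score as bert_score

hyps = ["Kula nembe sinau basa Jawi."]   # system outputs (illustrative)
refs = ["Kula saweg sinau basa Jawi."]   # Krama references (illustrative)

bleu = sacrebleu.corpus_bleu(hyps, [refs])
chrf = sacrebleu.corpus_chrf(hyps, [refs], word_order=2)  # word_order=2 -> chrF++

# BERTScore with multilingual BERT; Javanese is not a supported `lang` shortcut.
P, R, F1 = bert_score(hyps, refs, model_type="bert-base-multilingual-cased")

print(f"BLEU {bleu.score:.2f} | chrF++ {chrf.score:.2f} | "
      f"BERTScore-F1 {F1.mean().item():.4f}")
```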
License
Copyright (c) 2026 Mahendra Bayu Prayoga, Bagas Restya Ermawan, Akmal Rafi Fadhillah, Mohammad Nizar Farizi, Ema Utami

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.