Benchmarking Deepseek-LLM-7B-Chat and Qwen1.5-7B-Chat for Indonesian Product Review Emotion Classification

Authors

  • Galih Setiawan Nurohim, Universitas Bina Sarana Informatika
  • Heribertus Ary Setyadi, Universitas Bina Sarana Informatika
  • Ahmad Fauzi, Universitas Bina Sarana Informatika

DOI:

https://doi.org/10.30871/jaic.v9i6.11369

Keywords:

DeepSeek, Emotion Classification, LLM, Qwen

Abstract

After completing a purchase on an e-commerce platform, users have the opportunity to leave a review. By analyzing these reviews, businesses can gain insight into customer emotions, while researchers and policymakers can monitor social dynamics. The use of Large Language Models (LLMs) has been identified as a promising approach to emotion analysis. LLMs have revolutionized natural language processing, yet their performance in non-English languages such as Indonesian still requires comprehensive evaluation. The objective of this research is to comprehensively analyze and compare Deepseek-LLM-7B-Chat and Qwen1.5-7B-Chat, two prominent open-source LLMs, for emotion classification of Indonesian product reviews. Leveraging the PRDECT-ID dataset, this study evaluates the performance of both models in a few-shot learning scenario through prompt engineering. The methodology covers the data preprocessing pipeline, a few-shot prompt engineering strategy tailored to each model's characteristics, model inference, and performance assessment using accuracy, precision, recall, and F1-score. The results reveal that DeepSeek achieved an accuracy of 43.41% and exhibited a considerably stronger ability to follow instructions than Qwen, which attained a maximum accuracy of only 20.35% and often produced near-random predictions. An in-depth error analysis indicates that this performance gap is likely attributable to factors such as pre-training data bias and tokenization mismatches with the Indonesian language. This research offers empirical evidence on the comparative strengths and weaknesses of DeepSeek and Qwen, providing a diagnostic benchmark that underscores the importance of instruction tuning and robust multilingual representation for Indonesian NLP tasks.
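
To make the evaluation pipeline concrete, the sketch below shows one way a few-shot classification and scoring loop of this kind could be implemented with the Hugging Face transformers library. It is an illustration only, not the authors' code: the model IDs, the Indonesian prompt wording, the example reviews, the label set assumed for PRDECT-ID (Happy, Sadness, Anger, Fear, Love), and the helper names classify and evaluate are all assumptions.

```python
# Minimal sketch of the few-shot evaluation loop described in the abstract.
# All concrete details (model IDs, prompt wording, label names) are assumptions,
# not code released by the authors.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

MODEL_ID = "deepseek-ai/deepseek-llm-7b-chat"  # or "Qwen/Qwen1.5-7B-Chat"
LABELS = ["Happy", "Sadness", "Anger", "Fear", "Love"]  # assumed PRDECT-ID emotion set

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

# Illustrative few-shot examples in Indonesian; the paper's actual prompts differ.
FEW_SHOT = (
    "Klasifikasikan emosi ulasan produk berikut ke dalam salah satu label: "
    "Happy, Sadness, Anger, Fear, Love.\n"
    "Ulasan: Barang cepat sampai dan kualitasnya bagus sekali.\nEmosi: Happy\n"
    "Ulasan: Produk datang dalam keadaan rusak, sangat mengecewakan.\nEmosi: Anger\n"
)

def classify(review: str) -> str:
    """Build the few-shot prompt, decode greedily, and map the output to a label."""
    prompt = FEW_SHOT + f"Ulasan: {review}\nEmosi:"
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=8, do_sample=False)
    answer = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
    # Take the first known label mentioned; anything else counts as an invalid prediction.
    return next((lab for lab in LABELS if lab.lower() in answer.lower()), "invalid")

def evaluate(reviews, gold_labels):
    """Accuracy and macro-averaged precision/recall/F1 over the predictions."""
    preds = [classify(r) for r in reviews]
    acc = accuracy_score(gold_labels, preds)
    prec, rec, f1, _ = precision_recall_fscore_support(
        gold_labels, preds, labels=LABELS, average="macro", zero_division=0
    )
    return {"accuracy": acc, "precision": prec, "recall": rec, "f1": f1}
```

Swapping MODEL_ID between the two checkpoints and adjusting the prompt per model would reproduce the kind of side-by-side comparison the paper reports; the label-matching step is one simple way to turn free-form chat output into a deterministic prediction, with unmatched outputs scored as errors.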

Published

2025-12-05

How to Cite

[1] G. S. Nurohim, H. A. Setyadi, and A. Fauzi, “Benchmarking Deepseek-LLM-7B-Chat and Qwen1.5-7B-Chat for Indonesian Product Review Emotion Classification”, JAIC, vol. 9, no. 6, pp. 3068–3078, Dec. 2025.
