Sentiment Classification Analysis of Tokopedia Reviews Using TF-IDF, SMOTE, and Traditional Machine Learning Models

Authors

  • Herianta Barus, Universitas Amikom Yogyakarta
  • Ika Nur Fajri, Universitas Amikom Yogyakarta
  • Yoga Pristyanto, Universitas Amikom Yogyakarta

DOI:

https://doi.org/10.30871/jaic.v9i5.10524

Keywords:

Sentiment Analysis, Tokopedia, E-commerce, Naïve Bayes, Random Forest, Support Vector Machine (SVM), Logistic Regression, Decision Tree, TF-IDF, SMOTE

Abstract

This study explores sentiment classification of Tokopedia user reviews, using TF-IDF for feature extraction and SMOTE to handle class imbalance. From nearly one million raw reviews sourced from Kaggle ("E-Commerce Ratings and Reviews in Bahasa Indonesia"), a final set of 6,477 relevant entries was obtained after preprocessing: case folding, noise removal (emojis, URLs, numbers), normalization to KBBI standards, tokenization, stopword removal, and stemming with Sastrawi. The dataset consisted of 5,213 positive and 1,264 negative reviews (about 80.5% positive). SMOTE balanced the classes at a 1:1 ratio, yielding 10,426 reviews for training. Five traditional machine learning models were evaluated: Naïve Bayes, Logistic Regression, Support Vector Machine (SVM), Decision Tree, and Random Forest. The models were assessed on accuracy, precision, recall, F1-score, ROC-AUC, and computational time, using an 80:20 stratified split and 5-fold cross-validation. Random Forest achieved the best overall performance (accuracy: 0.9163, F1-score: 0.9133, ROC-AUC: 0.9784), while the tuned SVM (C=10, RBF kernel) attained the highest accuracy (0.9473) and F1-score (0.9321). Cross-validation of Naïve Bayes gave consistent results, with an average accuracy of 88.09%. An analysis of the Logistic Regression coefficients identified the most influential features: positive sentiment was associated with words such as "mantap", "mudah", and "sukses", while negative sentiment correlated with "kecewa", "parah", and "lemot". These insights offer practical value for Tokopedia's teams in enhancing user experience, for example by improving app speed and addressing complaints. The findings demonstrate that traditional machine learning techniques are effective and efficient for sentiment analysis of Bahasa Indonesia text.
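The class-balancing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function `smote_oversample`, its parameters, and the toy vectors are hypothetical, and the procedure is a simplified SMOTE-style interpolation written with the Python standard library only. The class counts are the ones reported in the abstract.

```python
import random

# Class counts reported in the abstract.
N_POSITIVE, N_NEGATIVE = 5213, 1264

# Synthetic negative samples needed to reach a 1:1 ratio,
# and the resulting size of the balanced set.
n_needed = N_POSITIVE - N_NEGATIVE
balanced_total = N_POSITIVE + N_NEGATIVE + n_needed


def smote_oversample(minority, n_new, k=5, seed=0):
    """Simplified SMOTE-style oversampling (illustrative, hypothetical helper).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy feature vectors standing in for TF-IDF rows of negative reviews.
toy_minority = [[0.0, 1.0], [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]
new_points = smote_oversample(toy_minority, n_new=6, k=2)

print(n_needed, balanced_total)
print(len(new_points))
```

With the reported counts, 3,949 synthetic negative samples bring the negative class up to 5,213, which gives the 10,426 balanced reviews mentioned in the abstract. In practice a library implementation such as the `SMOTE` class in imbalanced-learn would normally be used instead of a hand-written routine; whether the study did so is not stated here.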

Published

2025-10-14

How to Cite

H. Barus, I. N. Fajri, and Y. Pristyanto, “Sentiment Classification Analysis of Tokopedia Reviews Using TF-IDF, SMOTE, and Traditional Machine Learning Models”, JAIC, vol. 9, no. 5, pp. 2552–2561, Oct. 2025.
