Comprehensive Comparison of TF-IDF and Word2Vec in Product Sentiment Classification Using Machine Learning Models

Authors

  • Asra Gretya Sinaga, Informatics Engineering, STMIK TIME
  • Robet Robet, Informatics Engineering, STMIK TIME
  • Octara Pribadi, Informatics Engineering, STMIK TIME

DOI:

https://doi.org/10.30871/jaic.v10i1.11582

Keywords:

Machine Learning, Sentiment Analysis, TF-IDF, Word2Vec, Product Reviews

Abstract

Sentiment analysis supports data-driven decisions by turning product reviews into reliable polarity labels. We compare four text representations for sentiment classification of Indonesian Shopee product reviews from Kaggle (~2.5k texts): TF-IDF, TF-IDF reduced via SVD, Word2Vec (trained from scratch), and a hybrid that combines TF-IDF (SVD-300) with Word2Vec. After normalization (with optional emoji handling and Indonesian stemming), ratings are mapped to binary sentiment (≤2 negative, ≥4 positive; 3 discarded). Each representation is evaluated with Logistic Regression, Support Vector Machines (linear/RBF), Naive Bayes, and Random Forest under stratified 5-fold cross-validation. TF-IDF with Logistic Regression (C=1.0) yields the best results (F1-macro = 0.816 ± 0.026; Accuracy = 0.816 ± 0.026), with LinearSVC as a strong runner-up. Word2Vec trained from scratch performs worse, consistent with the limited data being insufficient to learn stable embeddings, while the hybrid representation offers only modest gains over Word2Vec and does not surpass TF-IDF. These findings indicate that TF-IDF is the most reliable and consistent representation for small, short-text review datasets, and they underscore the impact of feature design on downstream classification performance.
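
The rating-to-label step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and the sample reviews are hypothetical, and only the thresholds (≤2 negative, ≥4 positive, 3 discarded) come from the abstract.

```python
from typing import List, Optional, Tuple


def rating_to_sentiment(rating: int) -> Optional[str]:
    """Map a 1-5 star rating to a binary sentiment label.

    Thresholds follow the abstract: ratings <= 2 are negative,
    ratings >= 4 are positive, and rating 3 is discarded (None).
    """
    if rating <= 2:
        return "negative"
    if rating >= 4:
        return "positive"
    return None  # 3-star reviews are dropped from the dataset


def label_reviews(reviews: List[Tuple[str, int]]) -> List[Tuple[str, str]]:
    """Return (text, label) pairs, skipping reviews that map to None."""
    labeled = []
    for text, rating in reviews:
        label = rating_to_sentiment(rating)
        if label is not None:
            labeled.append((text, label))
    return labeled


# Hypothetical sample rows for illustration only; the study itself
# uses the Kaggle Shopee review dataset.
sample = [
    ("barang bagus, pengiriman cepat", 5),
    ("lumayan", 3),
    ("produk rusak", 1),
    ("sesuai deskripsi", 4),
    ("kecewa", 2),
]
labeled = label_reviews(sample)
```

Dropping the 3-star reviews keeps the task strictly binary, at the cost of shrinking an already small dataset.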

Published

2026-02-04

How to Cite

[1]
A. G. Sinaga, R. Robet, and O. Pribadi, “Comprehensive Comparison of TF-IDF and Word2Vec in Product Sentiment Classification Using Machine Learning Models”, JAIC, vol. 10, no. 1, pp. 184–191, Feb. 2026.
