Comparison of Text Vectorization Methods for IMDB Movie Review Sentiment Analysis Using SVM

Rifqi Mulyawan; Husni Naparin; Wifda Muna Fatihia

doi:10.30871/jaic.v9i5.10372

Authors

Rifqi Mulyawan, UIN Antasari Banjarmasin
Husni Naparin, UIN Antasari Banjarmasin
Wifda Muna Fatihia, UIN Antasari Banjarmasin

Keywords:

Bag of Words, Support Vector Machine, Sentiment Analysis, Doc2Vec, TF-IDF

Abstract

Sentiment analysis is a machine learning task that classifies the opinions expressed in text. IMDb is a platform widely used by moviegoers worldwide to find information and share their views, and audience reactions there often serve as a benchmark for a movie’s success. This research classifies reviews as positive or negative using a Support Vector Machine (SVM) and evaluates four feature representation methods: (a) Bag of Words (BoW), (b) TF-IDF, (c) Word2Vec, and (d) Doc2Vec. After the text was preprocessed, each method was used to extract features for model training. SVM with Word2Vec achieved the best overall performance, with an F1-Score of 0.8607 and an Accuracy of 0.8607, and also had the shortest training time (75.0 s). By comparison, BoW reached an F1-Score of 0.8219, TF-IDF 0.8520, and Doc2Vec 0.8440. These findings indicate that, in this study, Word2Vec provides the most effective feature representation for sentiment classification with SVM.
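To make the pipeline in the abstract concrete, the following is a minimal pure-Python sketch of two of the four representations (BoW and TF-IDF) feeding a linear SVM. It is not the authors' implementation. The four toy reviews, the tokenizer, the smoothed IDF formula, the hyperparameters (`lr`, `lam`, `epochs`), and every function name are assumptions made for illustration. The SVM is a simple bias-free linear variant trained by subgradient descent on the hinge loss, standing in for whatever SVM library the study used. Word2Vec and Doc2Vec are omitted because they require trained embedding models.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase, split on whitespace, strip surrounding punctuation.
    A stand-in for the study's preprocessing, which the abstract does not detail."""
    tokens = (w.strip(".,!?;:\"'()") for w in text.lower().split())
    return [t for t in tokens if t]

def build_vocab(docs):
    """Assign each distinct token a column index, in alphabetical order."""
    return {tok: i for i, tok in enumerate(sorted({t for d in docs for t in tokenize(d)}))}

def bow_vector(doc, vocab):
    """Bag of Words: raw count of each vocabulary token in the document."""
    vec = [0.0] * len(vocab)
    for tok, count in Counter(tokenize(doc)).items():
        if tok in vocab:
            vec[vocab[tok]] = float(count)
    return vec

def idf_weights(docs, vocab):
    """Smoothed IDF, ln((1 + N) / (1 + df)) + 1. One common variant, assumed here."""
    df = Counter(tok for d in docs for tok in set(tokenize(d)))
    n = len(docs)
    return [math.log((1 + n) / (1 + df[tok])) + 1.0 for tok in vocab]

def tfidf_vector(doc, vocab, idf):
    """TF-IDF: token counts scaled by IDF, then L2-normalised."""
    vec = [c * w for c, w in zip(bow_vector(doc, vocab), idf)]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec

def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=100):
    """Bias-free linear SVM fitted by subgradient descent on the
    L2-regularised hinge loss. Labels must be +1 or -1."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [wj * (1.0 - lr * lam) for wj in w]  # shrink from the L2 penalty
            if margin < 1.0:  # hinge loss is active for this example
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
    return w

def predict(w, x):
    """Return +1 or -1 according to the sign of the decision value."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1

def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Hypothetical toy reviews; +1 = positive sentiment, -1 = negative sentiment.
docs = ["Great movie!", "Terrible movie.", "Great acting", "Terrible plot"]
labels = [1, -1, 1, -1]

vocab = build_vocab(docs)
idf = idf_weights(docs, vocab)
features = {
    "BoW": [bow_vector(d, vocab) for d in docs],
    "TF-IDF": [tfidf_vector(d, vocab, idf) for d in docs],
}

# Train one SVM per representation and report F1 on the training data.
for name, X in features.items():
    w = train_linear_svm(X, labels)
    preds = [predict(w, x) for x in X]
    print(name, "training F1:", f1_score(labels, preds))
```

The sketch scores the model on its own training data only to keep the example short. A comparison like the one reported in the abstract would need a held-out test split and the same classifier settings for every representation.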
