Comparison of Text Vectorization Methods for IMDB Movie Review Sentiment Analysis Using SVM
DOI: https://doi.org/10.30871/jaic.v9i5.10372
Keywords: Bag of Words, Support Vector Machine, Sentiment Analysis, Doc2Vec, TF-IDF
Abstract
Sentiment analysis is a machine learning task that classifies the opinions expressed in text. IMDb is a platform widely used by moviegoers worldwide to find information and share viewpoints, and audience reactions there often serve as a benchmark for a movie’s success. This research classifies reviews as positive or negative using a Support Vector Machine (SVM) and evaluates four feature representation methods: (a) Bag of Words (BoW), (b) TF-IDF, (c) Word2Vec, and (d) Doc2Vec. After the text was preprocessed, each method was used to extract features for model training. In the experiments, SVM with Word2Vec achieved the best overall performance, with an F1-Score of 0.8607 and an Accuracy of 0.8607, and was also the fastest to train (75.0 s). In comparison, BoW reached an F1-Score of 0.8219, TF-IDF 0.8520, and Doc2Vec 0.8440. These findings indicate that, in this study, Word2Vec provides the most effective feature representation for sentiment classification with SVM.
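The abstract does not state the SVM kernel, hyperparameters, preprocessing steps, or libraries used, so the following is only a minimal sketch of the general idea for the two count-based representations (BoW and TF-IDF), not the authors' implementation. It assumes a linear SVM trained by subgradient descent on the L2-regularized hinge loss, a smoothed IDF variant with L2 normalization, and a four-review toy corpus invented for illustration. All function names and parameter values are hypothetical. Word2Vec and Doc2Vec are left out because they require a trained embedding model.

```python
import math
from collections import Counter

# Toy data invented for illustration; labels: +1 = positive, -1 = negative.
TRAIN_DOCS = [
    "great movie loved it",
    "wonderful acting great story",
    "terrible movie hated it",
    "boring plot awful acting",
]
TRAIN_LABELS = [1, 1, -1, -1]


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()


def build_vocab(docs):
    """Sorted list of distinct tokens seen in the given documents."""
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def bow_vectors(docs, vocab):
    """Bag of Words: one raw term count per vocabulary entry."""
    index = {tok: i for i, tok in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0.0] * len(vocab)
        for tok, count in Counter(tokenize(doc)).items():
            if tok in index:  # tokens outside the vocabulary are ignored
                row[index[tok]] = float(count)
        rows.append(row)
    return rows


def tfidf_vectors(docs, vocab, fit_docs):
    """TF-IDF with smoothed IDF, ln((1 + N) / (1 + df)) + 1, then L2-normalized."""
    n_docs = len(fit_docs)
    doc_freq = Counter(tok for doc in fit_docs for tok in set(tokenize(doc)))
    idf = [math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0 for tok in vocab]
    rows = []
    for counts in bow_vectors(docs, vocab):
        weighted = [c * w for c, w in zip(counts, idf)]
        norm = math.sqrt(sum(v * v for v in weighted)) or 1.0
        rows.append([v / norm for v in weighted])
    return rows


def train_linear_svm(X, y, epochs=50, lr=0.1, lam=0.01):
    """Subgradient descent on (lam / 2) * ||w||^2 + hinge loss; bias is unregularized."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1.0 - lr * lam) for wj in w]  # L2 shrinkage step
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(X, w, b):
    """Sign of the decision function, mapped to +1 / -1."""
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else -1 for xi in X]


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    vocab = build_vocab(TRAIN_DOCS)
    features = {
        "BoW": bow_vectors(TRAIN_DOCS, vocab),
        "TF-IDF": tfidf_vectors(TRAIN_DOCS, vocab, TRAIN_DOCS),
    }
    for name, X in features.items():
        w, b = train_linear_svm(X, TRAIN_LABELS)
        print(name, "training F1:", f1_score(TRAIN_LABELS, predict(X, w, b)))
```

The script scores each model on its own training reviews only to keep the example short. A comparison like the one reported in the abstract would instead score each representation on held-out reviews, and would typically rely on an established SVM library rather than a hand-written optimizer.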
Copyright (c) 2025 Rifqi Mulyawan, Husni Naparin, Wifda Muna Fatihia

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.