Effectiveness of AdaBoost and XGBoost Algorithms in Sentiment Analysis of Movie Reviews
DOI:
https://doi.org/10.30871/jaic.v9i2.9077
Keywords:
AdaBoost, Classification, IMDb reviews dataset, Sentiment analysis, XGBoost
Abstract
Many entertainment platforms now offer movies, TV shows, games, and other content, and most of them let users post reviews. Reviews written by viewers play an important role in shaping public interest in a film, but their growing volume makes it difficult to assess the sentiment toward a film quickly and accurately. This creates a need for a system that analyzes reviews by sentiment, helping viewers evaluate films and helping the entertainment industry understand the needs of its audience. This study therefore develops a sentiment analysis model that uses machine learning algorithms to identify whether a review expresses positive or negative sentiment. The model is built on user film reviews from the IMDb platform, available on Kaggle as a dataset of 50,000 movie reviews in text format with two columns: review_text and sentiment. The classification models are built with AdaBoost and XGBoost. Preprocessing consists of text cleaning, tokenization, stopword removal, lemmatization, and TF-IDF vectorization to convert the review text into numeric form; the positive and negative labels are encoded as 1 and 0. In model training with cross-validation, XGBoost reached an accuracy of 85% and AdaBoost 77%. Feature selection raised the accuracy of the XGBoost model from 85% to 86%, while the accuracy of the AdaBoost model remained at 77%. We conclude that XGBoost performs better than AdaBoost in sentiment classification.
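The preprocessing stages named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the stopword list, the sample reviews, the function names (`preprocess`, `tfidf`, `encode_labels`), and the smoothed IDF variant are all assumptions chosen for illustration, and lemmatization is omitted because it would require an external NLP library.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; the paper does not list its stopwords).
STOPWORDS = {"a", "an", "the", "is", "was", "it", "this", "and", "of", "to", "i"}


def preprocess(text):
    """Clean, tokenize, and remove stopwords (lemmatization is omitted in this sketch)."""
    text = re.sub(r"<[^>]+>", " ", text.lower())  # lowercase and drop HTML tags such as <br />
    tokens = re.findall(r"[a-z]+", text)          # keep alphabetic tokens only
    return [t for t in tokens if t not in STOPWORDS]


def tfidf(docs):
    """Return (vocabulary, rows) where each row holds L2-normalized TF-IDF weights.

    Uses the smoothed IDF ln((1 + n) / (1 + df)) + 1, one common variant
    (an assumption; the abstract does not say which variant was used).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency per term
    vocab = sorted(df)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for doc in docs:
        tf = Counter(doc)
        row = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(w * w for w in row)) or 1.0  # avoid dividing by zero for empty docs
        rows.append([w / norm for w in row])
    return vocab, rows


def encode_labels(labels):
    """Map 'positive' to 1 and every other label to 0."""
    return [1 if label == "positive" else 0 for label in labels]


# Hypothetical sample data mimicking the review_text and sentiment columns.
reviews = [
    "This movie was GREAT!<br />I loved it.",
    "Terrible plot and boring acting.",
    "A great cast, but the plot was boring.",
]
sentiments = ["positive", "negative", "negative"]

docs = [preprocess(r) for r in reviews]
vocab, X = tfidf(docs)
y = encode_labels(sentiments)
```

The resulting matrix `X` and label vector `y` have the shape a boosting classifier such as AdaBoost or XGBoost expects as input; in practice a library vectorizer and a proper lemmatizer would likely replace these hand-written helpers.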
License
Copyright (c) 2025 I Gusti Ayu Nandia Lestari, Ni Made Rai Masita Dewi, Komang Gita Meiliana, I Komang Agus Ady Aryanto

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).