Effectiveness of AdaBoost and XGBoost Algorithms in Sentiment Analysis of Movie Reviews

I Gusti Ayu Nandia Lestari; Ni Made Rai Masita Dewi; Komang Gita Meiliana; I Komang Agus Ady Aryanto

doi:10.30871/jaic.v9i2.9077

Authors

I Gusti Ayu Nandia Lestari, Department of Information Technology, Institut Teknologi dan Bisnis STIKOM Bali
Ni Made Rai Masita Dewi, Department of Information Systems, Institut Teknologi dan Bisnis STIKOM Bali
Komang Gita Meiliana, Department of Information Systems, Institut Teknologi dan Bisnis STIKOM Bali
I Komang Agus Ady Aryanto, Department of Information Technology, Institut Teknologi dan Bisnis STIKOM Bali

DOI:

https://doi.org/10.30871/jaic.v9i2.9077

Keywords:

AdaBoost, Classification, IMDb reviews dataset, Sentiment analysis, XGBoost

Abstract

Many entertainment platforms now provide movies, TV shows, games, and other content, and most of them let users post reviews. Reviews written by viewers play an important role in shaping public interest in a film. However, the growing number of reviews makes it difficult to assess the sentiment toward a film quickly and accurately. This highlights the need for a system that can analyze reviews by sentiment, making it easier for viewers to evaluate a film and helping the entertainment industry understand audience needs. This study therefore develops a sentiment analysis model that uses machine learning algorithms to identify whether a review expresses positive or negative sentiment. The data used to build the model consists of user reviews of films on the IMDb platform. The dataset is available on Kaggle and contains 50,000 movie reviews in text format, with two columns: review_text and sentiment. The classification models are built with AdaBoost and XGBoost. Preprocessing consists of text cleaning, tokenization, stopword removal, lemmatization, and TF-IDF vectorization to convert the review text into numeric form; the positive and negative labels are converted to 1 and 0, respectively. In model training with cross-validation, the XGBoost model reaches an accuracy of 85% and the AdaBoost model 77%. Feature selection improves the accuracy of the XGBoost model from 85% to 86%, while the accuracy of the AdaBoost model remains at 77%. XGBoost therefore performs better than AdaBoost in sentiment classification on this dataset.
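The preprocessing steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code: the stopword list, the two sample reviews, the helper names `preprocess` and `tfidf_matrix`, and the exact TF-IDF variant (relative term frequency with a common smoothed IDF) are all assumptions made for illustration, and lemmatization is omitted because it would need an external library.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; the study's list is not given).
STOPWORDS = {"a", "an", "and", "the", "is", "was", "this", "it", "of", "to"}

# Label encoding described in the abstract: positive -> 1, negative -> 0.
LABELS = {"negative": 0, "positive": 1}


def preprocess(text):
    """Lowercase, strip HTML tags, keep alphabetic tokens, drop stopwords."""
    text = re.sub(r"<[^>]+>", " ", text.lower())
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if t not in STOPWORDS]


def tfidf_matrix(docs):
    """Return (vocab, rows), where rows[i][j] is the TF-IDF of term j in doc i.

    Assumed variant: tf = count / document length,
    idf = ln((1 + N) / (1 + df)) + 1 (a common smoothed form).
    """
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty documents
        rows.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, rows


# Hypothetical sample rows in the (review_text, sentiment) layout.
reviews = [
    ("A great movie, great acting!<br />", "positive"),
    ("This was a boring and awful movie.", "negative"),
]
vocab, X = tfidf_matrix([text for text, _ in reviews])
y = [LABELS[label] for _, label in reviews]
```

In this sketch a term that appears in only one review, such as "great", receives a higher weight than "movie", which appears in both. The resulting `X` and `y` have the numeric form that a classifier such as AdaBoost or XGBoost would take as input.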
