Opinion Classification on IMDb Reviews Using Naïve Bayes Algorithm

Authors

  • Amiliya Putri, UIN Walisongo Semarang
  • Khothibul Umam, UIN Walisongo Semarang
  • Hery Mustofa, UIN Walisongo Semarang

DOI:

https://doi.org/10.30871/jaic.v9i6.9831

Keywords:

IMDb, Naive Bayes algorithm, opinion classification, sentiment analysis, Natural Language Processing (NLP)

Abstract

This study aims to classify user opinions on IMDb movie reviews using the Multinomial Naïve Bayes algorithm. The dataset consists of 50,000 reviews, evenly distributed between 25,000 positive and 25,000 negative reviews. The preprocessing stage includes cleaning, case folding, stopword removal, tokenization, and lemmatization using the NLTK library. Text features are represented through the TF-IDF method to capture the significance of each word in the documents. The Multinomial Naïve Bayes model was trained using the hold-out validation technique with an 80:20 split for training and testing data. Hyperparameter tuning of α (Laplace smoothing) was conducted to enhance model stability and accuracy. The model’s performance was evaluated using accuracy, precision, recall, and F1-score metrics, supported by a confusion matrix visualization. The results show that the model achieved an accuracy of 87%, with precision of 87.9%, recall of 85.4%, and an F1-score of 86.6%. In comparison, Logistic Regression as a baseline algorithm achieved an accuracy of 91%. Nevertheless, the Naïve Bayes algorithm remains competitive and computationally efficient for large-scale text data, making it highly relevant for sentiment analysis of movie reviews.
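
As a rough illustration of the classifier described in the abstract, the following is a minimal pure-Python sketch of Multinomial Naïve Bayes with additive (Laplace) smoothing α. It is not the authors' implementation: it uses raw token counts instead of TF-IDF weights, skips the NLTK preprocessing, and the function names (`train_multinomial_nb`, `predict`, `f1_score`) and the four toy reviews are hypothetical, chosen only for this example.

```python
import math
from collections import Counter


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes on tokenized documents.

    alpha is the additive (Laplace) smoothing constant and must be > 0 here.
    Assumption: raw term counts are used as features, not TF-IDF weights.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    classes = sorted(set(labels))
    log_prior = {}
    log_likelihood = {}
    for c in classes:
        class_docs = [doc for doc, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(tok for doc in class_docs for tok in doc)
        # Smoothed denominator: class token total plus alpha per vocabulary term.
        total = sum(counts.values()) + alpha * len(vocab)
        log_likelihood[c] = {
            tok: math.log((counts[tok] + alpha) / total) for tok in vocab
        }
    return log_prior, log_likelihood


def predict(model, doc):
    """Return the class with the highest log posterior score."""
    log_prior, log_likelihood = model
    scores = {}
    for c in log_prior:
        # Tokens outside the training vocabulary are ignored.
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][tok] for tok in doc if tok in log_likelihood[c]
        )
    return max(scores, key=scores.get)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data, not taken from the IMDb dataset.
train_docs = [
    ["great", "moving", "film"],
    ["great", "acting"],
    ["boring", "plot"],
    ["boring", "weak", "acting"],
]
train_labels = ["pos", "pos", "neg", "neg"]

model = train_multinomial_nb(train_docs, train_labels, alpha=1.0)
prediction = predict(model, ["great", "film"])

# Consistency check on the metrics reported in the abstract.
reported_f1 = f1_score(0.879, 0.854)
```

With the reported precision of 87.9% and recall of 85.4%, `f1_score` gives roughly 0.866, which agrees with the F1-score of 86.6% stated in the abstract. Smaller values of `alpha` trust the observed counts more, while larger values pull the word probabilities toward uniform, which is why α is a natural hyperparameter to tune.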

Published

2025-12-06

How to Cite

[1]
A. Putri, K. Umam, and H. Mustofa, “Opinion Classification on IMDb Reviews Using Naïve Bayes Algorithm”, JAIC, vol. 9, no. 6, pp. 3168–3176, Dec. 2025.
