Opinion Classification on IMDb Reviews Using Naïve Bayes Algorithm
DOI:
https://doi.org/10.30871/jaic.v9i6.9831
Keywords:
IMDb, Naïve Bayes algorithm, opinion classification, sentiment analysis, Natural Language Processing (NLP)
Abstract
This study classifies user opinions in IMDb movie reviews using the Multinomial Naïve Bayes algorithm. The dataset consists of 50,000 reviews, split evenly into 25,000 positive and 25,000 negative reviews. Preprocessing comprises cleaning, case folding, stopword removal, tokenization, and lemmatization using the NLTK library. Text features are represented with TF-IDF weighting, which scores each word by its frequency in a document, discounted by how many documents in the corpus contain it. The Multinomial Naïve Bayes model was trained and tested with hold-out validation using an 80:20 split between training and testing data. The additive (Laplace) smoothing parameter α was tuned to improve the model's stability and accuracy. Performance was evaluated with accuracy, precision, recall, and F1-score, supported by a confusion matrix visualization. The model achieved an accuracy of 87%, a precision of 87.9%, a recall of 85.4%, and an F1-score of 86.6%. For comparison, Logistic Regression, used as a baseline, achieved an accuracy of 91%. Although Naïve Bayes scored below this baseline, it remains competitive and computationally efficient on large-scale text data, making it highly relevant to sentiment analysis of movie reviews.
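The pipeline summarized above can be sketched end to end in plain Python. This is a minimal, dependency-free sketch under stated assumptions, not the authors' code: the stopword list, the regular expressions, the smoothed IDF variant (the formula scikit-learn's `TfidfVectorizer` uses by default, here without its L2 normalization), the α grid, and the helper names `preprocess`, `fit_idf`, `tfidf`, `holdout_split` and `tune_alpha` are all hypothetical. Lemmatization, which the study performed with NLTK, is omitted to keep the example self-contained.

```python
import math
import random
import re
from collections import Counter, defaultdict

# Hypothetical mini stopword list; the study used NLTK resources instead.
STOPWORDS = {"the", "a", "an", "and", "is", "it", "this", "was", "of", "to", "i"}

def preprocess(text):
    """Cleaning, case folding, tokenization and stopword removal (lemmatization omitted)."""
    text = re.sub(r"<[^>]+>", " ", text)      # cleaning: strip HTML remnants such as <br />
    text = re.sub(r"[^A-Za-z\s]", " ", text)  # cleaning: drop digits and punctuation
    tokens = text.lower().split()             # case folding + whitespace tokenization
    return [t for t in tokens if t not in STOPWORDS]

def fit_idf(docs):
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1, fitted on training documents only."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}

def tfidf(doc, idf):
    """Raw term count times IDF; terms never seen in training are dropped."""
    return {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}

class MultinomialNB:
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing; alpha = 1.0 is Laplace smoothing

    def fit(self, X, y):
        self.classes = sorted(set(y))
        vocab = {t for x in X for t in x}
        self.log_prior = {c: math.log(y.count(c) / len(y)) for c in self.classes}
        self.log_lik = {}
        for c in self.classes:
            weights = defaultdict(float)  # summed TF-IDF weight of each term in class c
            for x, label in zip(X, y):
                if label == c:
                    for t, w in x.items():
                        weights[t] += w
            denom = sum(weights.values()) + self.alpha * len(vocab)
            self.log_lik[c] = {t: math.log((weights[t] + self.alpha) / denom) for t in vocab}
        return self

    def predict(self, x):
        # log P(c) + sum_t w_t * log P(t | c); ties go to the first class in sorted order
        scores = {c: self.log_prior[c] + sum(w * self.log_lik[c].get(t, 0.0) for t, w in x.items())
                  for c in self.classes}
        return max(scores, key=scores.get)

def evaluate(y_true, y_pred, positive="pos"):
    """Accuracy, precision, recall, F1 and a 2x2 confusion matrix (rows = true neg/pos)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1, "confusion": [[tn, fp], [fn, tp]]}

def holdout_split(texts, labels, test_ratio=0.2, seed=42):
    """Shuffle once, then hold out test_ratio of the data (20% by default) for testing."""
    idx = list(range(len(texts)))
    random.Random(seed).shuffle(idx)
    cut = len(idx) - round(len(idx) * test_ratio)
    def pick(ids):
        return [texts[i] for i in ids], [labels[i] for i in ids]
    return pick(idx[:cut]), pick(idx[cut:])

def tune_alpha(train_texts, train_y, val_texts, val_y, grid=(0.1, 0.5, 1.0, 2.0)):
    """Return the alpha with the best validation accuracy (hypothetical grid)."""
    train_docs = [preprocess(t) for t in train_texts]
    idf = fit_idf(train_docs)
    X_train = [tfidf(d, idf) for d in train_docs]
    X_val = [tfidf(preprocess(t), idf) for t in val_texts]
    best = None
    for alpha in grid:
        model = MultinomialNB(alpha).fit(X_train, train_y)
        acc = evaluate(val_y, [model.predict(x) for x in X_val])["accuracy"]
        if best is None or acc > best[1]:
            best = (alpha, acc)
    return best[0]
```

In practice the same steps map onto NLTK for preprocessing and scikit-learn's `TfidfVectorizer`, `train_test_split(test_size=0.2)` and `MultinomialNB(alpha=...)`, although the abstract does not say which library was used for vectorization and classification. The sketch takes a separate validation set in `tune_alpha` because choosing α on the 20% test portion would leak test information into the reported scores.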
License
Copyright (c) 2025 Amiliya Putri, Khothibul Umam, Hery Mustofa

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.