Application of Multinomial Naïve Bayes for Sentiment Classification on Bukalapak Reviews
DOI:
https://doi.org/10.30871/jaic.v9i6.11671
Keywords:
Sentiment Analysis, Multinomial Naïve Bayes, E-Commerce, Bukalapak, Customer Reviews
Abstract
This study applies the Multinomial Naïve Bayes (MNB) classifier to sentiment analysis of user reviews of Bukalapak, a major Indonesian e-commerce platform. It addresses two challenges: class imbalance and the linguistic complexity of Indonesian (slang, affixes, and negation), both of which are common in user reviews. The data were collected by web scraping reviews of the Bukalapak app on the Google Play Store, yielding a dataset of 19,999 reviews. A structured preprocessing pipeline prepared the data: text normalization, tokenization, stopword removal, stemming, and term frequency-inverse document frequency (TF-IDF) weighting. The results show that the model performs well on neutral reviews (accuracy 81%) but struggles with positive and negative reviews, where class imbalance leads to lower accuracy. The study highlights the effectiveness of Multinomial Naïve Bayes for large-scale sentiment analysis in the e-commerce domain, particularly on platforms with large volumes of user-generated content. The study also applies SMOTE (Synthetic Minority Over-sampling Technique) to handle class imbalance and k-fold cross-validation for model evaluation, significantly improving the model's reliability. The research concludes that sentiment analysis can benefit e-commerce platforms by improving customer service, informing product management decisions, and providing insights for business strategy.
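The core of the pipeline described in the abstract (tokenization, TF-IDF weighting, and a Multinomial Naïve Bayes classifier) can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the toy reviews are invented for illustration and are not taken from the Bukalapak dataset, all function names are hypothetical, the smoothed IDF formula and Laplace smoothing (alpha = 1) are assumptions, and stopword removal, stemming, SMOTE, and k-fold cross-validation are left out for brevity.

```python
import math
import re
from collections import Counter, defaultdict

# Invented toy reviews (NOT from the Bukalapak dataset), for illustration only.
TRAIN = [
    ("aplikasi bagus dan cepat", "positive"),
    ("belanja mudah barang bagus", "positive"),
    ("aplikasi lambat dan sering error", "negative"),
    ("pengiriman lama barang rusak", "negative"),
    ("biasa saja", "neutral"),
    ("lumayan biasa", "neutral"),
]


def tokenize(text):
    """Lowercase and keep runs of letters; a stand-in for fuller normalization."""
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(docs):
    """Smoothed inverse document frequency for every term seen in training."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    """TF-IDF weights for one tokenized document; unseen terms are dropped."""
    tf = Counter(term for term in doc if term in idf)
    return {term: count * idf[term] for term, count in tf.items()}


def train_mnb(vectors, labels, vocab, alpha=1.0):
    """Multinomial Naive Bayes over TF-IDF weights with additive smoothing."""
    weight_sums = defaultdict(Counter)
    for vector, label in zip(vectors, labels):
        for term, weight in vector.items():
            weight_sums[label][term] += weight
    model = {}
    for label, n_docs in Counter(labels).items():
        total = sum(weight_sums[label].values()) + alpha * len(vocab)
        log_prior = math.log(n_docs / len(labels))
        log_lik = {t: math.log((weight_sums[label][t] + alpha) / total) for t in vocab}
        model[label] = (log_prior, log_lik)
    return model


def predict(model, vector):
    """Return the class with the highest log-posterior score."""
    scores = {
        label: log_prior + sum(w * log_lik[t] for t, w in vector.items())
        for label, (log_prior, log_lik) in model.items()
    }
    return max(scores, key=scores.get)


docs = [tokenize(text) for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
idf = fit_idf(docs)
model = train_mnb([tfidf(doc, idf) for doc in docs], labels, list(idf))


def classify(text):
    """Tokenize, weight, and classify a single review string."""
    return predict(model, tfidf(tokenize(text), idf))


print(classify("bagus cepat mudah"))
print(classify("aplikasi lambat error"))
```

On this toy data the two calls print `positive` and `negative`. With only six training reviews the result says nothing about real-world accuracy; the sketch only shows how TF-IDF weights feed the per-class log-likelihoods of an MNB model.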
License
Copyright (c) 2025 Dona Yuliawati, Musyafa Faeang Ogya Widi

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges as well as earlier and greater citation of published work (see The Effect of Open Access).








