Optimizing Feature Extraction for Naïve Bayes Sentiment Analysis

Authors

  • Achmad Achmad, Universitas Dian Nuswantoro
  • Fikri Budiman, Universitas Dian Nuswantoro

DOI:

https://doi.org/10.30871/jaic.v10i1.12041

Keywords:

Naïve Bayes, Sentiment Analysis, Feature Extraction, Optimization

Abstract

The rapid growth of e-commerce platforms such as Tokopedia has generated a large volume of user reviews expressing diverse opinions about products and services. These reviews reflect consumer perceptions and provide valuable insights for business decision-making. This study aims to improve sentiment analysis performance with the Naïve Bayes algorithm by comparing two feature extraction techniques: Bag of Words (BoW) and Term Frequency–Inverse Document Frequency (TF-IDF). The dataset consists of 5,400 Tokopedia product reviews obtained from the Kaggle platform, each labeled as positive or negative. The research process comprises four stages: text preprocessing (text cleaning, case folding, tokenization, stopword removal, and stemming); feature extraction using BoW and TF-IDF; handling class imbalance with the Synthetic Minority Over-sampling Technique (SMOTE); and model training with the Naïve Bayes classifier. The dataset is split into 80% training data and 20% testing data, and model performance is evaluated using accuracy, precision, recall, and F1-score. The results show that BoW achieved the higher accuracy, 93%, compared with 83% for TF-IDF, indicating that BoW provides a more effective feature representation and more stable performance for Naïve Bayes-based sentiment analysis on this dataset.
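
As a rough illustration of the comparison described in the abstract, the following Python sketch builds BoW and TF-IDF features for a toy corpus and trains a multinomial Naïve Bayes classifier with Laplace smoothing on each. It is not the authors' implementation: the tokens, labels, function names, the smoothed IDF variant, and the `alpha` parameter are all assumptions made for illustration, the reviews are assumed to be already preprocessed into tokens, and SMOTE and the 80/20 split are left out for brevity.

```python
import math
from collections import Counter

# Hypothetical, already-preprocessed reviews (tokens and labels are made up).
train_docs = [["barang", "bagus", "cepat"], ["bagus", "sesuai"],
              ["rusak", "lambat"], ["barang", "rusak", "kecewa"]]
train_labels = ["positive", "positive", "negative", "negative"]
test_docs = [["bagus", "cepat"], ["kecewa", "lambat"]]
test_labels = ["positive", "negative"]

vocab = sorted({t for d in train_docs for t in d})

def bow_vector(tokens, vocab):
    """Bag of Words: raw term counts over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[t] for t in vocab]

def fit_idf(docs, vocab):
    """Smoothed IDF (one common variant), fitted on training documents only."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return [math.log((1 + n) / (1 + df[t])) + 1 for t in vocab]

def tfidf_vector(tokens, vocab, idf):
    """TF-IDF: term counts reweighted by inverse document frequency."""
    return [c * w for c, w in zip(bow_vector(tokens, vocab), idf)]

def train_nb(X, y, alpha=1.0):
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""
    log_prior, log_lik = {}, {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        log_prior[c] = math.log(len(rows) / len(X))
        totals = [sum(col) + alpha for col in zip(*rows)]
        denom = sum(totals)
        log_lik[c] = [math.log(t / denom) for t in totals]
    return log_prior, log_lik

def predict_nb(model, x):
    """Return the class with the highest log posterior score."""
    log_prior, log_lik = model
    scores = {c: log_prior[c] + sum(v * w for v, w in zip(x, log_lik[c]))
              for c in log_prior}
    return max(scores, key=scores.get)

def binary_metrics(y_true, y_pred, positive="positive"):
    """Accuracy, precision, recall, and F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Same classifier, two feature representations.
bow_model = train_nb([bow_vector(d, vocab) for d in train_docs], train_labels)
bow_pred = [predict_nb(bow_model, bow_vector(d, vocab)) for d in test_docs]

idf = fit_idf(train_docs, vocab)
tfidf_model = train_nb([tfidf_vector(d, vocab, idf) for d in train_docs],
                       train_labels)
tfidf_pred = [predict_nb(tfidf_model, tfidf_vector(d, vocab, idf))
              for d in test_docs]

print("BoW:", binary_metrics(test_labels, bow_pred))
print("TF-IDF:", binary_metrics(test_labels, tfidf_pred))
```

The toy corpus is far too small to reproduce or explain the reported accuracy gap. It only shows that both representations feed the same classifier and differ solely in how term counts are weighted.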

Published

2026-02-04

How to Cite

[1]
A. Achmad and F. Budiman, “Optimizing Feature Extraction for Naïve Bayes Sentiment Analysis”, JAIC, vol. 10, no. 1, pp. 619–629, Feb. 2026.
