Optimizing Feature Extraction for Naïve Bayes Sentiment Analysis
DOI: https://doi.org/10.30871/jaic.v10i1.12041
Keywords: Naïve Bayes, Sentiment Analysis, Feature Extraction, Optimization
Abstract
The rapid growth of e-commerce platforms such as Tokopedia has generated a large volume of user reviews containing diverse opinions about products and services. These reviews reflect consumer perceptions and provide valuable insights for business decision-making. This study aims to improve sentiment analysis performance with the Naïve Bayes algorithm by comparing two feature extraction techniques: Bag of Words (BoW) and Term Frequency–Inverse Document Frequency (TF-IDF). The dataset consists of 5,400 Tokopedia product reviews obtained from the Kaggle platform, each labeled as positive or negative. The research process has four stages: text preprocessing (text cleaning, case folding, tokenization, stopword removal, and stemming), feature extraction using BoW and TF-IDF, handling of class imbalance using the Synthetic Minority Over-sampling Technique (SMOTE), and model training using the Naïve Bayes classifier. The dataset is split into 80% training data and 20% testing data, and model performance is evaluated using accuracy, precision, recall, and F1-score. BoW achieved the higher accuracy at 93%, while TF-IDF reached 83%, indicating that BoW provides a more effective feature representation and more stable performance for Naïve Bayes-based sentiment analysis on this dataset.
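The comparison described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. The toy reviews, the function names, the smoothed IDF formula, and the choice of a multinomial Naïve Bayes variant with Laplace smoothing are all assumptions made for illustration. Preprocessing, SMOTE, and the 80/20 split are omitted, and the reviews are assumed to be already cleaned and tokenizable on whitespace.

```python
import math
from collections import Counter, defaultdict

# Hypothetical, already-preprocessed toy reviews (not from the Tokopedia dataset).
train_docs = [
    "good product fast delivery",
    "great quality good price",
    "fast delivery great seller",
    "bad quality slow delivery",
    "broken product bad seller",
]
train_labels = ["positive", "positive", "positive", "negative", "negative"]
test_docs = ["good quality fast", "bad broken product"]


def bow_transform(docs):
    """Bag of Words: each document becomes a dict of raw term counts."""
    return [dict(Counter(doc.split())) for doc in docs]


def fit_idf(docs):
    """Smoothed inverse document frequency (one common variant, assumed here)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc.split()))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}


def tfidf_transform(docs, idf):
    """TF-IDF: term count times IDF; terms unseen during fitting are dropped."""
    return [
        {t: tf * idf[t] for t, tf in Counter(doc.split()).items() if t in idf}
        for doc in docs
    ]


def train_nb(vectors, labels, alpha=1.0):
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""
    vocab = {t for vec in vectors for t in vec}
    class_counts = Counter(labels)
    weights = defaultdict(lambda: defaultdict(float))  # class -> term -> summed weight
    for vec, label in zip(vectors, labels):
        for term, w in vec.items():
            weights[label][term] += w
    model = {}
    for label, n in class_counts.items():
        denom = sum(weights[label].values()) + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n / len(labels)),
            "log_like": {t: math.log((weights[label][t] + alpha) / denom) for t in vocab},
        }
    return model


def predict_nb(model, vec):
    """Return the class with the highest log posterior; unknown terms are ignored."""
    def score(label):
        params = model[label]
        return params["log_prior"] + sum(
            w * params["log_like"][t] for t, w in vec.items() if t in params["log_like"]
        )
    return max(model, key=score)


# Train one classifier per feature representation and compare predictions.
bow_model = train_nb(bow_transform(train_docs), train_labels)
bow_preds = [predict_nb(bow_model, v) for v in bow_transform(test_docs)]

idf = fit_idf(train_docs)
tfidf_model = train_nb(tfidf_transform(train_docs, idf), train_labels)
tfidf_preds = [predict_nb(tfidf_model, v) for v in tfidf_transform(test_docs, idf)]

print("BoW:   ", bow_preds)
print("TF-IDF:", tfidf_preds)
```

The only difference between the two runs is the feature weight passed to the classifier: raw counts for BoW, IDF-scaled counts for TF-IDF. On a toy set this small both representations behave alike, so the sketch shows the mechanics only and says nothing about the 93% versus 83% accuracy gap reported on the actual dataset.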
License
Copyright (c) 2026 Achmad Achmad, Fikri Budiman

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.