Detecting Fake Reviews in E-Commerce: A Case Study on Shopee Using Support Vector Machine and Random Forest
DOI:
https://doi.org/10.30871/jaic.v9i3.9514
Keywords:
Fake Review Detection, NLP, Shopee, Random Forest, SVM
Abstract
The increasing popularity of online shopping, particularly on platforms such as Shopee, has made product reviews a significant factor in consumer purchasing decisions. However, fake reviews generated by non-human agents undermine consumer trust and damage platform credibility. This study aims to detect fake reviews on Shopee with a text classification approach based on the Random Forest and Support Vector Machine (SVM) algorithms. A dataset of 3,686 Shopee product reviews was collected and preprocessed through data cleaning, normalization, tokenization, and TF-IDF weighting. Reviews were labeled automatically with the Latent Dirichlet Allocation (LDA) method, which categorized them as Original (OR) or Computer-Generated (CG). Model performance was evaluated using accuracy, precision, recall, and F1-score. Experimental results show that SVM achieved the highest accuracy at 88.84%, outperforming Random Forest, which obtained 80.39%. These findings highlight the effectiveness of SVM in handling high-dimensional text data for fake review detection. The study contributes to the application of automated topic modeling (LDA) for labeling e-commerce reviews in the Indonesian context and opens opportunities for further work with larger datasets and deep learning-based models to improve classification accuracy and scalability.
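The TF-IDF weighting and the four evaluation metrics named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the abstract does not state which TF-IDF variant was used or which class was treated as positive, so the unsmoothed formula tf × log(N / df), the choice of CG as the positive class, the function names, and the toy reviews and labels below are all assumptions made for illustration. No classifier is trained here; the SVM and Random Forest steps are omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Minimal cleaning and tokenization: lowercase, keep alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def tfidf(docs):
    # docs: list of token lists. Returns one {term: weight} dict per document.
    # Assumed variant: relative term frequency times log(N / document frequency).
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def binary_metrics(y_true, y_pred, positive="CG"):
    # Accuracy, precision, recall and F1 with `positive` as the target class.
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy reviews, invented for this sketch (not from the dataset).
reviews = [
    "Good product, fast delivery",
    "Good product, good seller",
    "Item arrived broken",
]
vectors = tfidf([tokenize(r) for r in reviews])

# Hypothetical labels and predictions: OR = Original, CG = Computer-Generated.
y_true = ["CG", "CG", "OR", "OR", "CG"]
y_pred = ["CG", "OR", "OR", "CG", "CG"]
scores = binary_metrics(y_true, y_pred)
```

In this variant, a term that occurs in every document receives a weight of zero, while terms confined to a few reviews are weighted up, which is what makes the resulting vectors sparse and high-dimensional. Library implementations often add smoothing and normalization, so their weights will generally differ from the ones computed here.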
License
Copyright (c) 2025 Khoirotulmuadiba Purifyregalia, Khothibul Umam, Nur Cahyo Hendro Wibowo, Maya Rini Handayani

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.