Detecting Fake Reviews Using BERT and Sublinear_TF Methods on Hotel Reviews in the Lombok Tourism Area
Abstract
The number of visitors to Lombok, one of Indonesia's best-known tourist destinations, rose from 400,595 in 2020 to 1,376,295 in 2022. Although the government supports the hotel industry, fake reviews remain a significant problem because they can damage hotel reputations and mislead tourists. This study uses BERT and Sublinear_TF feature extraction to analyze fake hotel reviews from three main areas: Gili Trawangan, Senggigi, and Kuta. BERT detects fake reviews by modeling the context of words. Sublinear_TF scales term frequency logarithmically, which dampens the weight of frequently repeated common words so that more informative words carry relatively more weight. The larger and more diverse Gili Trawangan dataset yielded the best classification results. With BERT features, accuracy on Gili Trawangan reached 0.79 with SVM and 0.84 with Random Forest, the latter being the highest accuracy overall. With Sublinear_TF features, accuracy on Gili Trawangan reached 0.82 with SVM and 0.83 with Random Forest. In contrast, both techniques achieved lower accuracy on the smaller and more homogeneous Senggigi and Kuta datasets. BERT and Sublinear_TF are therefore more effective on large, diverse datasets such as Gili Trawangan. Sublinear_TF suits varied data but is less effective on homogeneous datasets, while BERT with Random Forest achieved the highest accuracy owing to BERT's ability to capture broader language context. These results suggest that dataset size and variety strongly influence the success of fake review classification techniques.
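The Sublinear_TF weighting described above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the regex tokenizer, the smoothed IDF term, and the function names `tokenize` and `sublinear_tfidf` are hypothetical choices made for clarity. BERT embeddings and the SVM and Random Forest classifiers are not shown.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-word characters (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())


def sublinear_tfidf(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Return (vocabulary, L2-normalized sublinear TF-IDF rows).

    Term frequency is damped as 1 + ln(tf) for tf > 0, so a word repeated
    ten times weighs about 3.3 instead of 10. IDF uses the smoothed form
    ln((1 + n) / (1 + df)) + 1 (an assumption, mirroring scikit-learn's default).
    """
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}

    rows = []
    for toks in tokenized:
        counts = Counter(toks)  # missing terms return 0
        row = [
            (1 + math.log(counts[t])) * idf[t] if counts[t] > 0 else 0.0
            for t in vocab
        ]
        # L2-normalize so long and short reviews are comparable.
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, rows
```

For example, with the reviews `"amazing amazing amazing amazing hotel"` and `"clean room friendly staff"`, the term "amazing" receives a term-frequency factor of 1 + ln 4 ≈ 2.39 relative to "hotel", rather than 4 under raw counts. In scikit-learn, `TfidfVectorizer(sublinear_tf=True)` applies the same 1 + log(tf) damping (its default tokenizer differs slightly), and the resulting matrix can be passed to classifiers such as `sklearn.svm.SVC` or `sklearn.ensemble.RandomForestClassifier`.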
Copyright (c) 2024 Zulpan Hadi, M. Zulpahmi, Zaenudin, Akmaludin Asrory
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.