Performance of Machine Learning Algorithms on Imbalanced Sentiment Datasets Without Balancing Techniques
DOI: https://doi.org/10.30871/jaic.v9i3.9584
Keywords: Machine Learning, Imbalanced Datasets, Sentiment Analysis, SMOTE
Abstract
This study evaluates five sentiment classification algorithms (Naïve Bayes, Logistic Regression, Support Vector Machine, Decision Tree, and Random Forest) on an imbalanced sentiment dataset, with SMOTE applied for comparison. The research follows the Knowledge Discovery in Databases (KDD) framework: data selection, preprocessing, transformation, data mining, and evaluation. Models are evaluated using accuracy, precision, recall, F1-score, and macro-averaged F1-score. Without any balancing technique, all five algorithms performed fairly well, with Naïve Bayes achieving the highest F1-score (0.84) and a recall of 0.81. After applying SMOTE, some models improved slightly, such as Random Forest (F1-score rose from 0.81 to 0.85), while others declined, such as Naïve Bayes, which dropped to 0.77. This suggests that the effect of balancing techniques like SMOTE depends on the algorithm. The study thus provides empirical evidence that choosing a balancing approach requires understanding how each algorithm behaves on imbalanced data, and researchers should take this into account when designing experiments and interpreting results.
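To illustrate why the evaluation reports a macro-averaged F1-score alongside accuracy, the following is a minimal sketch in plain Python. The label names and the 90/10 class split are hypothetical and are not taken from the study's dataset; the helper functions are written here for illustration and are not the authors' code.

```python
def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label that appears."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_scores(y_true, y_pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Hypothetical imbalanced labels: 90 negative, 10 positive.
y_true = ["neg"] * 90 + ["pos"] * 10
# A degenerate classifier that always predicts the majority class.
y_pred = ["neg"] * 100

print("accuracy:", accuracy(y_true, y_pred))
print("macro F1:", round(macro_f1(y_true, y_pred), 4))
print("per class:", per_class_scores(y_true, y_pred))
```

In this toy case the accuracy is 0.9 even though the minority class is never predicted, while the macro F1 falls to about 0.47 because each class contributes equally to the average. That gap is the reason macro-averaged metrics are commonly reported for imbalanced data.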
Copyright (c) 2025 Dina Wulan Yekti Rahayu, Khothibul Umam, Maya Rini Handayani

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).