Performance of Machine Learning Algorithms on Imbalanced Sentiment Datasets Without Balancing Techniques

Authors

  • Dina Wulan Yekti Rahayu, UIN Walisongo Semarang
  • Khothibul Umam, UIN Walisongo Semarang
  • Maya Rini Handayani, UIN Walisongo Semarang

DOI:

https://doi.org/10.30871/jaic.v9i3.9584

Keywords:

Machine Learning, Imbalanced Datasets, Sentiment Analysis, SMOTE

Abstract

This study examines the performance of five sentiment classification algorithms (Naïve Bayes, Logistic Regression, Support Vector Machine, Decision Tree, and Random Forest) on an imbalanced sentiment dataset, with the SMOTE oversampling technique applied as a point of comparison. The research follows the Knowledge Discovery in Databases (KDD) framework: data selection, preprocessing, transformation, data mining, and evaluation. Models are evaluated with accuracy, precision, recall, F1-score, and macro-averaged F1-score. Initial results show that all five algorithms performed reasonably well without any balancing technique, with Naïve Bayes achieving the highest F1-score (0.84) and a recall of 0.81. Applying SMOTE produced only small improvements for some models, such as Random Forest (F1-score rising from 0.81 to 0.85), while others, such as Naïve Bayes, declined, dropping to 0.77. The effect of a balancing technique like SMOTE therefore varies with the algorithm. These findings provide empirical evidence that the choice of approach matters and that handling imbalanced data requires a thorough understanding of each algorithm's behavior. Researchers are encouraged to weigh these factors carefully when designing experiments and interpreting results.
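The abstract contrasts training on the raw class distribution with training after SMOTE, and scores the models with macro-averaged F1. The sketch below illustrates both ideas in plain Python. It is an illustration under stated assumptions, not the authors' pipeline: `smote_oversample` and `macro_f1` are hypothetical helper names, feature vectors are assumed to be dense lists of floats (for example, rows of a TF-IDF matrix), and nearest neighbours are found by brute-force Euclidean distance. The oversampler follows SMOTE's interpolation rule (a synthetic point is `x + gap * (neighbour - x)` with `gap` drawn from [0, 1)). In practice, imbalanced-learn's `SMOTE().fit_resample` and scikit-learn's `f1_score(..., average="macro")` provide tested implementations.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote_oversample(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples, SMOTE-style (hypothetical helper).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: euclidean(base, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical toy data: 7 majority vs 3 minority 2-D feature vectors.
    majority = [[0.0, 0.1 * i] for i in range(7)]
    minority = [[1.0, 1.0], [2.0, 2.0], [1.5, 2.5]]
    synthetic = smote_oversample(minority, len(majority) - len(minority), k=2)
    print(len(minority) + len(synthetic), "minority vs", len(majority), "majority")

    # Accuracy here is 0.75, but macro F1 weighs the rare class equally.
    y_true = ["neg", "neg", "neg", "pos"]
    y_pred = ["neg", "neg", "pos", "pos"]
    print(round(macro_f1(y_true, y_pred), 4))
```

When reproducing a with/without-SMOTE comparison, oversampling is normally applied to the training split only, so the test set keeps its original class distribution and both settings are scored on the same data.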

Author Biographies

Khothibul Umam, UIN Walisongo Semarang

Information Technology, Faculty of Science and Technology, UIN Walisongo Semarang

Maya Rini Handayani, UIN Walisongo Semarang

Islamic Communication and Broadcasting, Faculty of Da'wah and Communication, UIN Walisongo Semarang

Published

2025-06-20

How to Cite

[1] Dina Wulan Yekti Rahayu, Khothibul Umam, and Maya Rini Handayani, “Performance of Machine Learning Algorithms on Imbalanced Sentiment Datasets Without Balancing Techniques”, JAIC, vol. 9, no. 3, pp. 998–1005, Jun. 2025.

Issue

Vol. 9 No. 3 (2025)

Section

Articles