Comparative Sentiment Analysis on Mobile JKN Application Using Logistic Regression with SMOTE Based Statistical Feature Selection
DOI: https://doi.org/10.30871/jaic.v9i6.10520
Keywords: sentiment analysis, public health application, Logistic Regression, TF-IDF, Word2Vec, Feature Selection, Mobile JKN
Abstract
This study investigates public sentiment toward the Mobile JKN application using Logistic Regression enhanced with SMOTE-based statistical feature selection. Unlike prior work that relied solely on conventional feature representations such as TF-IDF or Word2Vec, this research compares three statistical feature selection techniques, namely Recursive Feature Elimination (RFE), Chi-Square, and Mutual Information, under both TF-IDF and Word2Vec representations in a low-resource Indonesian language setting. The dataset consists of 2,382 user reviews from the Google Play Store, balanced using SMOTE to mitigate class imbalance. The best configuration, TF-IDF combined with Mutual Information, achieved an accuracy of 73.38% and an F1-score of 50%, indicating moderate yet consistent performance. A confusion-matrix-based error analysis revealed that most misclassifications occurred between the neutral and negative classes due to semantic overlap. The relatively low F1-score highlights how difficult the sentiment classes are to separate. The superior performance of Mutual Information is attributed to its ability to capture non-linear dependencies between features and sentiment labels, yielding richer discriminative information than Chi-Square or RFE. This research establishes a comparative methodological framework that integrates feature selection and data balancing techniques, providing interpretable sentiment classification insights for under-resourced language settings.
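The idea behind Mutual Information feature selection can be illustrated with a minimal sketch. It is not the authors' implementation: it scores binary term presence rather than TF-IDF weights, uses only the Python standard library, and the toy reviews, labels, and the helper names `mutual_information` and `rank_terms` are hypothetical, introduced here purely for illustration.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    mi = 0.0
    for (f, y), count in joint.items():
        # Each term is p(f, y) * log(p(f, y) / (p(f) * p(y))).
        mi += (count / n) * math.log(
            count * n / (feature_counts[f] * label_counts[y])
        )
    return mi


def rank_terms(docs, labels, k):
    """Return the k terms whose presence is most informative about the labels."""
    tokenized = [set(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*tokenized))
    scores = {
        term: mutual_information([term in doc for doc in tokenized], labels)
        for term in vocab
    }
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical toy reviews (assumption: not taken from the study's dataset).
docs = [
    "login gagal terus",
    "aplikasi gagal dibuka",
    "aplikasi sangat membantu",
    "antrean online membantu",
]
labels = ["negative", "negative", "positive", "positive"]

top_terms = rank_terms(docs, labels, k=2)
print(top_terms)
```

In this toy data, a term that appears equally often in both classes (such as "aplikasi") scores zero, while a term confined to one class scores highest, which is the behavior a selector relies on when keeping the top-ranked features before classification.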
License
Copyright (c) 2025 Rafika Farkhul Awaliyah, Aria Hendrawan

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.