Topic Clustering of Student Complaints Based on Semantic Meaning Using the indoBERT and K-Means Models
DOI:
https://doi.org/10.30871/jaic.v9i4.10080
Keywords:
NLP, IndoBERT, K-Means, Clustering
Abstract
This study applies Natural Language Processing (NLP) to extract and cluster information from student complaint texts. The model used is IndoBERT, a variant of BERT (Bidirectional Encoder Representations from Transformers) adapted for the Indonesian language. The main objective is to cluster complaints into topics based on semantic similarity. The process begins with data collection and cleaning, followed by tokenization and text normalization. Each complaint is transformed into a vector representation using IndoBERT embeddings, which then serve as input to the K-Means clustering algorithm. The clustering is evaluated with several metrics: both the Silhouette Score and the Elbow Method indicate that the optimal number of clusters is four. Visualization with t-distributed Stochastic Neighbor Embedding (t-SNE) supports this finding by showing four fairly distinct groups of complaints, although one cluster appears dispersed and less well-defined, which suggests possible topic overlap. The quality of the topics within each cluster is evaluated using the Topic Coherence (c_v) metric, on which Cluster 3 achieved the highest score, 0.7084. The topics in this cluster highlight critical issues such as campus facilities, lecturer quality, and information delivery systems. Overall, the four resulting clusters reflect four central themes: Facilities, Expectations or Impressions, Services, and Academic Lectures. These results are expected to serve as a reference for institutions in formulating service improvement policies based on student complaint analysis.
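The step of choosing the number of clusters from the Silhouette Score can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the IndoBERT embeddings have already been computed, and replaces them with hypothetical two-dimensional toy vectors (make_toy_embeddings) so that the example runs without any model or external library. The function names (kmeans, silhouette_score, choose_k) and the deterministic farthest-point initialization are illustrative choices; library implementations typically use k-means++ with random restarts.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_point_init(points, k):
    """Deterministic init (assumption): start at points[0], then repeatedly
    add the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations; returns one cluster label per point."""
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append([sum(axis) / len(members) for axis in zip(*members)])
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return labels


def silhouette_score(points, labels):
    """Mean silhouette coefficient s(i) = (b - a) / max(a, b) over all points."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # singleton clusters contribute s(i) = 0
        # a: mean distance to the other members of the point's own cluster.
        a = sum(euclidean(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(euclidean(p, points[j]) for j in idx) / len(idx)
                for lab, idx in clusters.items() if lab != labels[i])
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


def choose_k(points, candidates):
    """Return the candidate k with the highest mean silhouette, plus all scores."""
    scores = {k: silhouette_score(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores


def make_toy_embeddings():
    """Hypothetical stand-in for sentence embeddings: four compact 2-D groups."""
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
    offsets = [(0.0, 0.0), (-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5)]
    return [[cx + dx, cy + dy] for cx, cy in centers for dx, dy in offsets]


if __name__ == "__main__":
    best_k, scores = choose_k(make_toy_embeddings(), range(2, 7))
    print(best_k, {k: round(s, 3) for k, s in scores.items()})
```

On real embeddings the silhouette curve is usually much flatter than on this toy data, which is one reason to cross-check it against a second criterion such as the Elbow Method, as the study does.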
Copyright (c) 2025 Gede Herdian Setiawan, Made Doddy Adi Pranata, Ida Bagus Alit Arimbawa, I Wayan Paramarta Giri, Ni Putu Leona Carisa Dayani

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).