Topic Clustering of Student Complaints Based on Semantic Meaning Using the indoBERT and K-Means Models

Authors

  • Gede Herdian Setiawan, Institut Teknologi dan Bisnis STIKOM Bali
  • Made Doddy Adi Pranata, Institut Teknologi dan Bisnis STIKOM Bali
  • Ida Bagus Alit Arimbawa, Institut Teknologi dan Bisnis STIKOM Bali
  • I Wayan Paramarta Giri, Institut Teknologi dan Bisnis STIKOM Bali
  • Ni Putu Leona Carisa Dayani, Institut Teknologi dan Bisnis STIKOM Bali

DOI:

https://doi.org/10.30871/jaic.v9i4.10080

Keywords:

NLP, IndoBERT, K-Means, Clustering

Abstract

This study applies Natural Language Processing (NLP) to extract and cluster information from student complaint texts. The model used is IndoBERT, a variant of BERT (Bidirectional Encoder Representations from Transformers) adapted for the Indonesian language. The main objective is to cluster complaints into topics based on semantic similarity. The process begins with data collection and cleaning, followed by tokenization and text normalization. Each complaint is transformed into a vector representation through IndoBERT embeddings, which serve as input to the K-Means clustering algorithm. The number of clusters is chosen using the Silhouette Score and the Elbow Method, both of which indicate that four clusters is optimal. A t-distributed Stochastic Neighbor Embedding (t-SNE) visualization supports this finding, showing four fairly distinct groups of complaints, although one cluster appears dispersed and less well-defined, which suggests possible topic overlap. Topic quality within each cluster is evaluated using the Topic Coherence (c_v) metric; Cluster 3 achieves the highest score, 0.7084. The topics in this cluster highlight critical issues such as campus facilities, lecturer quality, and information delivery systems. Overall, the four clusters reflect the central themes of Facilities, Expectations or Impressions, Services, and Academic Lectures. These results are expected to serve as a reference for institutions formulating service improvement policies based on student complaint analysis.
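
The cluster-count selection described in the abstract (K-Means on embedding vectors, scored with the Silhouette Score and the inertia used by the Elbow Method) can be sketched as follows. This is an illustrative sketch, not the authors' code. The 2-D `embeddings` are toy stand-ins for IndoBERT vectors, and the helper names `init_centers`, `kmeans`, and `silhouette` are hypothetical. The deterministic farthest-point seeding is an assumption made only to keep the example reproducible; in practice one would more likely use scikit-learn's `KMeans` and `silhouette_score`.

```python
import math

def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_centers(points, k):
    """Deterministic farthest-point seeding (an assumption, not from the paper)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Returns (labels, inertia)."""
    centers = init_centers(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old center if a cluster ends up empty.
            updated.append(tuple(sum(col) / len(members) for col in zip(*members))
                           if members else centers[c])
        if updated == centers:
            break
        centers = updated
    inertia = sum(sqdist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, inertia

def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.sqrt(sqdist(p, q)) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.sqrt(sqdist(p, q)) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        total += (b - a) / max(a, b)
    return total / len(points)

# Toy 2-D "embeddings": four tight groups standing in for complaint vectors.
blob_centers = [(0, 0), (10, 0), (0, 10), (10, 10)]
offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5), (0.0, 0.0)]
embeddings = [(cx + dx, cy + dy) for cx, cy in blob_centers for dx, dy in offsets]

# Sweep candidate cluster counts and record both selection criteria.
scores, inertias = {}, {}
for k in range(2, 7):
    labels, inertia = kmeans(embeddings, k)
    scores[k] = silhouette(embeddings, labels)
    inertias[k] = inertia

best_k = max(scores, key=scores.get)
print("silhouette by k:", {k: round(s, 3) for k, s in scores.items()})
print("inertia by k:", {k: round(v, 2) for k, v in inertias.items()})
print("selected k:", best_k)
```

On this toy data the silhouette sweep selects four clusters, mirroring the kind of agreement between criteria that the abstract reports. Real embeddings are high-dimensional and far less cleanly separated, so both curves would need to be inspected rather than trusted blindly.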

Published

2025-08-08

How to Cite

[1] G. H. Setiawan, M. D. A. Pranata, I. B. A. Arimbawa, I. W. P. Giri, and N. P. L. Carisa Dayani, “Topic Clustering of Student Complaints Based on Semantic Meaning Using the indoBERT and K-Means Models”, JAIC, vol. 9, no. 4, pp. 1715–1721, Aug. 2025.

Section

Articles
