Application of Data Mining with the K-Means Clustering Method and Davies Bouldin Index for Grouping IMDB Movies

  • Ilham Firman Ashari Institut Teknologi Sumatera
  • Romantika Banjarnahor Institut Teknologi Sumatera
  • Dede Rodhatul Farida Institut Teknologi Sumatera
  • Sicilia Putri Aisyah Institut Teknologi Sumatera
  • Anastasia Puteri Dewi Institut Teknologi Sumatera
  • Nuril Humaya Institut Teknologi Sumatera
Keywords: Data Mining, IMDb, Clustering, K-means


As technology develops, the film industry continues to grow, as shown by the number of films released both in cinemas and as TV shows. The Internet Movie Database (IMDb) is a website that provides information about films from around the world, including the people involved in them, such as actors and actresses, directors, and writers, as well as the soundtracks used. IMDb is also the most popular and trusted source of information for movies, TV, and other celebrity content. This study investigates which film titles are the most popular with the public, based on parameters available on IMDb such as rating, score, certificate, and the number of votes from the audience. The data used are taken from the IMDb website. The data mining method used is K-Means clustering, which groups the data around centroids, and the Davies-Bouldin index is used to determine the optimal number of clusters. The attributes used for clustering are runtime, IMDb rating, meta score, number of votes, and gross. The results show that the highest attribute average was 48.74 and that 4 clusters were formed. Evaluation using a confusion matrix gave an accuracy of 100%.
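The workflow summarized in the abstract (K-Means on five numeric attributes, with the Davies-Bouldin index used to pick the number of clusters) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the toy rows are invented values rather than real IMDb records, the min-max scaling step is an assumption the abstract does not mention, and the function names (min_max_scale, kmeans, davies_bouldin) and the candidate range of k are hypothetical choices made for illustration.

```python
import math
import random

def min_max_scale(rows):
    """Rescale each column to [0, 1] (assumed step, so votes/gross do not dominate)."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)]
            for r in rows]

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(c) / len(points) for c in zip(*points)]

def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd's algorithm: assign each point to its nearest centroid, then update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct data points as initial centroids
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean(members)
    return labels, centroids

def davies_bouldin(points, labels, centroids):
    """Davies-Bouldin index: mean over clusters of the worst (s_i + s_j) / d_ij."""
    k = len(centroids)
    scatter = []
    for j in range(k):
        members = [p for p, l in zip(points, labels) if l == j]
        scatter.append(sum(math.dist(p, centroids[j]) for p in members) / len(members)
                       if members else 0.0)
    worst = [max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k

# Invented toy rows: runtime, IMDb rating, meta score, votes, gross (NOT real data).
toy_rows = [
    [95, 7.6, 70, 30000, 5.0],
    [100, 7.7, 72, 35000, 8.0],
    [105, 7.8, 68, 40000, 6.5],
    [130, 8.1, 80, 400000, 90.0],
    [135, 8.2, 82, 450000, 110.0],
    [125, 8.0, 78, 380000, 85.0],
    [170, 8.8, 92, 1500000, 300.0],
    [180, 8.9, 95, 1700000, 330.0],
    [175, 9.0, 90, 1600000, 310.0],
]

scaled = min_max_scale(toy_rows)
scores = {}
for k in range(2, 5):  # candidate cluster counts (illustrative range)
    labels, centroids = kmeans(scaled, k)
    scores[k] = davies_bouldin(scaled, labels, centroids)
    print(f"k={k}: Davies-Bouldin index = {scores[k]:.3f}")

best_k = min(scores, key=scores.get)  # a lower index means better-separated clusters
print("k with the lowest Davies-Bouldin index:", best_k)
```

A lower Davies-Bouldin index indicates clusters that are compact relative to the distance between their centroids, which is why the sketch selects the k with the smallest value. The result on these toy rows says nothing about the 4 clusters reported for the actual dataset.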




How to Cite
I. Ashari, R. Banjarnahor, D. Farida, S. Aisyah, A. Dewi, and N. Humaya, “Application of Data Mining with the K-Means Clustering Method and Davies Bouldin Index for Grouping IMDB Movies”, JAIC, vol. 6, no. 1, pp. 07-15, Jul. 2022.