Application of Data Mining with the K-Means Clustering Method and Davies Bouldin Index for Grouping IMDB Movies
As technology develops, the film industry continues to grow, as shown by the number of films released both in cinemas and as TV shows. The Internet Movie Database (IMDb) is a website that provides information about films from around the world, including the people involved in them, such as actors and actresses, directors, and writers, as well as the soundtracks used. IMDb is also the most popular and trusted source of information for movies, TV, and celebrity content. This study investigates which film titles are most popular with the public by examining several parameters available on IMDb, such as rating, score, certificate, and the number of votes from the audience. The data come from Kaggle.com. The data mining method used is K-Means clustering, which groups the data around centroids, and the Davies-Bouldin index is used to determine the optimal number of clusters. The attributes used for clustering are runtime, IMDb rating, meta score, number of votes, and gross. The results show that the highest attribute average was 48.74 and that 4 clusters were formed. Evaluation using a confusion matrix gave an accuracy of 100%.
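The pairing of K-Means with the Davies-Bouldin index described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the film rows are invented toy values rather than the Kaggle data, only two features are used instead of five, the farthest-first seeding is an assumption (the abstract does not say how the initial centroids are chosen), and all function names are hypothetical. For each candidate number of clusters, the data are clustered and the index is computed; the candidate with the lowest index is taken as the best.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid_of(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def initial_centroids(points, k):
    """Farthest-first seeding. Assumption: the source does not specify seeding."""
    chosen = [points[0]]
    while len(chosen) < k:
        chosen.append(max(points, key=lambda p: min(distance(p, c) for c in chosen)))
    return chosen

def kmeans(points, k, max_iter=100):
    """Alternate nearest-centroid assignment and centroid update until stable."""
    centroids = initial_centroids(points, k)
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: distance(p, centroids[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster happens to be empty.
            updated.append(centroid_of(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids

def davies_bouldin(points, labels, centroids):
    """Mean over clusters of the worst (scatter_i + scatter_j) / centroid distance."""
    k = len(centroids)
    scatter = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        total = sum(distance(p, centroids[j]) for p in members)
        scatter.append(total / len(members) if members else 0.0)
    worst = [
        max((scatter[i] + scatter[j]) / distance(centroids[i], centroids[j])
            for j in range(k) if j != i)
        for i in range(k)
    ]
    return sum(worst) / k

# Invented toy rows with two already-comparable features (not real IMDb values).
films = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
    (5.0, 5.0), (5.1, 4.9), (4.8, 5.2),
    (9.0, 1.0), (9.2, 1.1), (8.9, 0.8),
]

# Score each candidate cluster count; a lower Davies-Bouldin index is better.
scores = {}
for k in range(2, 5):
    labels, centroids = kmeans(films, k)
    scores[k] = davies_bouldin(films, labels, centroids)

best_k = min(scores, key=scores.get)
print(scores, best_k)
```

Because the toy rows were constructed as three tight groups, the lowest index falls at three clusters here; this does not reproduce the four clusters reported for the real data. With attributes on very different scales, such as votes and rating, the features would normally be scaled before clustering, which this sketch omits.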
Copyright (c) 2022 Ilham Firman Ashari, Romantika Banjarnahor, Dede Rodhatul Farida, Sicilia Putri Aisyah, Anastasia Puteri Dewi, Nuril Humaya
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.