K-Means Clustering with KNN and Mean Imputation on CPU Benchmark Compilation Data
Abstract
Data has become a valuable source for decision-making and analysis, and clustering plays a key role in organizing and understanding complex datasets. One effective clustering algorithm is k-means. However, standard k-means cannot handle missing values directly, and the way those values are treated can significantly affect the quality of the resulting clusters. Imputation methods, including mean imputation and K-Nearest Neighbor (KNN) imputation, are used to overcome this problem. This study analyzes the impact of these two imputation methods on k-means clustering of the CPU Benchmark Compilation data. Evaluated with the silhouette coefficient, clustering with mean imputation achieved a score of 0.782, while clustering with KNN imputation achieved a score of 0.777. Although its silhouette score is slightly lower, the cluster interpretation shows that KNN imputation produces clusters that are more informative and easier for users to understand. These results offer insight into how the choice of imputation method affects clustering quality when supporting CPU selection decisions based on CPU Benchmark Compilation data.
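The comparison described in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the toy rows, the function names, the choice of k = 2 neighbors, and the naive centroid initialization are all assumptions made for illustration, and the KNN distance is a simplified variant computed only over features that both rows observe.

```python
import math

# Hypothetical toy rows [feature_a, feature_b]; None marks a missing value.
# These are NOT taken from the CPU Benchmark Compilation data.
rows = [[1.0, 1.0], [1.2, None], [0.8, 1.1],
        [8.0, 8.0], [8.2, None], [7.9, 8.3]]

def mean_impute(data):
    """Replace each None with the mean of the observed values in its column."""
    means = []
    for col in zip(*data):
        seen = [v for v in col if v is not None]
        means.append(sum(seen) / len(seen))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in data]

def shared_distance(a, b):
    """Euclidean distance over features observed in both rows (simplified)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs))

def knn_impute(data, k=2):
    """Replace each None with the mean of that feature over the k nearest donor rows."""
    filled = [list(r) for r in data]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are other rows that actually observe feature j.
            donors = [(shared_distance(row, other), other[j])
                      for m, other in enumerate(data)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            filled[i][j] = sum(nearest) / len(nearest)
    return filled

def kmeans(points, k, iters=20):
    """Plain Lloyd iterations; centroids start at the first k points (naive)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def silhouette(points, labels):
    """Mean of s = (b - a) / max(a, b) over all points."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster or only one cluster
            continue
        a = sum(own) / len(own)                                 # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in by_cluster.values())   # nearest other cluster
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

filled_mean = mean_impute(rows)
filled_knn = knn_impute(rows, k=2)
labels_mean = kmeans(filled_mean, k=2)
labels_knn = kmeans(filled_knn, k=2)
sil_mean = silhouette(filled_mean, labels_mean)
sil_knn = silhouette(filled_knn, labels_knn)
print("silhouette, mean imputation:", round(sil_mean, 3))
print("silhouette, KNN imputation:", round(sil_knn, 3))
```

The scores printed here depend entirely on the toy rows and are not comparable to the 0.782 and 0.777 reported for the real data. On a real dataset, features would typically be normalized before computing distances, and a library implementation of imputation and k-means would usually be preferable to this hand-written sketch.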
Copyright (c) 2023 Rofiq Muhammad Syauqi, Puspita Nurul Sabrina, Irma Santikarama
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.