K-Means Clustering with KNN and Mean Imputation on CPU Benchmark Compilation Data

  • Rofiq Muhammad Syauqi, Universitas Jenderal Achmad Yani
  • Puspita Nurul Sabrina, Universitas Jenderal Achmad Yani
  • Irma Santikarama, Universitas Jenderal Achmad Yani
Keywords: Clustering, K-Means, KNN Imputation, Mean Imputation, Silhouette coefficient

Abstract

In the rapidly evolving digital age, data has become a valuable resource for decision-making and analysis. Clustering is an important data analysis technique that plays a key role in organizing and understanding complex datasets, and k-means is one effective clustering algorithm. However, k-means is sensitive to missing values, which can significantly degrade the quality of the resulting clusters. To address this, imputation methods such as mean imputation and K-Nearest Neighbor (KNN) imputation are applied before clustering. This study analyzes the impact of these two imputation methods on clustering results for the CPU Benchmark Compilation dataset. Evaluated with the silhouette coefficient, clustering with mean imputation scored 0.782, while clustering with KNN imputation scored 0.777. Although its score is slightly lower, the cluster interpretation shows that KNN imputation produces more informative clusters that are easier for users to understand. These results offer insight into how the choice of imputation method affects clustering quality and can support CPU selection decisions based on CPU Benchmark Compilation data.
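The pipeline the abstract describes (impute missing values, cluster with k-means, score with the silhouette coefficient) can be sketched roughly as follows. This is a minimal standard-library illustration, not the authors' implementation: the `toy` rows are made-up values rather than CPU Benchmark Compilation data, all function names are hypothetical, `k=2` is an arbitrary choice, and the deterministic farthest-first initialization is a simplification standing in for whatever initialization the study used.

```python
import math

def mean_impute(rows):
    """Fill each missing value (None) with the mean of its column's observed values."""
    means = []
    for col in zip(*rows):
        seen = [v for v in col if v is not None]
        means.append(sum(seen) / len(seen))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def knn_impute(rows, k=2):
    """Fill each missing value with the mean of that feature over the k nearest
    rows that have it observed; distance uses only features both rows share."""
    def dist(a, b):
        sq = [(x - y) ** 2 for x, y in zip(a, b) if x is not None and y is not None]
        return math.sqrt(sum(sq) / len(sq)) if sq else math.inf
    out = []
    for i, r in enumerate(rows):
        filled = list(r)
        for j, v in enumerate(r):
            if v is None:
                donors = sorted((dist(r, o), o[j]) for m, o in enumerate(rows)
                                if m != i and o[j] is not None)
                nearest = [val for _, val in donors[:k]]
                filled[j] = sum(nearest) / len(nearest)
        out.append(filled)
    return out

def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm with farthest-first initialization (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def silhouette(points, labels):
    """Mean of (b - a) / max(a, b) over all points, where a is the mean distance
    to the point's own cluster and b the mean distance to the nearest other cluster."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster or only one cluster present
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Made-up two-feature rows with two missing entries (None); illustration only.
toy = [[1.0, 1.2], [1.1, None], [0.9, 1.0], [1.2, 0.9],
       [8.0, 8.1], [None, 7.9], [8.2, 8.0], [7.9, 8.3]]

for name, impute in (("mean", mean_impute), ("knn", knn_impute)):
    data = impute(toy)
    labels = kmeans(data, k=2)
    print(name, labels, round(silhouette(data, labels), 3))
```

The scores printed for this toy data depend entirely on the made-up values and do not reproduce the 0.782 and 0.777 reported for the real dataset. The sketch only shows where the two imputation strategies enter the pipeline and how the same silhouette evaluation is applied to both.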

Published
2023-12-05
How to Cite
R. Syauqi, P. Sabrina, and I. Santikarama, “K-Means Clustering with KNN and Mean Imputation on CPU Benchmark Compilation Data,” JAIC, vol. 7, no. 2, pp. 231–239, Dec. 2023.
Section
Articles
