Improving House Price Clustering Results with K-means through the Implementation of One-hot Encoding Pre-processing Technique

Vicka Rizqi Maulani; Mula Agung Barata; Pelangi Eka Yuwita

doi:10.30871/jaic.v9i3.9481

Authors

Vicka Rizqi Maulani (Informatics Engineering, Universitas Nahdlatul Ulama Sunan Giri)
Mula Agung Barata (Informatics Engineering, Universitas Nahdlatul Ulama Sunan Giri)
Pelangi Eka Yuwita (Mechanical Engineering, Universitas Nahdlatul Ulama Sunan Giri)

Keywords:

Clustering, K-Means, One-hot Encoding, House Price

Abstract

A house is a basic human need, serving both as a place to live and as shelter. In Indonesia, owning a house remains difficult because prices are high. House price information helps prospective buyers match their needs to their finances, and helps producers and sellers segment their target markets. House prices are influenced by several factors, including building area, number of bedrooms, number of bathrooms, location, condition, and the presence of a garage. This research aims to improve the quality of house price clustering with K-means by applying one-hot encoding during data pre-processing to represent categorical data. The dataset contains both numeric and categorical attributes. Clusters are evaluated with the silhouette score, and the number of clusters k is chosen from the elbow graph. With k = 3, the silhouette score increased from 0.09 to 0.15 after applying one-hot encoding. A score of 0.15 is still relatively low, which is attributed to overlapping house price values in the dataset; nevertheless, the results show that one-hot encoding represents categorical data well during pre-processing, so that the data can be processed with the K-means algorithm.
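The pipeline summarized in the abstract (one-hot encoding of categorical attributes, K-means clustering, an elbow graph for choosing k, and the silhouette score for evaluation) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy records, their column layout, and the min-max scaling of numeric columns are hypothetical choices made here, and the output will not reproduce the article's 0.09 and 0.15 scores.

```python
import math
import random

# Hypothetical toy records: (price, building_area_m2, bedrooms, location, condition).
# Made up for illustration only; this is NOT the dataset used in the article.
HOUSES = [
    (350, 60, 2, "suburb", "good"),
    (420, 72, 2, "suburb", "fair"),
    (900, 150, 4, "city", "good"),
    (950, 160, 4, "city", "excellent"),
    (300, 55, 2, "rural", "fair"),
    (280, 50, 1, "rural", "fair"),
    (880, 140, 3, "city", "good"),
    (400, 65, 3, "suburb", "good"),
]
NUMERIC = [0, 1, 2]      # indices of numeric attributes
CATEGORICAL = [3, 4]     # indices of categorical attributes


def preprocess(rows):
    """Min-max scale numeric columns (an assumption) and one-hot encode categorical ones."""
    categories = {c: sorted({r[c] for r in rows}) for c in CATEGORICAL}
    lo = {c: min(r[c] for r in rows) for c in NUMERIC}
    hi = {c: max(r[c] for r in rows) for c in NUMERIC}
    vectors = []
    for r in rows:
        v = [(r[c] - lo[c]) / ((hi[c] - lo[c]) or 1) for c in NUMERIC]
        for c in CATEGORICAL:
            # One binary column per distinct value: exactly one of them is 1.0.
            v.extend(1.0 if r[c] == value else 0.0 for value in categories[c])
        vectors.append(v)
    return vectors


def assign(points, centroids):
    """Index of the nearest centroid (Euclidean distance) for every point."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's K-means with random initial centroids; returns (labels, inertia)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Mean of the members; keep the old centroid if the cluster is empty.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    # Inertia (within-cluster sum of squared distances) is what the elbow graph plots.
    inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, inertia


def silhouette(points, labels):
    """Mean silhouette coefficient s(i) = (b - a) / max(a, b); needs >= 2 clusters."""
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)  # usual convention for a singleton cluster
            continue
        a = sum(same) / len(same)
        b = min(sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
                / labels.count(c)
                for c in set(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    X = preprocess(HOUSES)
    # Elbow graph data: inertia per k; silhouette shown alongside for comparison.
    for k in range(2, 6):
        labels, inertia = kmeans(X, k)
        print(f"k={k}  inertia={inertia:.3f}  silhouette={silhouette(X, labels):.3f}")
```

In an elbow graph, k is typically chosen where the drop in inertia flattens out. To mirror the before/after comparison in the abstract, one could compute the silhouette score once with the categorical columns encoded differently (or omitted) and once with one-hot encoding, keeping k fixed.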
