Improving House Price Clustering Results with K-means through the Implementation of One-hot Encoding Pre-processing Technique
DOI: https://doi.org/10.30871/jaic.v9i3.9481
Keywords: Clustering, K-Means, One-hot Encoding, House Price
Abstract
A house is a basic human need: it is a place to live and a shelter. In Indonesia, owning a house remains difficult because prices are high. Information on house prices helps prospective buyers match their needs to their finances, and helps producers and sellers segment their target markets. House prices are influenced by several factors, including building area, number of bedrooms, number of bathrooms, location, condition, and the presence of a garage. This research aims to improve the quality of house price clustering with K-means by applying one-hot encoding during data pre-processing to represent categorical attributes. The dataset contains both numeric and categorical attributes. Cluster quality is evaluated with the silhouette score, and the number of clusters k is chosen with the elbow method. After one-hot encoding was applied, the silhouette score rose from 0.09 to 0.15 with k = 3. A score of 0.15 is still relatively low, which is attributed to overlapping house price values in the dataset. Nevertheless, the results show that one-hot encoding represents categorical data well during pre-processing, so that the data can be processed by the K-means algorithm.
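The abstract does not show how the categorical attributes (such as location, condition, or garage) were encoded in practice. The following is a minimal sketch of one-hot encoding mixed numeric and categorical records in plain Python. The column names, category values, and toy rows are hypothetical, and the min-max scaling step is an assumption: the abstract does not say whether numeric attributes were scaled before K-means.

```python
from typing import Dict, List, Sequence, Tuple, Union

Value = Union[str, int, float]


def one_hot_encode(
    rows: Sequence[Dict[str, Value]], categorical: Sequence[str]
) -> Tuple[List[str], List[List[float]]]:
    """Replace each categorical column with one 0/1 indicator column per category."""
    columns = list(rows[0].keys())
    # Distinct categories per column, sorted so the output column order is deterministic.
    categories = {c: sorted({str(r[c]) for r in rows}) for c in categorical}
    header: List[str] = []
    for c in columns:
        if c in categories:
            header.extend(f"{c}={v}" for v in categories[c])
        else:
            header.append(c)
    matrix: List[List[float]] = []
    for r in rows:
        vec: List[float] = []
        for c in columns:
            if c in categories:
                # Exactly one indicator is 1.0: the row's own category.
                vec.extend(1.0 if str(r[c]) == v else 0.0 for v in categories[c])
            else:
                vec.append(float(r[c]))  # numeric columns pass through unchanged
        matrix.append(vec)
    return header, matrix


def min_max_scale(matrix: List[List[float]]) -> List[List[float]]:
    """Assumed extra step: rescale every column to [0, 1]; constant columns become 0.0."""
    cols = list(zip(*matrix))
    lo = [min(col) for col in cols]
    hi = [max(col) for col in cols]
    return [
        [(x - mn) / (mx - mn) if mx > mn else 0.0 for x, mn, mx in zip(row, lo, hi)]
        for row in matrix
    ]


# Hypothetical toy records; column names and values are illustrative only.
houses = [
    {"area": 120.0, "bedrooms": 3, "location": "Jakarta", "garage": "yes"},
    {"area": 90.0, "bedrooms": 2, "location": "Bandung", "garage": "no"},
    {"area": 150.0, "bedrooms": 4, "location": "Jakarta", "garage": "no"},
]
header, X = one_hot_encode(houses, ["location", "garage"])
X_scaled = min_max_scale(X)
print(header)
print(X_scaled)
```

Unlike label encoding (for example Bandung = 0, Jakarta = 1, Surabaya = 2), which makes some categories look closer to each other than others, one-hot encoding puts every pair of distinct categories of one attribute at the same Euclidean distance (√2). As a result, K-means does not read a spurious order into them. Scaling matters because, without it, a feature measured in square metres would dominate the 0/1 indicator columns in the distance computation.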
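The evaluation pipeline named in the abstract (K-means clustering, the elbow method for choosing k, and the silhouette score) can be sketched in dependency-free Python. This is not the authors' implementation. The k-means++ seeding, the `n_init` restarts that keep the lowest-inertia run, and the toy points are assumptions of this sketch. The silhouette of a point in a singleton cluster is set to 0, following Rousseeuw's original definition.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance (the quantity K-means minimises)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Point, centroids: List[List[float]]) -> int:
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))


def init_plus_plus(X: List[Point], k: int, rng: random.Random) -> List[List[float]]:
    """k-means++ seeding: each new centroid is drawn with probability proportional to D(x)^2."""
    centroids = [list(rng.choice(X))]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in X]
        r, acc = rng.random() * sum(d2), 0.0
        for p, w in zip(X, d2):
            acc += w
            if acc >= r:
                centroids.append(list(p))
                break
    return centroids


def kmeans(X: List[Point], k: int, n_init: int = 10, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[List[float]], float]:
    """Lloyd's algorithm; returns (labels, centroids, inertia) of the best of n_init runs."""
    rng = random.Random(seed)
    best = None
    for _run in range(n_init):
        centroids = init_plus_plus(X, k, rng)
        labels: List[int] = []
        for _it in range(max_iter):
            new_labels = [nearest(p, centroids) for p in X]  # assignment step
            if new_labels == labels:
                break  # no point changed cluster: converged
            labels = new_labels
            for c in range(k):  # update step: centroid = mean of its members
                members = [p for p, lab in zip(X, labels) if lab == c]
                if members:
                    centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(X, labels))
        if best is None or inertia < best[2]:
            best = (labels, centroids, inertia)
    return best


def silhouette_score(X: List[Point], labels: List[int]) -> float:
    """Mean of s(i) = (b - a) / max(a, b) over all points; needs at least two clusters."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(X):
        own = [math.dist(p, q) for j, q in enumerate(X) if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)  # singleton cluster: s(i) = 0 by definition
            continue
        a = sum(own) / len(own)  # mean distance to the rest of its own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(X, labels) if lab == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical, well-separated 2-D points (e.g. two scaled features); not the paper's data.
points = [
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
    [5.0, 5.0], [5.1, 4.9], [4.8, 5.2],
    [9.0, 1.0], [9.2, 1.1], [8.9, 0.8],
]
# Elbow method: inertia for several k; look for the k after which the decrease flattens.
inertias = {k: kmeans(points, k)[2] for k in range(1, 6)}
labels3, _, _ = kmeans(points, 3)
score3 = silhouette_score(points, labels3)
print({k: round(v, 3) for k, v in inertias.items()}, round(score3, 3))
```

On these toy points, inertia drops sharply up to k = 3 and only slightly afterwards, which is the "elbow". The silhouette score is close to 1 because the groups are far apart. Overlapping groups, such as the overlapping house prices the abstract mentions, pull the score towards 0. The toy numbers are not comparable with the 0.09 and 0.15 reported in the paper.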
License
Copyright (c) 2025 Vicka Rizqi Maulani, Mula Agung Barata, Pelangi Eka Yuwita

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License (Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (see The Effect of Open Access).