Analysis of Elbow, Silhouette, Davies-Bouldin, Calinski-Harabasz, and Rand-Index Evaluation on K-Means Algorithm for Classifying Flood-Affected Areas in Jakarta
Abstract
Jakarta, the capital city of Indonesia, has a high population density and is frequently hit by floods. This study aims to classify flood-affected areas in Jakarta as severe, moderate, or low. Design/method/approach: The study applied the elbow, Silhouette, Davies-Bouldin, and Calinski-Harabasz methods to the K-means algorithm, and used the Rand index for evaluation. Based on the Calinski-Harabasz index, groupings with 3 and 6 clusters gave the best values. Based on the Davies-Bouldin index, K=6 gave the smallest value, 0.2737. Based on the Silhouette index, the best values were obtained, in order, at K=2, K=3, and K=6, with silhouette values of 0.866, 0.854, and 0.803. Based on the elbow method, the best value was K=3: in the plot of the sum of squared errors (SSE) against K, the largest decrease occurred at K=3, after which the curve flattened. The Rand index is a measure used to compare clustering results. A value of 90% or higher is considered very good, a value between 80% and 90% good, and a value below 80% poor. The results verify three clusters as the best grouping with a Rand index of 1, followed by two clusters as the second alternative with 0.9182. Taken together, the validation and evaluation methods indicate that the best grouping uses 3 clusters. Under this grouping, 75.4% of areas fall in the low category, 21.1% in the moderate category, and 3.5% in the severe category.
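The evaluation workflow summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn, uses synthetic data from `make_blobs` in place of the Jakarta flood dataset, and the helper name `evaluate_k_range` is hypothetical. The abstract does not state which reference partition the Rand index was computed against, so comparing each clustering with the K=3 labels is an assumption made only for demonstration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    rand_score,
    silhouette_score,
)


def evaluate_k_range(X, k_values, seed=0):
    """Fit K-means for each K and collect internal validation scores.

    Hypothetical helper: returns {K: {"sse", "silhouette",
    "davies_bouldin", "calinski_harabasz", "labels"}}.
    """
    results = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        labels = model.labels_
        results[k] = {
            "sse": model.inertia_,  # sum of squared errors, used by the elbow method
            "silhouette": silhouette_score(X, labels),  # higher is better
            "davies_bouldin": davies_bouldin_score(X, labels),  # lower is better
            "calinski_harabasz": calinski_harabasz_score(X, labels),  # higher is better
            "labels": labels,
        }
    return results


# Synthetic stand-in data (assumption): three well-separated groups.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

scores = evaluate_k_range(X, range(2, 8))
for k, s in scores.items():
    print(
        f"K={k}  SSE={s['sse']:.1f}  silhouette={s['silhouette']:.3f}  "
        f"DB={s['davies_bouldin']:.3f}  CH={s['calinski_harabasz']:.1f}"
    )

# Rand index of each clustering against the K=3 labels (assumed reference).
reference = scores[3]["labels"]
rand_vs_k3 = {k: rand_score(reference, s["labels"]) for k, s in scores.items()}
print(rand_vs_k3)
```

For the elbow method, the `sse` values would be plotted against K and the point where the decrease levels off read from the curve. The other three indices are compared directly across K, keeping in mind that Davies-Bouldin is minimized while Silhouette and Calinski-Harabasz are maximized.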
Copyright (c) 2023 Ilham Firman Ashari, Eko Dwi Nugroho, Randi Baraku, Ilham Novri Yanda, Ridho Liwardana
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.