Analysis of Elbow, Silhouette, Davies-Bouldin, Calinski-Harabasz, and Rand-Index Evaluation on K-Means Algorithm for Classifying Flood-Affected Areas in Jakarta

  • Ilham Firman Ashari Institut Teknologi Sumatera
  • Eko Dwi Nugroho Institut Teknologi Sumatera
  • Randi Baraku Institut Teknologi Sumatera
  • Ilham Novri Yanda Institut Teknologi Sumatera
  • Ridho Liwardana Institut Teknologi Sumatera
Keywords: K-Means, Flood, Jakarta, Classification, Rand-Index

Abstract

Jakarta, the capital city of Indonesia, has a high population density and is frequently hit by floods. This study aims to classify flood-affected areas in Jakarta as severe, moderate, or low. Design/method/approach: The study applied the elbow, Silhouette, Davies-Bouldin, and Calinski-Harabasz methods to the K-means algorithm, and used the Rand index for evaluation. Based on the Calinski-Harabasz index, groupings with 3 and 6 clusters are the best. Based on the Davies-Bouldin index, K=6 has the smallest value, at 0.2737. Based on the silhouette method, the best values in order are K=2, K=3, and K=6, with silhouette values of 0.866, 0.854, and 0.803. Based on the elbow method, the best K value is K=3. This follows from plotting the sum of squared errors (SSE) against K: the largest decrease in SSE occurs at K=3, and after that point the curve begins to flatten. The Rand index is used to compare the clustering results. Expressed as a percentage, a value of 90 or above is a very good result, a value between 80 and 90 indicates a good index, and a value below 80 indicates a bad index. The results show that the three-cluster solution is verified as the best with a value of 1, followed by the two-cluster solution as a second alternative with a value of 0.9182. From these validation and evaluation methods it can be concluded that the best grouping uses 3 clusters. The resulting classification places 75.4% of areas in the low category, 21.1% in the moderate category, and 3.5% in the severe category.
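The Rand index evaluation described in the abstract can be illustrated with a minimal sketch. It counts the pairs of points on which two labelings agree (both put the pair in the same cluster, or both separate it) and divides by the total number of pairs. The function names, the example labels, and the reading of the 90 and 80 thresholds as 0.90 and 0.80 on a 0 to 1 scale are assumptions made for illustration; they are not taken from the study's code or data.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two clusterings agree (0.0 to 1.0)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same points")
    if len(labels_a) < 2:
        raise ValueError("need at least two points to form a pair")
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        # A pair agrees when both labelings join it or both split it.
        agree += same_a == same_b
        total += 1
    return agree / total


def interpret(score):
    """Map a Rand index to the abstract's bands (assumed 0.90 / 0.80 cut-offs)."""
    if score >= 0.90:
        return "very good"
    if score >= 0.80:
        return "good"
    return "bad"


# Hypothetical labels for six areas; not data from the study.
reference = [0, 0, 1, 1, 2, 2]
relabeled = [2, 2, 0, 0, 1, 1]  # same partition, different cluster names
merged = [0, 0, 1, 1, 1, 1]     # two of the reference clusters merged

for name, labels in (("relabeled", relabeled), ("merged", merged)):
    score = rand_index(reference, labels)
    print(name, round(score, 4), interpret(score))
```

Because the index depends only on which points share a cluster, renaming the clusters leaves it unchanged, which is why an identical partition scores 1 regardless of the label values.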

Published
2023-07-31
How to Cite
[1]
I. Ashari, E. Dwi Nugroho, R. Baraku, I. Novri Yanda, and R. Liwardana, “Analysis of Elbow, Silhouette, Davies-Bouldin, Calinski-Harabasz, and Rand-Index Evaluation on K-Means Algorithm for Classifying Flood-Affected Areas in Jakarta”, JAIC, vol. 7, no. 1, pp. 95-103, Jul. 2023.
Section
Articles
