Implementation of HDBSCAN and Bayesian Optimization for Clustering Flood-Affected Regions in Indonesia

Nasywa Azzah Nabila; Aviolla Terza Damaliana; Shindi Shella May Wara

doi:10.30871/jaic.v10i3.12734

Authors

Nasywa Azzah Nabila Universitas Pembangunan Nasional "Veteran" Jawa Timur
Aviolla Terza Damaliana Universitas Pembangunan Nasional "Veteran" Jawa Timur
Shindi Shella May Wara Universitas Pembangunan Nasional "Veteran" Jawa Timur

DOI:

https://doi.org/10.30871/jaic.v10i3.12734

Keywords:

Bayesian Optimization, Clustering, DBCV, Floods, HDBSCAN

Abstract

Floods are among the most frequent natural disasters in Indonesia, with thousands of events causing significant impacts on infrastructure and human lives. The substantial increase in the number of victims and in flood-related damage in 2024 indicates that flood mitigation efforts in Indonesia remain suboptimal. A clustering-based analytical approach is therefore needed to understand patterns of flood impact across provinces. This study clusters Indonesian provinces by flood-impact indicators using Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN), with Bayesian Optimization used to select its hyperparameters. The study comprises data collection, data standardization, statistical testing, data reduction, hyperparameter optimization, HDBSCAN clustering, model evaluation, and analysis of the clustering results. HDBSCAN tuned with Bayesian Optimization yields a well-separated cluster structure with a Density-Based Clustering Validation (DBCV) score of 0.515. The result consists of three main clusters and one noise cluster: Cluster 0 (High Displacement & Inundation) with 5 provinces, Cluster 1 (High Fatality & Structural Damage) with 4 provinces, Cluster 2 (Low Impact) with 21 provinces, and the noise cluster with 8 provinces. These findings are intended to give the government a foundation for formulating targeted flood mitigation strategies tailored to the flood-impact characteristics of each province.

References

[1] Data Bencana Indonesia 2023. Pusat Data Informasi dan Komunikasi Kebencanaan Badan Nasional Penanggulangan Bencana, 2023.

[2] Data Bencana Indonesia 2024. Pusat Data Informasi dan Komunikasi Kebencanaan Badan Nasional Penanggulangan Bencana, 2025.

[3] W. T. Oktaviany, F. Insani, and A. Nazir, “Pengelompokan Wilayah Bencana Banjir di Indonesia Menggunakan Algoritma K-Means,” Bulletin of Computer Science Research, vol. 5, no. 4, pp. 542–552, Jun. 2025, doi: 10.47065/bulletincsr.v5i4.608.

[4] U. Islamy et al., “Pengelompokkan Provinsi Di Indonesia Berdasarkan Indikator Dampak Bencana Banjir Tahun 2017-2020 Menggunakan K-Medoids,” Bimaster: Buletin Ilmiah Matematika, Statistika dan Terapannya, vol. 11, no. 2, pp. 381–388, 2022.

[5] M. N. Hayati et al., “Pengelompokan Provinsi Di Indonesia Berdasarkan Data Jumlah Kejadian Dan Dampak Bencana Banjir Menggunakan Metode Fuzzy C-Means,” Variansi: Journal of Statistics and Its Application on Teaching and Research, vol. 6, no. 01, pp. 21–34, 2024, doi: 10.35580/variansiunm167.

[6] A. A. Wani, “Comprehensive analysis of clustering algorithms: exploring limitations and innovative solutions,” PeerJ Comput Sci, vol. 10, Aug. 2024, doi: 10.7717/peerj-cs.2286.

[7] D. N. Amalina and A. Fauzan, “A Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) Approach for Identifying Potential Villages in Buleleng Regency,” Knowledge Engineering and Data Science, vol. 7, no. 2, Dec. 2024, doi: 10.17977/um018v7i22024p187-199.

[8] M. Mouhiha and A. Mabrouk, “An empirically-driven clustering framework for NoSQL data warehouse conversion: Optimizing column family design from relational big data using HDBSCAN,” Inf Softw Technol, vol. 195, p. 108117, Jul. 2026, doi: 10.1016/j.infsof.2026.108117.

[9] M. Aljibawi, H. K. Algabri, and Z. I. Rasool, “Adaptive Clustering Using Enhanced DBSCAN: a Dynamic Approach to Optimizing Density-based Clustering,” Statistics, Optimization and Information Computing, vol. 14, no. 4, pp. 1980–1991, Sep. 2025, doi: 10.19139/soic-2310-5070-2484.

[10] D. P. Uddandarao, M. R. Konatham, and R. K. Vadlamani, “Robust and Scalable Statistical Models for High-Dimensional Marketing Data Applications: A Comprehensive Review,” in 2025 5th International Conference on Emerging Research in Electronics, Computer Science and Technology (ICERECT), IEEE, Sep. 2025, pp. 1–8. doi: 10.1109/ICERECT65215.2025.11377070.

[11] M. Binois and N. Wycoff, “A Survey on High-dimensional Gaussian Process Modeling with Application to Bayesian Optimization,” ACM Transactions on Evolutionary Learning and Optimization, vol. 2, no. 2, pp. 1–26, Jun. 2022, doi: 10.1145/3545611.

[12] Md. S. S. Islam et al., “Optimizing Short-Term Photovoltaic Power Forecasting: A Novel Approach with Gaussian Process Regression and Bayesian Hyperparameter Tuning,” Processes, vol. 12, no. 3, p. 546, Mar. 2024, doi: 10.3390/pr12030546.

[13] C. Thanos, C. Meghini, V. Bartalesi, and G. Coro, “An exploratory approach to data driven knowledge creation,” J Big Data, vol. 10, no. 1, p. 29, Mar. 2023, doi: 10.1186/s40537-023-00702-x.

[14] M. A. Mohammed, “Effect of Using Numerical Data Scaling on Supervised Machine Learning Performance,” Global Libyan Journal, vol. 67, pp. 1–21, 2024.

[15] F. Aldi, F. Hadi, N. A. Rahmi, and S. Defit, “Standardscaler’s Potential In Enhancing Breast Cancer Accuracy Using Machine Learning,” Journal of Applied Engineering and Technological Science, vol. 5, no. 1, pp. 401–413, 2023.

[16] Zhang, T. Sangsawang, K. Vipahasna, and M. Pigultong, “A Mixed-Methods Data Approach Integrating Importance-Performance Analysis (IPA) and Kaiser-Meyer-Olkin (KMO) in Applied Talent Cultivation,” Journal of Applied Data Sciences, vol. 5, no. 1, pp. 256–267, Jan. 2024, doi: 10.47738/jads.v5i1.170.

[17] S. Nabhan and A. Habók, “The Digital Literacy Academic Writing Scale: Exploratory Factor Analysis,” Sage Open, vol. 15, no. 1, Jan. 2025, doi: 10.1177/21582440241311709.

[18] S. H. A. Latif, A. S. Alwan, and A. M. Mohamed, “Principal component analysis as tool for data reduction with an application,” EUREKA: Physics and Engineering, no. 5, pp. 184–198, Sep. 2022, doi: 10.21303/2461-4262.2022.002577.

[19] S. Ramasubramanian, S. C.N, A. J. Athreya, A. Devarajan, A. U. Shankar, and R. Kumar P, “Data Dimensionality Reduction Using Principal Component Analysis: A Case Study,” in 2024 1st International Conference on Communications and Computer Science (InCCCS), IEEE, May 2024, pp. 1–6. doi: 10.1109/InCCCS60947.2024.10593421.

[20] A. Hebbal, M. Balesdent, L. Brevault, N. Melab, and E.-G. Talbi, “Deep Gaussian process for multi-objective Bayesian optimization,” Optimization and Engineering, vol. 24, no. 3, pp. 1809–1848, Sep. 2023, doi: 10.1007/s11081-022-09753-0.

[21] X. Wang, Y. Jin, S. Schmitt, and M. Olhofer, “Recent Advances in Bayesian Optimization,” ACM Comput Surv, vol. 55, no. 13s, pp. 1–36, Dec. 2023, doi: 10.1145/3582078.

[22] D. Vijayan and I. Aziz, “Adaptive Hierarchical Density-Based Spatial Clustering Algorithm for Streaming Applications,” Telecom, vol. 4, no. 1, pp. 1–14, Dec. 2022, doi: 10.3390/telecom4010001.

[23] A. Sante, A. S. Font, D. Mistry, S. Ortega-Martorell, and I. Olier, “Optimized HDBSCAN clustering for reconstructing the merger history of the Milky Way: applications and limitations,” Mon Not R Astron Soc, Mar. 2026, doi: 10.1093/mnras/stag503.

[24] L. Wang, P. Chen, L. Chen, and J. Mou, “Ship AIS Trajectory Clustering: An HDBSCAN-Based Approach,” J Mar Sci Eng, vol. 9, no. 6, p. 566, May 2021, doi: 10.3390/jmse9060566.

[25] A. Mashreghi and V. King, “Broadcast and minimum spanning tree with o(m) messages in the asynchronous CONGEST model,” Distrib Comput, vol. 34, no. 4, pp. 283–299, 2021.

[26] D. Chicco, G. Sabino, L. Oneto, and G. Jurman, “The DBCV index is more informative than DCSI, CDbw, and VIASCKDE indices for unsupervised clustering internal assessment of concave-shaped and density-based clusters,” PeerJ Comput Sci, vol. 11, p. e3095, Aug. 2025, doi: 10.7717/peerj-cs.3095.

[27] Z. Teng, J. Yan, D. Liu, and P. Zhang, “When Does the Silhouette Score Work? A Comprehensive Study in Network Clustering,” arXiv:2512.24841, Dec. 2025. [Online]. Available: http://arxiv.org/abs/2512.24841