Stacking of DT, RF, and Gradient Boosting Algorithms for Classification of Building Damage Due to Earthquakes

Authors

  • Nur Aqliah Ilmi Universitas Dian Nuswantoro
  • Nurul Anisa Sri Winarsih Universitas Dian Nuswantoro

DOI:

https://doi.org/10.30871/jaic.v9i6.11272

Keywords:

Building Damage, Earthquake Classification, Ensemble Stacking, ADASYN

Abstract

Classification of building damage levels due to earthquakes is an important aspect of disaster mitigation and post-disaster risk assessment. This study aims to improve classification accuracy on imbalanced data using an ensemble stacking method. It combines Decision Tree, Random Forest, and Gradient Boosting algorithms, with Logistic Regression as a meta-learner. The building damage dataset from the 2015 Gorkha, Nepal earthquake underwent data cleaning, categorical transformation, normalization, and balancing using ADASYN. Evaluation showed that Random Forest was the best single model. The stacking model achieved the highest accuracy of 91.77% after balancing. These results show that stacking improves generalization and classification accuracy on imbalanced data. This suggests significant potential for integration into disaster decision-support systems that require fast, accurate building-damage assessment.
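The pipeline the abstract describes can be sketched with scikit-learn's StackingClassifier: Decision Tree, Random Forest, and Gradient Boosting as base learners, Logistic Regression as the meta-learner, with ADASYN applied to the training split. This is a minimal illustrative sketch, not the authors' code — the synthetic imbalanced dataset and all hyperparameters here are stand-ins for the Gorkha building-damage data and the study's tuned settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the building-damage data: an imbalanced 3-class problem
# (three damage grades, with severe damage as the minority class).
X, y = make_classification(n_samples=1000, n_classes=3, n_informative=6,
                           weights=[0.7, 0.2, 0.1], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

# In the study, ADASYN balances the training split at this point, e.g.:
#   from imblearn.over_sampling import ADASYN
#   X_tr, y_tr = ADASYN(random_state=42).fit_resample(X_tr, y_tr)

stack = StackingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(random_state=42)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
        ("gb", GradientBoostingClassifier(random_state=42)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,  # out-of-fold base-learner predictions train the meta-learner
)
stack.fit(X_tr, y_tr)
acc = stack.score(X_te, y_te)
print(f"hold-out accuracy: {acc:.2f}")
```

The `cv=5` setting matters for the design: the meta-learner is fit on out-of-fold predictions, which limits the leakage that would occur if base learners predicted on their own training data.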


References

[1] Badan Geologi. (2021). Peta sumber dan bahaya gempa Indonesia 2021. Pusat Vulkanologi dan Mitigasi Bencana Geologi, Kementerian Energi dan Sumber Daya Mineral. https://www.esdm.go.id

[2] Badan Meteorologi, Klimatologi, dan Geofisika. (2022). Informasi gempa terkini dan sesar aktif di Indonesia. https://www.bmkg.go.id.

[3] Winarsih, S., et al. (2025). Optimizing earthquake damage prediction using particle swarm optimization-based feature selection. Jurnal Informatika dan Komputer, 11(1), 77–86.

[4] Dachi, M. A., & Sitompul, O. S. (2023). Penerapan metode ensemble learning untuk klasifikasi data menggunakan stacking, bagging, dan boosting. Jurnal Teknologi Informasi dan Ilmu Komputer, 10(2), 121–130.

[5] Joses, Y. S., Yulvida, E., & Rochimah, S. (2024). Ensemble learning menggunakan stacking untuk meningkatkan kinerja prediksi pada data tidak seimbang. Jurnal Teknologi dan Sistem Komputer, 12(1), 89–97.

[6] DrivenData. (2020). Richter’s predictor: Modelling earthquake damage. https://www.drivendata.org/competitions/57/nepal-earthquake/

[7] Buhl, N. (2023). Mastering data cleaning & data preprocessing. Encord. Retrieved May 26, 2025, from https://encord.com/blog/data-cleaning-data-preprocessing/

[8] Dibimbing.id. (2025, March 24). One Hot Encoding adalah: Arti, Manfaat, dan Penerapannya. Retrieved May 26, 2025, from https://dibimbing.id/blog/detail/one-hot-encoding-adalah

[9] Monika, A. P., Risti, F. E. P., Binanto, I., & Sianipar, N. F. (2023). Perbandingan algoritma klasifikasi Random Forest, Gaussian Naive Bayes, dan K-Nearest Neighbor untuk data tidak seimbang dan data yang diseimbangkan dengan metode Adaptive Synthetic pada dataset LCMS tanaman keladi tikus. Jurnal Seminar Nasional Teknik Elektro, Informatika & Sistem Informasi (SINTaKS), 3–7.

[10] M. Ibrahim, “Evolution of Random Forest from Decision Tree and Bagging: A Bias–Variance Perspective,” Dhaka University Journal of Applied Science and Engineering, vol. 7, no. 1, pp. 66–71, 2022. doi: 10.3329/dujase.v7i1.62888

[11] R. Zuhri, Kusrini, and D. Ariatmanto, “Analisis perbandingan algoritma klasifikasi untuk identifikasi diabetes dengan menggunakan metode Random Forest dan Naive Bayes,” Jurnal Inovasi Teknologi dan Sains (JINTEKS), vol. 4, no. 2, pp. 222–230, 2022. [Online]. Available: https://www.jurnal.uts.ac.id/index.php/JINTEKS/article/view/5146

[12] L. W. Rizkallah, “Enhancing the performance of gradient boosting trees on regression problems,” Journal of Big Data, vol. 12, art. no. 35, pp. 1–14, 2025. doi: 10.1186/s40537-025-01071-3

[13] W. N. Ismail and H. A. Alsalamah, “GA-Stacking: A New Stacking-Based Ensemble Learning Method to Forecast the COVID-19 Outbreak,” International Journal of Advanced Computer Science and Applications, vol. 14, no. 2, pp. 202–210, 2023.

[14] Swaminathan, S., & Tantri, B. R. (2024). Confusion matrix-based performance evaluation metrics. African Journal of Biomedical Research, 27(4S), 4023–4031.

[15] Abubakar, P. (2025, May 8). Evaluation metrics in machine learning: Accuracy, precision, recall & f1-score. Medium. https://medium.com/@abubakarp789/evaluatiom-metrics-in-machine-learning-accuracy-precision-recall-f1-score-c4c4e553677a

[16] Haya, A., & Ramme, M. Y. (2024). Penerapan algoritma stacking ensemble machine learning berbasis pohon untuk prediksi penyakit diabetes. Prosiding Seminar Nasional Sains Data, 4(1), 954–961.

[17] Ghimire, S., Gueguen, P., Giffard-Roisin, S., & Schorlemmer, D. (2022). Testing machine learning models for seismic damage prediction at a regional scale using a building damage dataset collected after the 2015 Gorkha, Nepal earthquake. Earthquake Spectra, 38(4), 2970–2993.

[18] M. Ahmed, A. Khan, and S. Hussain, “An improved adaptive synthetic sampling approach for imbalanced data classification,” Expert Systems with Applications, vol. 206, p. 117816, 2022, doi: 10.1016/j.eswa.2022.117816

[19] E. Elgeldawi and A. M. Zaki, “Hyperparameter Tuning for Machine Learning Algorithms: A Comprehensive Comparative Analysis,” Informatics, vol. 8, no. 4, p. 79, 2021. [Online]. Available: https://doi.org/10.3390/informatics8040079

[20] A. Ben-David, D. Lustgarten, and Y. Koren, “High Per Parameter: A Large-Scale Study of Hyperparameter Tuning for Machine Learning Algorithms,” Algorithms, vol. 15, no. 9, p. 315, 2022. [Online]. Available: https://www.mdpi.com/1999-4893/15/9/315

[21] M. Saarela and S. Jauhiainen, “Comparison of feature importance measures as explanations for classification models,” SN Applied Sciences, vol. 3, no. 2, pp. 41–48, 2021, https://doi.org/10.1007/s42452-021-04148-9.

[22] J. Zhou, A. H. Gandomi, F. Chen, and A. Holzinger, “Evaluating the quality of machine learning explanations: A survey on methods and metrics,” Electronics, vol. 10, no. 5, p. 593, 2021, https://doi.org/10.3390/electronics10050593

[23] M. I. Prasetiyowati, N. U. Maulidevi, and K. Surendro, “Determining threshold value on information gain feature selection to increase speed and prediction accuracy of random forest,” Journal of Big Data, vol. 8, no. 1, p. 84, 2021.

[24] A. Alsahaf, A. A. Bakar, and Z. A. Othman, “A framework for feature selection through boosting,” Expert Syst. Appl., vol. 189, p. 116140, 2022.

[25] M. I. Prasetiyowati, N. U. Maulidevi, and K. Surendro, “Determining threshold value on information gain feature selection to increase speed and prediction accuracy of random forest,” Journal of Big Data, vol. 8, no. 84, pp. 1–24, 2021.

[26] C. Arnold, “The role of hyperparameters in machine learning models and how to tune them,” Political Science Research and Methods, vol. 12, no. 4, pp. 841–848, 2024, doi: 10.1017/psrm.2023.61.

[27] D. V. Ramadhanti, “Perbandingan SMOTE dan ADASYN pada data imbalance,” Jurnal Gaussian, vol. 11, no. 4, pp. 503–510, 2022.

Published

2025-12-15

How to Cite

[1] N. A. Ilmi and N. A. S. Winarsih, “Stacking of DT, RF, and Gradient Boosting Algorithms for Classification of Building Damage Due to Earthquakes”, JAIC, vol. 9, no. 6, pp. 3853–3861, Dec. 2025.
