Application of Feature Selection and Comparative Analysis of Machine Learning Models for Rainfall Prediction in Jakarta

Authors

  • Indah Dwi Sulistyowati, Universitas Negeri Semarang
  • Sunarno Sunarno, Universitas Negeri Semarang
  • Iqbal Iqbal, Badan Meteorologi Klimatologi dan Geofisika
  • KGS M Nurs Syamsuri, Badan Meteorologi Klimatologi dan Geofisika

DOI:

https://doi.org/10.30871/jaic.v9i5.11000

Keywords:

Classification, Machine learning, Prediction, Feature selection

Abstract

Accurate rainfall prediction plays a vital role in reducing disaster risk and supporting public preparedness, particularly in Jakarta, where a dense population and frequent floods cause serious economic and social impacts. In this study, weather data from the Kemayoran Meteorological Station covering 2004–2023 were analyzed to build machine learning models for rainfall prediction. Three classification algorithms, Logistic Regression, Decision Tree, and Random Forest, were compared as representatives of linear, non-linear, and ensemble approaches, respectively. Recursive Feature Elimination (RFE) was applied to identify the most relevant predictors. The models were evaluated with 5-fold cross-validation using Accuracy, Precision, Recall, F1 Score, ROC AUC, and Cohen’s Kappa. Random Forest achieved the best overall performance, with an Accuracy of 0.7622, Precision of around 0.70, Recall of up to 0.63, an F1 Score of about 0.65, ROC AUC between 0.8044 and 0.8171, and a Cohen’s Kappa of about 0.48. Logistic Regression performed comparably, with an Accuracy of 0.7648, ROC AUC of 0.829, and Kappa of 0.49, whereas Decision Tree performed worse, with an Accuracy of 0.6890 and ROC AUC of 0.6636. RFE reduced the 18 meteorological attributes to 5 influential features, chiefly temperature and relative humidity, which were dominant in distinguishing rainfall events. These findings show that both Random Forest and Logistic Regression outperform Decision Tree, and Random Forest with RFE is recommended as the most robust model for rainfall prediction in Jakarta.
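The abstract does not describe the implementation, so the following is only a minimal sketch of the workflow it outlines (RFE down to 5 features, then a 5-fold cross-validated comparison of the three classifiers on the six listed metrics). It assumes scikit-learn is installed and uses synthetic stand-in data with 18 features rather than the Kemayoran records. The scaling step, the choice of letting each model rank its own features in RFE, the hyperparameters, and the helper names build_pipeline and compare_models are all illustrative assumptions, not the authors' code.

```python
# Minimal sketch (assumptions: scikit-learn available; synthetic data stands in
# for the 18 Kemayoran attributes; helper names are hypothetical).
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score, make_scorer
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in data: 18 features, binary target (rain / no rain).
X, y = make_classification(
    n_samples=500, n_features=18, n_informative=5, random_state=42
)

# The six metrics named in the abstract; Cohen's Kappa needs a custom scorer.
SCORING = {
    "accuracy": "accuracy",
    "precision": "precision",
    "recall": "recall",
    "f1": "f1",
    "roc_auc": "roc_auc",
    "kappa": make_scorer(cohen_kappa_score),
}

# Linear, non-linear, and ensemble representatives (hyperparameters assumed).
MODELS = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}


def build_pipeline(model, n_features=5):
    """Scale, keep n_features via RFE, then fit the classifier.

    Assumption: each model ranks features with a copy of itself
    (coef_ for Logistic Regression, feature_importances_ for the trees).
    """
    return Pipeline([
        ("scale", StandardScaler()),
        ("rfe", RFE(clone(model), n_features_to_select=n_features)),
        ("clf", clone(model)),
    ])


def compare_models(X, y, n_splits=5):
    """Return the fold-averaged score of every metric for every model."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
    results = {}
    for name, model in MODELS.items():
        scores = cross_validate(build_pipeline(model), X, y, cv=cv, scoring=SCORING)
        results[name] = {m: scores[f"test_{m}"].mean() for m in SCORING}
    return results
```

Calling `compare_models(X, y)` returns a nested dictionary of mean scores keyed by model name and metric. Because RFE sits inside the pipeline, the five features are re-selected on each training fold, so the held-out fold never influences which features are kept.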

Published

2025-10-08

How to Cite

[1] I. D. Sulistyowati, S. Sunarno, I. Iqbal, and K. M. N. Syamsuri, “Application of Feature Selection and Comparative Analysis of Machine Learning Models for Rainfall Prediction in Jakarta,” JAIC, vol. 9, no. 5, pp. 2364–2370, Oct. 2025.
