Application of Feature Selection and Comparative Analysis of Machine Learning Models for Rainfall Prediction in Jakarta
DOI:
https://doi.org/10.30871/jaic.v9i5.11000
Keywords:
Classification, Machine learning, Prediction, Feature selection
Abstract
Accurate rainfall prediction plays a vital role in reducing disaster risks and supporting public preparedness, particularly in Jakarta where dense population and frequent floods cause serious economic and social impacts. In this study, weather data from the Kemayoran Meteorological Station covering 2004–2023 were analyzed to build rainfall prediction models using machine learning. Three classification algorithms were compared: Logistic Regression, Decision Tree, and Random Forest, selected to represent linear, non-linear, and ensemble approaches. Feature selection was applied using Recursive Feature Elimination (RFE) to identify the most relevant predictors. The models were evaluated using 5-fold cross-validation with metrics including Accuracy, Precision, Recall, F1 Score, ROC AUC, and Cohen’s Kappa. The results indicate that Random Forest achieved the best overall performance with Accuracy of 0.7622, Precision around 0.70, Recall up to 0.63, F1 Score about 0.65, ROC AUC ranging from 0.8044 to 0.8171, and Cohen’s Kappa near 0.48. Logistic Regression also performed competitively with Accuracy of 0.7648, ROC AUC of 0.829, and Kappa of 0.49, while Decision Tree showed lower results with Accuracy of 0.6890 and ROC AUC of 0.6636. The RFE process successfully reduced 18 meteorological attributes to 5 influential features, mainly temperature and relative humidity, which were dominant in distinguishing rainfall events. These findings demonstrate that both Random Forest and Logistic Regression outperform Decision Tree, and Random Forest with RFE can be recommended as the most robust model for rainfall prediction in Jakarta.
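The workflow summarized in the abstract (RFE down to 5 features, three classifiers, 5-fold cross-validation, six metrics) can be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' implementation: the Kemayoran station data are not reproduced here, so a synthetic 18-feature dataset stands in for them. The Logistic Regression base estimator inside RFE, the standardization step, the stratified and shuffled folds, and all hyperparameters are assumptions, since the abstract does not specify them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score, make_scorer
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 18 meteorological attributes (assumption:
# the real station data are not available in this sketch).
X, y = make_classification(
    n_samples=600, n_features=18, n_informative=5, n_redundant=3, random_state=0
)

# The three classifiers compared in the study; hyperparameters are assumed.
MODELS = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# The six metrics named in the abstract.
SCORING = {
    "accuracy": "accuracy",
    "precision": "precision",
    "recall": "recall",
    "f1": "f1",
    "roc_auc": "roc_auc",
    "kappa": make_scorer(cohen_kappa_score),
}


def evaluate_models(X, y, n_features=5, n_splits=5):
    """Return the mean cross-validated score per model and metric."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    results = {}
    for name, model in MODELS.items():
        # RFE sits inside the pipeline so features are re-selected on each
        # training fold, which avoids leaking information from the test fold.
        pipeline = Pipeline([
            ("scale", StandardScaler()),
            ("rfe", RFE(LogisticRegression(max_iter=1000),
                        n_features_to_select=n_features)),
            ("model", model),
        ])
        scores = cross_validate(pipeline, X, y, cv=cv, scoring=SCORING)
        results[name] = {m: scores[f"test_{m}"].mean() for m in SCORING}
    return results


results = evaluate_models(X, y)
for name, scores in results.items():
    print(name, {metric: round(value, 4) for metric, value in scores.items()})
```

Because the data here are synthetic, the printed scores will not match the values reported in the abstract; the sketch only shows how the selection, validation, and scoring steps fit together.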
License
Copyright (c) 2025 Indah Dwi Sulistyowati, Sunarno Sunarno, Iqbal Iqbal, KGS M Nurs Syamsuri

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License (Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).








