Performance of Multivariate Missing Data Imputation Methods on Climate Data
DOI:
https://doi.org/10.30871/jaic.v9i6.11316
Abstract
Climate data underpin many areas of life, but records often contain missing values that interfere with data processing and reduce the quality of analysis. Appropriate handling methods are therefore needed to keep analysis results valid. This study compares the performance of several imputation methods for multivariate missing data, using missing-data patterns identified in actual observations, and determines which method is appropriate for the underlying missing-data mechanism. It also applies the best method to data with the actual missing-data pattern to assess its effect on the descriptive statistics needed for further climatological analysis. The methods compared are monthly averages, missRanger, k-Nearest Neighbor (k-NN), and Iterative Robust Model-based Imputation (IRMI). The missing-data pattern was taken from Global Surface Summary of the Day (GSOD) data covering daily temperature, precipitation, humidity, pressure, and wind speed over 11 years, with a missing-data proportion of 11.4%. This pattern was then applied to relatively complete NASA POWER data so that the imputed values could be evaluated against known values. The results show that IRMI is less capable of handling the extreme missing-data condition, namely 17 completely missing rows. In contrast, k-NN, missRanger, and monthly averages gave better results in both extreme and non-extreme conditions. Of the four methods, monthly averages were chosen because they handled the missing data while preserving the multivariate structure, with an sMAPE of 58% and a relative difference of 2.64%.
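The evaluation design described in the abstract (copy a real missing-data pattern onto a complete series, impute, then score against the known values) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and not the authors' implementation: the function names are hypothetical, the monthly average pools each calendar month across all years, sMAPE uses the common form with the mean of absolute actual and imputed values in the denominator, and the relative difference is taken between series means. The toy series stands in for a single climate variable.

```python
from datetime import date, timedelta
from statistics import mean


def apply_missing_pattern(complete, mask):
    """Return a copy of `complete` with None wherever `mask` is True."""
    return [None if is_missing else value
            for value, is_missing in zip(complete, mask)]


def impute_monthly_mean(dates, values):
    """Fill None with the mean of observed values from the same calendar month.

    Assumption: months are pooled across years; a month with no observed
    values falls back to the overall mean of the series.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    overall = mean(observed)
    by_month = {}
    for d, v in zip(dates, values):
        if v is not None:
            by_month.setdefault(d.month, []).append(v)
    month_mean = {m: mean(vs) for m, vs in by_month.items()}
    return [month_mean.get(d.month, overall) if v is None else v
            for d, v in zip(dates, values)]


def smape(actual, predicted):
    """Symmetric MAPE in percent (assumed form; ranges from 0 to 200)."""
    terms = []
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        # Treat a 0/0 term as zero error instead of dividing by zero.
        terms.append(0.0 if denom == 0 else abs(p - a) / denom)
    return 100 * mean(terms)


def relative_difference(reference, estimate):
    """Absolute percentage difference of `estimate` from `reference`."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return 100 * abs(estimate - reference) / abs(reference)


# Toy data: 60 daily values (January and February 2020) of one variable.
dates = [date(2020, 1, 1) + timedelta(days=i) for i in range(60)]
truth = [25.0 + 0.1 * d.day for d in dates]

# Hypothetical missing-data pattern: every seventh day is missing.
mask = [i % 7 == 0 for i in range(60)]

holed = apply_missing_pattern(truth, mask)
filled = impute_monthly_mean(dates, holed)

# Score only the positions that were actually imputed.
missing_idx = [i for i, is_missing in enumerate(mask) if is_missing]
score = smape([truth[i] for i in missing_idx],
              [filled[i] for i in missing_idx])
shift = relative_difference(mean(truth), mean(filled))
print(f"sMAPE on imputed cells: {score:.2f}%")
print(f"Relative difference of the mean: {shift:.2f}%")
```

Scoring only the masked positions keeps the error measure focused on imputed cells, while the relative difference of a descriptive statistic is computed on the whole filled series. A multivariate version would apply the same mask row by row across all variables.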
License
Copyright (c) 2025 Amalia Safira Widyawati, Anwar Fitrianto, Pika Silvianti

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.