A Comparative Study of Hyperparameter Optimization for CatBoost: Random Search, Optuna, and Successive Halving
DOI: https://doi.org/10.30871/jaic.v10i2.11831

Keywords: CatBoost, Hyperparameter Optimization, Successive Halving, Random Search, Optuna

Abstract
This study evaluates the effectiveness of three hyperparameter optimization approaches (Random Search, Successive Halving, and Optuna) for the CatBoost algorithm in modeling individual income using the 2024 SAKERNAS data. Model performance was assessed using RMSE, MAE, and R-squared, complemented by a significance test based on 10,000 bootstrap resamples to ensure that performance differences were not driven by random variation. The results indicate that Optuna yields the most accurate predictions, followed by Successive Halving and Random Search. The RMSE values, ranging from several hundred thousand to approximately one million rupiah, are consistent with the characteristics of the income variable, which is measured in rupiah and exhibits a heavy-tailed distribution. The feature importance analysis reveals a generally consistent ranking structure across methods, although moderate variation is observed for several features. These findings confirm that Optuna is the most effective tuning strategy, while Successive Halving serves as an efficient alternative for large-scale datasets. Overall, this study highlights the critical role of optimization strategies in improving predictive performance without compromising the stability of model interpretation, making it particularly relevant for analytical applications in micro-level socio-economic data.
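The abstract does not spell out the form of the 10,000-resample bootstrap significance test, so the sketch below shows one common variant under assumed details: a paired bootstrap over test-set observations that resamples indices with replacement, recomputes the RMSE difference between two tuned models on each resample, and derives a two-sided p-value from the sign distribution of those differences. The function name and the synthetic predictions are illustrative, not taken from the paper.

```python
import numpy as np


def paired_bootstrap_rmse_diff(y_true, pred_a, pred_b, n_boot=10_000, seed=0):
    """Paired bootstrap test for the RMSE gap between two models.

    Resamples observation indices with replacement, recomputes
    RMSE(A) - RMSE(B) on each resample, and reports the mean gap
    plus a two-sided p-value based on how often the sign flips.
    """
    rng = np.random.default_rng(seed)
    y_true, pred_a, pred_b = map(np.asarray, (y_true, pred_a, pred_b))
    n = y_true.shape[0]
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample with replacement
        rmse_a = np.sqrt(np.mean((y_true[idx] - pred_a[idx]) ** 2))
        rmse_b = np.sqrt(np.mean((y_true[idx] - pred_b[idx]) ** 2))
        diffs[i] = rmse_a - rmse_b
    # Two-sided p-value: twice the smaller tail proportion around zero.
    p = 2 * min((diffs > 0).mean(), (diffs < 0).mean())
    return diffs.mean(), min(p, 1.0)


# Illustrative usage with synthetic predictions (model B is clearly better):
rng = np.random.default_rng(42)
y = rng.normal(0.0, 1.0, 500)
pred_b = y + rng.normal(0.0, 0.1, 500)   # accurate model
pred_a = y + rng.normal(0.0, 1.0, 500)   # noisy model
gap, p_value = paired_bootstrap_rmse_diff(y, pred_a, pred_b, n_boot=2000, seed=1)
```

In this setup the mean RMSE gap is positive (model A is worse) and the p-value is near zero, which is the pattern the study reports when comparing the Optuna-tuned model against the alternatives.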
License
Copyright (c) 2026 Claudian Tikulimbong Tangdilomban, Kusman Sadik, Aji Hamim Wigena

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).