Enhancing the Predictive Accuracy of Corrosion Inhibition Efficiency Using Gradient Boosting with Feature Engineering and Gaussian Mixture Model

Authors

  • Sahrul Amri, Universitas Dian Nuswantoro
  • Muhamad Akrom, Universitas Dian Nuswantoro
  • Gustina Alfa Trisnapradika, Universitas Dian Nuswantoro

DOI:

https://doi.org/10.30871/jaic.v9i6.11560

Keywords:

Corrosion Inhibition, Machine Learning, Gradient Boosting Regressor, Feature Engineering, Gaussian Mixture Model, Data Augmentation

Abstract

The development of quantitative structure-property relationship (QSPR) models for predicting corrosion inhibition efficiency (IE) is often hampered by small datasets, which heighten the risk of overfitting and make performance assessments less reliable. This research builds a leakage-free modeling framework that combines per-fold preprocessing, augmentation of the training data only, and rigorous Leave-One-Out Cross-Validation (LOOCV). A set of 20 pyridazine derivatives was evaluated using 12 quantum-chemical descriptors, including HOMO, LUMO, ΔE, dipole moment, electronegativity, hardness, softness, and the electron-transfer fraction. An initial assessment showed that all baseline models without augmentation (Gradient Boosting, Random Forest, SVR, and XGBoost) had limited predictive power (R² < 0.20), revealing the dataset's inherently low information complexity. To enhance representation in the feature space, a multi-scale Gaussian Mixture Model (GMM) was used to generate chemically valid synthetic samples, with all components trained solely on the training subset of each LOOCV fold. This strategy consistently improved model performance. The two most successful configurations, XGBoost + GMM v2 and Random Forest + GMM v3, reached R² values of 0.4457 and 0.4108, respectively, along with significant decreases in RMSE, MAE, and MAPE. These findings illustrate that GMM-based generative augmentation effectively captures multicluster structures within the descriptor space while expanding the chemical variability domain in a controlled way. While the resulting R² values remain inadequate for high-precision quantitative prediction, the proposed methodology provides a solid basis for early-stage evaluation of corrosion inhibitors when data are limited. Future research will aim to integrate advanced DFT-derived descriptors, molecular graph representations, and tests against larger external datasets to enhance model generalizability.
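
The leakage-free procedure summarized in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the function name, the number of mixture components, the diagonal covariance, the number of synthetic samples, and the random placeholder data are all assumptions made for illustration, and a single GMM stands in for the multi-scale variants (v2, v3) that the abstract mentions but does not specify. The point it shows is that the scaler, the GMM, and the regressor are all fitted inside each LOOCV fold, using the training rows only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import LeaveOneOut
from sklearn.preprocessing import StandardScaler


def loocv_with_gmm_augmentation(X, y, n_components=2, n_synthetic=40, seed=0):
    """Return LOOCV predictions; scaler, GMM and model see training rows only."""
    preds = np.empty(len(y), dtype=float)
    for train_idx, test_idx in LeaveOneOut().split(X):
        # Fit the scaler on this fold's training rows, then apply it to both parts.
        scaler = StandardScaler().fit(X[train_idx])
        X_train = scaler.transform(X[train_idx])
        X_test = scaler.transform(X[test_idx])

        # Model descriptors and target jointly so each sampled row carries a target.
        # Assumption: diagonal covariance, chosen because 19 rows cannot support
        # a full covariance matrix in 13 dimensions.
        joint = np.column_stack([X_train, y[train_idx]])
        gmm = GaussianMixture(
            n_components=n_components,
            covariance_type="diag",
            reg_covar=1e-3,
            random_state=seed,
        ).fit(joint)
        synthetic, _ = gmm.sample(n_synthetic)

        # Augment the training set only; the held-out row is never touched.
        X_aug = np.vstack([X_train, synthetic[:, :-1]])
        y_aug = np.concatenate([y[train_idx], synthetic[:, -1]])

        model = RandomForestRegressor(n_estimators=100, random_state=seed)
        model.fit(X_aug, y_aug)
        preds[test_idx] = model.predict(X_test)
    return preds


# Random placeholder data with the same shape as the study (20 x 12).
# These values are NOT the pyridazine dataset.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 12))
y_demo = 80 + 5 * X_demo[:, 0] + rng.normal(scale=2.0, size=20)

preds = loocv_with_gmm_augmentation(X_demo, y_demo)
rmse = float(np.sqrt(mean_squared_error(y_demo, preds)))
mae = float(mean_absolute_error(y_demo, preds))
r2 = float(r2_score(y_demo, preds))
print(f"R2={r2:.3f} RMSE={rmse:.3f} MAE={mae:.3f}")
```

Fitting the scaler or the GMM once on all 20 rows before splitting would let information from the held-out compound reach the training data, which is the leakage the per-fold design is meant to prevent.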

Published

2025-12-15

How to Cite

[1]
S. Amri, M. Akrom, and G. A. Trisnapradika, “Enhancing the Predictive Accuracy of Corrosion Inhibition Efficiency Using Gradient Boosting with Feature Engineering and Gaussian Mixture Model”, JAIC, vol. 9, no. 6, pp. 3840–3852, Dec. 2025.
