Evaluation of SMOTE Technique in the Comparison of XGBoost and Random Forest Algorithms for Liver Disease Prediction

Wahyutri Nur Rohman; I Made Artha Agastya

doi:10.30871/jaic.v9i6.10239

Authors

Wahyutri Nur Rohman, Universitas Amikom Yogyakarta
I Made Artha Agastya, Universitas Amikom Yogyakarta


Keywords:

Data Mining, Liver Disease Prediction, Random Forest, XGBoost, Model Evaluation

Abstract

In many countries, including Indonesia, liver disease remains a major cause of morbidity and mortality, and early detection plays a crucial role in improving treatment outcomes. This study evaluates two widely used machine learning models, Random Forest and XGBoost, for predicting liver disease, using the SMOTE (Synthetic Minority Over-sampling Technique) balancing method to address class imbalance. The primary objectives are to enhance model fairness, reduce overfitting, and improve sensitivity toward the minority class. Model performance is assessed using accuracy, precision, recall, and F1-score. The XGBoost model achieved an average accuracy of 99.74%, precision of 99.77%, recall of 99.75%, and F1-score of 99.72%, while the Random Forest model attained an average accuracy of 99.82%, precision of 99.89%, recall of 99.75%, and F1-score of 99.75%. Both models demonstrated excellent predictive capability, with Random Forest slightly ahead of XGBoost on accuracy, precision, and F1-score and the two models tied on recall. These results highlight the importance of data balancing and robust model validation in developing reliable machine learning models for healthcare decision-making.
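The two ideas the abstract relies on, SMOTE oversampling and confusion-matrix metrics, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `smote` and `binary_metrics`, the neighbour count `k`, and the toy data are assumptions made for illustration, and the code uses only the Python standard library rather than the libraries the study may have used.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples for the minority class.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    (Hypothetical helper; parameter names are illustrative.)
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy usage: three minority samples in two dimensions (made-up values).
minority_samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_samples = smote(minority_samples, n_new=4, k=2)

# Toy labels and predictions (made-up values), scored on the positive class.
scores = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

In a real evaluation, oversampling is usually applied to the training portion only, so that synthetic points never leak into the data used to compute the metrics; the abstract does not state how the study handled this.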


