Evaluation of SMOTE Technique in the Comparison of XGBoost and Random Forest Algorithms for Liver Disease Prediction
DOI: https://doi.org/10.30871/jaic.v9i6.10239
Keywords: Data Mining, Liver Disease Prediction, Random Forest, XGBoost, Model Evaluation
Abstract
In many countries, including Indonesia, liver disease remains a major cause of morbidity and mortality, and early detection plays a crucial role in improving treatment outcomes. This study evaluates two widely used machine learning models, Random Forest and XGBoost, for predicting liver disease, using the SMOTE (Synthetic Minority Over-sampling Technique) balancing method to address class imbalance. The primary objectives are to enhance model fairness, reduce overfitting, and improve sensitivity toward the minority class. Model performance is assessed using accuracy, precision, recall, and F1-score. The XGBoost model achieved an average accuracy of 99.74%, precision of 99.77%, recall of 99.75%, and F1-score of 99.72%, while the Random Forest model attained an average accuracy of 99.82%, precision of 99.89%, recall of 99.75%, and F1-score of 99.75%. Both models demonstrated excellent predictive capability: Random Forest was slightly ahead of XGBoost on accuracy, precision, and F1-score, and the two models matched on recall. These results highlight the importance of data balancing and robust model validation in developing reliable machine learning models for healthcare decision-making.
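The two ideas the abstract relies on, SMOTE oversampling and the four evaluation metrics, can be illustrated with a minimal standard-library sketch. This is not the authors' pipeline: the function names `smote` and `binary_metrics`, the default `k=5`, and the choice of a random base sample per synthetic point are assumptions made for illustration. In practice a library implementation would normally be used, and oversampling is usually applied to the training split only so that synthetic points do not leak into the evaluation data.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Sketch of SMOTE-style interpolation (simplified, hypothetical helper).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1-score from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy minority class with two features (made-up values, not study data).
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    print(smote(minority, n_new=3, k=2))
    print(binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
```

Because recall is computed only from true positives and false negatives, it is the metric that most directly reflects sensitivity toward the minority class when that class is treated as the positive label.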
License
Copyright (c) 2025 Wahyutri Nur Rohman, I Made Artha Agastya

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).








