A Comparative Performance of SMOTE, ADASYN and Random Oversampling in Machine Learning Models on Prostate Cancer Dataset
DOI: https://doi.org/10.30871/jaic.v9i3.9308
Keywords: Imbalance Class, Oversampling, Classification, Machine Learning, Prostate Cancer
Abstract
Class imbalance in medical datasets, including prostate cancer data, can reduce the ability of machine learning models to detect minority-class cases. This study compares three oversampling techniques (SMOTE, ADASYN, and Random Oversampling) for addressing data imbalance in prostate cancer classification. Each technique is combined with three classifiers: Random Forest (RF), Decision Tree (DT), and LightGBM (LGBM). The models are evaluated using accuracy, precision, recall, F1-score, and ROC-AUC. To make the evaluation more reliable, K-Fold Cross Validation was used. This reduces the dependence of the results on a single train/test split, helps expose overfitting, and yields more stable estimates. The findings show that oversampling improves model performance compared with the baseline. Random Oversampling gave the best Random Forest performance, with accuracy 0.85, recall 0.888, precision 0.873, F1-score 0.879, and ROC-AUC 0.838. SMOTE produced the best Decision Tree performance, with accuracy 0.80, recall 0.838, precision 0.843, F1-score 0.839, and ROC-AUC 0.788. ADASYN yielded the greatest improvement for LightGBM, with accuracy 0.89, recall 0.919, precision 0.913, F1-score 0.913, and ROC-AUC 0.879. These results indicate that oversampling improves prostate cancer classification performance and that the most effective resampling technique depends on the characteristics of the model.
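The abstract names the three oversampling techniques and K-Fold Cross Validation but does not show how they fit together. The following is a minimal, dependency-free Python sketch, not the authors' code. It assumes a binary label, numeric feature rows, and Euclidean distance. All helper names (`random_oversample`, `smote`, `adasyn`, `stratified_folds`, `cv_splits`) are hypothetical. The sketch shows how each technique fills the minority class. Random Oversampling duplicates existing rows. SMOTE interpolates between minority neighbours. ADASYN uses the same interpolation but allots more synthetic rows to minority points surrounded by majority points. The sketch also shows oversampling applied only to the training folds, so duplicated or synthetic rows never reach the fold used for evaluation.

```python
import math
import random
from collections import Counter


def _minority_label(y):
    """Least frequent class label (a binary target is assumed)."""
    return min(Counter(y).items(), key=lambda kv: kv[1])[0]


def _neighbours(i, points, k):
    """Indices of the k points nearest to points[i] (Euclidean), excluding i."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def _interpolate(a, b, rng):
    """Random point on the segment from a to b: the core SMOTE/ADASYN step."""
    gap = rng.random()
    return [ai + gap * (bi - ai) for ai, bi in zip(a, b)]


def random_oversample(X, y, rng):
    """Duplicate randomly chosen minority rows until the classes are equal."""
    minority = _minority_label(y)
    pool = [x for x, t in zip(X, y) if t == minority]
    deficit = len(y) - 2 * len(pool)  # majority count minus minority count
    extra = [list(rng.choice(pool)) for _ in range(deficit)]
    return list(X) + extra, list(y) + [minority] * deficit


def smote(X, y, rng, k=5):
    """Synthesise minority rows between a minority point and a minority
    neighbour. Needs at least two minority rows."""
    minority = _minority_label(y)
    pool = [x for x, t in zip(X, y) if t == minority]
    deficit = len(y) - 2 * len(pool)
    extra = []
    for _ in range(deficit):
        i = rng.randrange(len(pool))             # random minority base point
        j = rng.choice(_neighbours(i, pool, k))  # one of its minority neighbours
        extra.append(_interpolate(pool[i], pool[j], rng))
    return list(X) + extra, list(y) + [minority] * deficit


def adasyn(X, y, rng, k=5):
    """SMOTE-style synthesis, weighted toward minority points whose neighbours
    in the full data are mostly majority class (harder to learn)."""
    minority = _minority_label(y)
    idx = [n for n, t in enumerate(y) if t == minority]
    pool = [X[n] for n in idx]
    deficit = len(y) - 2 * len(pool)
    ratios = []
    for n in idx:
        nb = _neighbours(n, X, k)
        ratios.append(sum(y[j] != minority for j in nb) / len(nb))
    if sum(ratios) == 0:           # no minority point borders the majority:
        ratios = [1.0] * len(idx)  # fall back to uniform weights (assumption)
    total = sum(ratios)
    extra = []
    for i, r in enumerate(ratios):
        # Rounding means the classes end up only approximately balanced.
        for _ in range(round(deficit * r / total)):
            j = rng.choice(_neighbours(i, pool, k))
            extra.append(_interpolate(pool[i], pool[j], rng))
    return list(X) + extra, list(y) + [minority] * len(extra)


def stratified_folds(y, k, rng):
    """Split row indices into k folds, spreading each class evenly."""
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        rows = [n for n, t in enumerate(y) if t == label]
        rng.shuffle(rows)
        for pos, n in enumerate(rows):
            folds[pos % k].append(n)
    return folds


def cv_splits(X, y, sampler, k=5, seed=0):
    """Yield (X_train, y_train, test_rows); only the training part is resampled,
    so no duplicated or synthetic row reaches the evaluation fold."""
    rng = random.Random(seed)
    for fold in stratified_folds(y, k, rng):
        test = set(fold)
        train = [n for n in range(len(y)) if n not in test]
        X_tr, y_tr = sampler([X[n] for n in train], [y[n] for n in train], rng)
        yield X_tr, y_tr, sorted(test)


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy 2-D data (40 vs 10 rows), NOT the prostate cancer dataset.
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
    X += [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(10)]
    y = [0] * 40 + [1] * 10
    for name, sampler in [("ROS", random_oversample), ("SMOTE", smote), ("ADASYN", adasyn)]:
        X_tr, y_tr, test = next(cv_splits(X, y, sampler))
        print(name, dict(Counter(y_tr)), "test rows:", len(test))
```

In this sketch, SMOTE draws its base points at random rather than looping over every minority row, which is a common simplification. With the imbalanced-learn library installed, the same idea is usually expressed by placing `RandomOverSampler`, `SMOTE`, or `ADASYN` in an `imblearn.pipeline.Pipeline` together with the classifier. Cross-validation then resamples only the training split of each fold.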
License
Copyright (c) 2025 Aditya Herdiansyah Putra, Abu Salam

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.