Performance Comparison of Data Sampling Techniques for Classifying COVID-19-Infected Patients Using Chest X-rays

Keywords: Chest X-rays, Classification, Covid-19, Data Sampling, SMOTE


The COVID-19 virus is deadly and has shocked the world. One of its consequences is respiratory infection. One proposed response is to predict COVID-19 infection by classifying chest X-ray data. A challenging issue in this field is the imbalance between the number of infected and uninfected chest X-rays: a classifier trained on imbalanced data tends to ignore the class with fewer samples. Data sampling techniques address this problem by balancing the data. This study therefore evaluates several data sampling techniques: Random Undersampling (RUS), Random Oversampling (ROS), Combination of Over-Undersampling (COUS), Synthetic Minority Over-sampling Technique (SMOTE), and Tomek Link (T-Link). Classification uses a Support Vector Machine (SVM), chosen for its high accuracy. The techniques are compared by accuracy and Area Under Curve (AUC), and the one with the highest values is selected. The best sampling technique was SMOTE, with an accuracy of 99% and an AUC of 99.32%. SMOTE is therefore the best of the evaluated data sampling techniques for classifying COVID-19 chest X-ray data.
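As a rough illustration of how three of the sampling techniques named in the abstract balance a data set, the sketch below implements RUS, ROS, and a SMOTE-style interpolation in plain Python. It is not the authors' implementation: the function names, the toy two-dimensional points standing in for image features, and the choice of k = 3 are all assumptions made for this example, and COUS, Tomek Link, and the SVM classifier are left out.

```python
import math
import random


def random_undersample(majority, minority, rng):
    """RUS: keep a random subset of the majority class, same size as the minority."""
    return rng.sample(majority, len(minority)), list(minority)


def random_oversample(majority, minority, rng):
    """ROS: duplicate random minority samples until both classes are the same size."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(majority), list(minority) + extra


def smote(minority, n_new, k, rng):
    """SMOTE-style oversampling: create n_new points by interpolating between a
    minority sample and one of its k nearest minority neighbours (Euclidean).
    Assumes the minority class holds at least two samples."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` inside the minority class, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical toy data: 20 majority points and 5 minority points.
rng = random.Random(0)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)]
minority = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(5)]

rus_major, rus_minor = random_undersample(majority, minority, rng)
ros_major, ros_minor = random_oversample(majority, minority, rng)
smote_minor = minority + smote(minority, len(majority) - len(minority), k=3, rng=rng)

print(len(rus_major), len(rus_minor))    # both classes reduced to minority size
print(len(ros_major), len(ros_minor))    # minority grown by duplication
print(len(majority), len(smote_minor))   # minority grown by synthetic points
```

In this sketch RUS discards majority samples, ROS repeats existing minority samples, and the SMOTE-style function adds new points that lie on line segments between neighbouring minority samples, so only the last one introduces samples that were not in the original data.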




How to Cite
A. Purnajaya and F. Hanggara, “Perbandingan Performa Teknik Sampling Data untuk Klasifikasi Pasien Terinfeksi Covid-19 Menggunakan Rontgen Dada”, JAIC, vol. 5, no. 1, pp. 37-42, Jun. 2021.