Machine Learning-Based Approach for HIV/AIDS Prediction: Feature Selection and Data Balancing Strategy

Authors

  • Abdul Mizwar A Rahim, Universitas Amikom Yogyakarta
  • Ahmad Ridwan, Universitas Amikom Yogyakarta
  • Bambang Pilu Hartato, Universitas Amikom Yogyakarta
  • Firman Asharudin, Universitas Amikom Yogyakarta

DOI:

https://doi.org/10.30871/jaic.v9i2.9125

Keywords:

Machine Learning, Feature Selection, Data Balancing, HIV/AIDS Prediction, Classification

Abstract

HIV/AIDS remains a significant global health challenge, requiring accurate predictive models for early detection and improved clinical decision-making. However, developing an effective predictive model is hampered by data imbalance and irrelevant features, both of which can compromise model accuracy. This study aims to enhance the performance of AIDS infection prediction models by integrating feature selection, data balancing, and machine learning classification techniques. Feature selection is conducted using Pearson Correlation, Mutual Information, and Chi-Square tests to retain only the most relevant features. Random Oversampling, SMOTE, and ADASYN are employed to address data imbalance and improve model robustness. Nine machine learning algorithms are tested for classification, including Decision Tree, Random Forest, Extra Trees, XGBoost, LightGBM, Gradient Boosting, Support Vector Machine, AdaBoost, and Logistic Regression. Performance evaluation using the confusion matrix, precision, recall, F1-score, and AUC-ROC shows that tree-based models (Random Forest, Extra Trees, and XGBoost) achieve the best results, particularly in handling minority class predictions. The study concludes that combining feature selection, data balancing, and machine learning techniques significantly improves predictive performance, making it a valuable approach for early detection and clinical decision support in HIV/AIDS diagnosis. Future research may explore hyperparameter tuning and real-world clinical data integration to enhance practical applicability.
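
As a minimal, dependency-free sketch of two of the steps named in the abstract (Pearson-correlation feature selection and Random Oversampling), the Python below uses only the standard library. The toy data, the 0.3 correlation threshold, and all function names are illustrative assumptions; they are not the authors' dataset, settings, or code, and a real study would more likely rely on an established library implementation.

```python
import random
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def select_by_correlation(rows, labels, threshold):
    """Return indices of features whose |r| with the label meets the threshold."""
    keep = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        if abs(pearson(column, labels)) >= threshold:
            keep.append(j)
    return keep


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: feature 0 tracks the label, feature 1 does not.
rows = [[1.0, 5.0], [2.0, 3.0], [3.0, 5.0], [4.0, 3.0], [9.0, 5.0], [10.0, 3.0]]
labels = [0, 0, 0, 0, 1, 1]  # imbalanced: four negatives, two positives

selected = select_by_correlation(rows, labels, threshold=0.3)
balanced_rows, balanced_labels = random_oversample(rows, labels)

print("selected feature indices:", selected)
print("class counts after oversampling:", Counter(balanced_labels))
```

In practice, oversampling is typically applied to the training split only, so that duplicated or synthetic minority samples do not leak into the data used for evaluation. SMOTE and ADASYN differ from this sketch in that they synthesize new minority samples by interpolating between neighbors rather than duplicating existing rows.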

Published

2025-03-12

How to Cite

A. M. A. Rahim, A. Ridwan, B. P. Hartato, and F. Asharudin, “Machine Learning-Based Approach for HIV/AIDS Prediction: Feature Selection and Data Balancing Strategy,” JAIC, vol. 9, no. 2, pp. 338–347, Mar. 2025.

Section

Articles
