Anemia Classification with Clinical Feature Engineering and SHAP Interpretation
DOI: https://doi.org/10.30871/jaic.v9i5.10912
Keywords: Anemia, Hb/MCV ratio, Machine learning, SHAP, XGBoost
Abstract
Anemia is a global health problem with a significant impact on quality of life and productivity, and early, accurate detection is essential to prevent more serious complications. This study develops a machine learning model for anemia classification based on the XGBoost algorithm and compares its performance with Logistic Regression and Random Forest. The dataset, obtained from the Kaggle platform, contains 1,421 samples with six attributes: Gender, Hemoglobin (HGB), Mean Corpuscular Hemoglobin (MCH), Mean Corpuscular Hemoglobin Concentration (MCHC), Mean Corpuscular Volume (MCV), and the target label Result. During feature engineering, a derived feature, the hemoglobin-to-MCV ratio (Hb/MCV), was added because it is medically relevant for distinguishing types of anemia. In evaluation, XGBoost and Random Forest both achieved 100% accuracy and F1-Score, while Logistic Regression achieved 98.9%. XGBoost was selected as the primary model because of its computational efficiency and its support for interpretation with SHAP (SHapley Additive exPlanations). SHAP visualizations showed that the Hb/MCV ratio and hemoglobin were the most influential features in the classification. The model has potential as a decision support system for automated anemia screening and could be further integrated into clinical systems.
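To make the feature-engineering and evaluation steps concrete, the following is a minimal sketch in plain Python of computing the Hb/MCV ratio and the accuracy and F1-Score metrics. It is not the authors' code. It assumes that the hemoglobin and MCV values are stored under the attribute names Hemoglobin (g/dL) and MCV (fL) listed above and that Result is encoded as 1 for anemic. The derived column name Hb_MCV and the helper names add_hb_mcv_ratio and accuracy_and_f1 are hypothetical, and the sample values are illustrative, not taken from the dataset.

```python
from typing import Iterable, Mapping


def add_hb_mcv_ratio(rows: Iterable[Mapping[str, float]]) -> list[dict[str, float]]:
    """Return copies of the rows with a derived Hb/MCV feature.

    Assumes hemoglobin is stored under "Hemoglobin" (g/dL) and mean
    corpuscular volume under "MCV" (fL); "Hb_MCV" is a hypothetical name.
    """
    augmented = []
    for row in rows:
        mcv = row["MCV"]
        if mcv <= 0:
            # A non-positive MCV is physiologically impossible; reject it
            # rather than dividing by zero or producing a negative ratio.
            raise ValueError(f"MCV must be positive, got {mcv}")
        new_row = dict(row)  # copy so the caller's data is not mutated
        new_row["Hb_MCV"] = row["Hemoglobin"] / mcv
        augmented.append(new_row)
    return augmented


def accuracy_and_f1(y_true: list[int], y_pred: list[int], positive: int = 1) -> tuple[float, float]:
    """Compute accuracy and the F1-Score of the positive class (assumed 1 = anemic)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    tp = fp = fn = correct = 0
    for t, p in zip(y_true, y_pred):
        correct += t == p
        if p == positive and t == positive:
            tp += 1  # true positive
        elif p == positive:
            fp += 1  # predicted anemic, actually not
        elif t == positive:
            fn += 1  # missed anemic case
    accuracy = correct / len(y_true)
    # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


if __name__ == "__main__":
    # Hypothetical values for illustration only; not taken from the dataset.
    sample = [
        {"Gender": 1, "Hemoglobin": 12.0, "MCV": 80.0},
        {"Gender": 0, "Hemoglobin": 9.0, "MCV": 65.0},
    ]
    for row in add_hb_mcv_ratio(sample):
        print(row["Gender"], round(row["Hb_MCV"], 4))
    print(accuracy_and_f1([1, 0, 1, 1], [1, 0, 0, 1]))
```

In a full pipeline, the augmented rows would be passed as an extra feature column to the classifiers being compared (for example `xgboost.XGBClassifier`, `sklearn.linear_model.LogisticRegression`, and `sklearn.ensemble.RandomForestClassifier`). Attributions for a tree model such as XGBoost are typically computed with `shap.TreeExplainer`. How the study configured these libraries is not stated in this abstract.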
License
Copyright (c) 2025 Ikhlasul Amalia, Rumini Rumini

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as this can lead to productive exchanges, as well as earlier and greater citation of published work.