Anemia Classification with Clinical Feature Engineering and SHAP Interpretation

Authors

  • Ikhlasul Amalia, Universitas Amikom Yogyakarta
  • Rumini Rumini, Universitas Amikom Yogyakarta

DOI:

https://doi.org/10.30871/jaic.v9i5.10912

Keywords:

Anemia, Hb/MCV ratio, Machine learning, SHAP, XGBoost

Abstract

Anemia is a global health issue with a significant impact on quality of life and productivity. Early and accurate detection is essential to prevent more serious complications. This study develops a machine learning model for anemia classification using the XGBoost algorithm and compares its performance with Logistic Regression and Random Forest. The dataset was obtained from the Kaggle platform and consists of 1,421 samples with six attributes: Gender, Hemoglobin (HGB), Mean Corpuscular Hemoglobin (MCH), Mean Corpuscular Hemoglobin Concentration (MCHC), Mean Corpuscular Volume (MCV), and Result. During feature engineering, a derived feature was added, the hemoglobin-to-MCV ratio (Hb/MCV), which is medically relevant for distinguishing types of anemia. In the evaluation, XGBoost and Random Forest both achieved 100% accuracy and F1-score, while Logistic Regression achieved 98.9%. XGBoost was selected as the primary model because of its computational efficiency and its support for interpretation using SHAP (SHapley Additive exPlanations). SHAP visualization showed that the Hb/MCV ratio and hemoglobin were the most influential features in the classification. The model has the potential to serve as a decision support system for automated anemia screening and could be integrated into clinical systems.
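The feature engineering step and the two reported metrics can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' code: the column names `Hemoglobin`, `MCV`, and `Hb_MCV_ratio`, the helper functions, and the sample rows are all assumptions made for the example, and treating label 1 as the anemic class is likewise assumed.

```python
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, float]


def add_hb_mcv_ratio(records: List[Record]) -> List[Record]:
    """Return copies of the records with a derived 'Hb_MCV_ratio' field.

    The field names 'Hemoglobin' and 'MCV' are assumed, not confirmed
    by the abstract.
    """
    enriched = []
    for row in records:
        if row["MCV"] <= 0:
            raise ValueError("MCV must be positive to form the ratio")
        new_row = dict(row)  # copy so the input rows are left untouched
        new_row["Hb_MCV_ratio"] = row["Hemoglobin"] / row["MCV"]
        enriched.append(new_row)
    return enriched


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Compute accuracy and binary F1-score, treating label 1 as positive."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Made-up illustrative rows; these values are NOT taken from the dataset.
sample = [
    {"Gender": 1, "Hemoglobin": 14.9, "MCH": 22.7, "MCHC": 29.1, "MCV": 83.7, "Result": 0},
    {"Gender": 0, "Hemoglobin": 9.0, "MCH": 21.5, "MCHC": 29.6, "MCV": 72.0, "Result": 1},
]
enriched = add_hb_mcv_ratio(sample)
metrics = accuracy_and_f1([1, 0, 1, 1], [1, 0, 0, 1])
```

In a fuller pipeline the enriched rows would be passed to the classifiers being compared, and the same two metrics would be computed on a held-out test split for each model.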

Published

2025-10-06

How to Cite

[1]
I. Amalia and R. Rumini, “Anemia Classification with Clinical Feature Engineering and SHAP Interpretation”, JAIC, vol. 9, no. 5, pp. 2195–2201, Oct. 2025.
