Enhancing Medical Named Entity Recognition with Ensemble Voting of BERT-Based Models on BC5CDR

Authors

  • Fadhli Faqih Maulana, Universitas Dian Nuswantoro
  • Abu Salam, Universitas Dian Nuswantoro

DOI:

https://doi.org/10.30871/jaic.v9i3.9549

Keywords:

BC5CDR, BioBERT, TinyBERT, ClinicalBERT, Ensemble Voting

Abstract

Rapid developments in biotechnology and medical research have produced a large body of scientific literature containing critical information about medical entities. The primary challenge in managing this literature is the sheer volume of unstructured text, which calls for Natural Language Processing (NLP) techniques for automatic information extraction. A key NLP application is Named Entity Recognition (NER), which identifies important entities in text, such as disease names, drugs, and proteins. This study aims to improve medical NER performance on the BC5CDR dataset by applying ensemble voting to three BERT-based models: BioBERT, TinyBERT, and ClinicalBERT. The results show that the ensemble achieves the best performance in medical entity extraction, with a precision of 0.9494, a recall of 0.9483, and an F1-score of 0.9488, outperforming the individual models, particularly on less common medical entities. This approach is expected to support the development of automated systems for analyzing and searching medical literature.
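The abstract does not state exactly how the three models' outputs are combined. The following is a minimal sketch, assuming hard majority voting over word-level BIO tags (BC5CDR annotates Chemical and Disease entities). It also assumes each model's subword predictions have already been mapped back to whole words, since the three models' WordPiece tokenizers may split words differently. The function names `majority_vote` and `repair_bio`, the `fallback` tie-break parameter, and the BIO repair step are all hypothetical illustrations, not the authors' method.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]], fallback: int = 0) -> List[str]:
    """Combine word-level BIO tags from several NER models by hard majority voting.

    predictions[m][t] is the tag that model m assigns to word t. If the top
    tag is tied (e.g. three models predicting three different tags), the tag
    from model `fallback` is kept; this tie-break rule is a hypothetical choice.
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("need at least one model, all tagging the same word sequence")
    voted = []
    for t, tags in enumerate(zip(*predictions)):
        ranked = Counter(tags).most_common()  # [(tag, count), ...] sorted by count, descending
        top_tag, top_count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_count
        voted.append(predictions[fallback][t] if tied else top_tag)
    return voted


def repair_bio(tags: List[str]) -> List[str]:
    """Relabel an I-X that does not continue an X entity as B-X (hypothetical post-processing)."""
    fixed, prev = [], "O"
    for tag in tags:
        if tag.startswith("I-") and prev[2:] != tag[2:]:
            tag = "B-" + tag[2:]
        fixed.append(tag)
        prev = tag
    return fixed


if __name__ == "__main__":
    # Toy word-level predictions for one sentence (illustrative, not data from the paper).
    biobert = ["B-Chemical", "O", "B-Disease", "I-Disease"]
    tinybert = ["B-Chemical", "O", "B-Disease", "O"]
    clinicalbert = ["O", "O", "I-Disease", "I-Disease"]
    print(repair_bio(majority_vote([biobert, tinybert, clinicalbert])))
    # ['B-Chemical', 'O', 'B-Disease', 'I-Disease']
```

With three voters, a tie can only occur when all three models disagree, so the fallback rule rarely fires; the repair step exists because per-token voting can otherwise produce an I- tag with no preceding B- tag. As a consistency check on the reported metrics, F1 is the harmonic mean 2PR/(P + R), and 2(0.9494)(0.9483)/(0.9494 + 0.9483) ≈ 0.9488, matching the reported F1-score.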

Published

2025-06-20

How to Cite

F. F. Maulana and A. Salam, “Enhancing Medical Named Entity Recognition with Ensemble Voting of BERT-Based Models on BC5CDR”, JAIC, vol. 9, no. 3, pp. 989–997, Jun. 2025.

Issue

Vol. 9, No. 3 (2025)

Section

Articles