Enhancing Medical Named Entity Recognition with Ensemble Voting of BERT-Based Models on BC5CDR
DOI: https://doi.org/10.30871/jaic.v9i3.9549
Keywords: BC5CDR, BioBERT, TinyBERT, ClinicalBERT, Ensemble Voting
Abstract
Rapid progress in biotechnology and medical research has produced a large body of scientific literature containing critical information about medical entities. The main challenge in using this literature is the sheer volume of unstructured text, which calls for Natural Language Processing (NLP) techniques for automatic information extraction. One of the main NLP tasks is Named Entity Recognition (NER), which identifies important entities in text, such as diseases, drugs, and proteins. This study aims to improve medical NER performance on the BC5CDR corpus by applying ensemble voting to three BERT-based models: BioBERT, TinyBERT, and ClinicalBERT. The results show that ensemble voting gives the best performance in medical entity extraction, with higher precision (0.9494), recall (0.9483), and F1-score (0.9488) than the individual models, especially on less common medical entities. This approach is expected to contribute to the development of automated systems for analyzing and searching medical literature.
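To make the voting step concrete, the following is a minimal sketch of token-level hard (majority) voting over the label sequences of three models. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say whether hard or soft voting was used or how ties are resolved, so the function `majority_vote`, its `tie_break_index` parameter, the BIO label names, and the example predictions are all hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]],
                  tie_break_index: int = 0) -> List[str]:
    """Combine per-token label sequences from several models by majority vote.

    predictions: one label sequence per model, all aligned to the same tokens.
    tie_break_index: which model's label to keep when no single label wins
                     (an assumption; the source does not describe tie handling).
    """
    if not predictions:
        raise ValueError("need at least one model's predictions")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all models must label the same tokens")

    voted = []
    for labels in zip(*predictions):  # labels proposed for one token
        counts = Counter(labels)
        top_count = max(counts.values())
        winners = [lab for lab, c in counts.items() if c == top_count]
        if len(winners) == 1:
            voted.append(winners[0])
        else:
            # No unique majority: fall back to the designated model.
            voted.append(labels[tie_break_index])
    return voted


# Hypothetical predictions for the tokens: Aspirin can cause gastric ulcers
biobert = ["B-Chemical", "O", "O", "B-Disease", "I-Disease"]
tinybert = ["B-Chemical", "O", "O", "O", "I-Disease"]
clinicalbert = ["O", "O", "O", "B-Disease", "I-Disease"]

ensemble = majority_vote([biobert, tinybert, clinicalbert])
print(ensemble)
# ['B-Chemical', 'O', 'O', 'B-Disease', 'I-Disease']
```

In this toy example each model makes one mistake the other two do not share, so the vote recovers the full annotation. With three voters, a tie can only occur when all three disagree, which is why the fallback model matters less often than it would for an even number of voters.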
License
Copyright (c) 2025 Fadhli Faqih Maulana, Abu Salam

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License (Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).