Real-Time News Authenticity Verification Using MPNet (Masked and Permuted Pre-training Network)-Based Sentence Embeddings on Digital News Portals
DOI: https://doi.org/10.30871/jaic.v10i3.13193
Keywords: Hoax Detection, MPNet, News Verification, Semantic Similarity
Abstract
The dissemination of fake news (hoaxes) on digital news portals is a significant challenge in the digital era, because it can mislead the public and erode trust in circulating information. Digital media are fast and open, so unverified information can spread widely within a short time, whereas manual verification demands substantial time and effort. This study proposes a semantic similarity-based approach to support real-time news verification using the Multilingual MPNet model. The approach takes the content text of a news article as input and applies KeyBERT keyword extraction to capture its core information. The extracted keywords are then used to scrape comparative news articles from digital news portals, and the semantic similarity between the test article and each scraped article is measured to assess their degree of semantic relevance. The approach was evaluated on a dataset of 200 Indonesian news articles (100 factual and 100 hoax) using accuracy, precision, recall, F1-score, and AUC. It achieved an accuracy of 83.5%, precision of 97.18%, recall of 69.0%, F1-score of 80.70%, and an AUC of 0.695. These results indicate that semantic text representation with Multilingual MPNet can effectively support hoax detection while also supplying supporting evidence in the form of semantically related news articles, giving users access to comparative news sources that aid the verification process.
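The abstract does not state how a similarity score is turned into a verdict. The following is a minimal sketch under one plausible assumption: the test article is labeled factual when its best cosine similarity to any scraped article reaches a threshold, and the top-ranked scraped articles are returned as supporting evidence. The `verify` and `classification_metrics` helpers, the 0.7 threshold, and the toy three-dimensional vectors are hypothetical. In practice the vectors would come from a multilingual MPNet sentence encoder, for example `SentenceTransformer("paraphrase-multilingual-mpnet-base-v2").encode(texts)` from the sentence-transformers library, although the abstract does not name the exact checkpoint.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def verify(
    test_vec: Sequence[float],
    scraped: List[Tuple[str, Sequence[float]]],
    threshold: float = 0.7,  # hypothetical cut-off, not taken from the paper
    top_k: int = 3,
) -> Tuple[str, float, List[Tuple[str, float]]]:
    """Hypothetical decision rule: 'fact' if any scraped article is similar enough.

    `scraped` holds (title, embedding) pairs for articles found via the
    extracted keywords. Returns the label, the best score, and the top_k
    most similar articles as supporting evidence.
    """
    ranked = sorted(
        ((title, cosine_similarity(test_vec, vec)) for title, vec in scraped),
        key=lambda pair: pair[1],
        reverse=True,
    )
    best = ranked[0][1] if ranked else 0.0  # no scraped articles means no support
    label = "fact" if best >= threshold else "hoax"
    return label, best, ranked[:top_k]


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Because both classes contain 100 articles, the reported recall (69.0%) and precision (97.18%) imply 69 true positives, 2 false positives, 31 false negatives, and 98 true negatives, whichever class is treated as positive. Calling `classification_metrics(69, 2, 31, 98)` reproduces the reported accuracy (0.835), precision (0.9718), recall (0.69), and F1-score (0.8070), which is a quick consistency check on the figures.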
License
Copyright (c) 2026 Ira Lestari, Herlinah Herlinah, M. Adnan Nur

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (see The Effect of Open Access).