Real-Time News Authenticity Verification Using MPNet (Masked and Permuted Pre-training Network)-Based Sentence Embeddings on Digital News Portals

Ira Lestari; Herlinah Herlinah; M. Adnan Nur

doi:10.30871/jaic.v10i3.13193

Authors

Ira Lestari Universitas Handayani Makassar
Herlinah Herlinah Universitas Handayani Makassar
M. Adnan Nur Universitas Handayani Makassar

DOI:

https://doi.org/10.30871/jaic.v10i3.13193

Keywords:

Hoax Detection, MPNet, News Verification, Semantic Similarity

Abstract

The spread of fake news (hoaxes) on digital news portals is a significant challenge in the digital era, because it can mislead the public and erode trust in circulating information. The speed and openness of digital media allow unverified information to spread widely within a short time, whereas manual verification requires substantial time and effort. This study proposes a semantic similarity-based approach to support real-time news verification using the Multilingual MPNet model. The approach takes the news content text as input and applies KeyBERT keyword extraction to represent the core information of the article. The extracted keywords are then used to scrape comparative articles from digital news portals, and the semantic similarity between the test article and each scraped article is measured to assess their degree of semantic relevance. The approach was evaluated on a dataset of 200 Indonesian news articles (100 factual and 100 hoax) using accuracy, precision, recall, F1-score, and AUC. It achieved an accuracy of 83.5%, a precision of 97.18%, a recall of 69.0%, an F1-score of 80.70%, and an AUC of 0.695. These results indicate that semantic text representation with Multilingual MPNet can effectively support hoax detection and news verification through semantic similarity analysis, and that it supplies supporting evidence in the form of semantically related news articles, giving users access to comparative sources during the verification process.
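The scoring step described in the abstract can be sketched as follows. This is a minimal, model-agnostic sketch under stated assumptions, not the authors' implementation. It assumes the test article and each scraped article have already been turned into embedding vectors (for example with a multilingual MPNet checkpoint from the sentence-transformers library; the abstract does not name the exact checkpoint), and that keyword extraction with KeyBERT and the scraping step have already produced the candidate articles. The decision rule is also an assumption: the article is labelled factual when its most similar scraped article reaches a threshold. The names `cosine_similarity` and `verify_news` and the 0.75 threshold are hypothetical choices, not values from the paper.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def verify_news(
    test_embedding: Sequence[float],
    candidate_embeddings: Sequence[Sequence[float]],
    threshold: float = 0.75,  # hypothetical cut-off, not taken from the paper
) -> Tuple[str, float, List[int]]:
    """Hypothetical decision rule: 'fact' if the best-matching scraped article
    reaches the threshold, otherwise 'hoax'. Also returns the best score and the
    candidate indices ranked from most to least similar (the supporting evidence)."""
    scores = [cosine_similarity(test_embedding, c) for c in candidate_embeddings]
    if not scores:
        # Assumption: finding no comparable article counts as unsupported.
        return "hoax", 0.0, []
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best = scores[ranked[0]]
    label = "fact" if best >= threshold else "hoax"
    return label, best, ranked


# Toy 3-dimensional vectors standing in for real sentence embeddings.
label, best, ranked = verify_news([1.0, 0.0, 0.0], [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0]])
print(label, round(best, 4), ranked)
```

Returning the ranked indices, rather than only a label, lets a caller show the closest scraped articles to the user as comparative evidence, which is the role the abstract assigns to the retrieved news.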
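The reported scores can be cross-checked with a short worked calculation. Because the test set is balanced (100 articles per class), the reported recall and precision pin down a unique confusion matrix for whichever class is treated as positive; the abstract does not say which class that is. The plain-Python sketch below, with hypothetical names `Confusion` and `metrics`, recomputes the threshold-based metrics from those inferred counts.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Confusion:
    tp: int  # positives predicted positive
    fp: int  # negatives predicted positive
    fn: int  # positives predicted negative
    tn: int  # negatives predicted negative


def metrics(c: Confusion) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 from a binary confusion matrix."""
    total = c.tp + c.fp + c.fn + c.tn
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (c.tp + c.tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Counts inferred from the abstract, not reported in it: 100 positive articles and
# recall 69.0% give TP = 69, FN = 31; precision 97.18% then forces FP = 2 (69/71),
# leaving TN = 100 - 2 = 98.
reported = metrics(Confusion(tp=69, fp=2, fn=31, tn=98))
for name, value in reported.items():
    print(f"{name}: {value:.4f}")
```

The recomputed accuracy, precision, recall, and F1 match the abstract (83.5%, 97.18%, 69.0%, 80.70%), so those figures are internally consistent. The AUC of 0.695 cannot be checked this way, because it depends on how the continuous similarity scores rank all 200 articles, not on the four counts at a single threshold.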


