Temporal Deep Learning for Probabilistic Mutation Forecasting in SARS-CoV-2 Spike Protein Sequences
DOI: https://doi.org/10.30871/jaic.v10i3.12650
Keywords: Deep Learning, Long Short-Term Memory, Mutation Forecasting, SARS-CoV-2, Spike Protein, Temporal Sequence Modelling
Abstract
Modelling the evolution of biological sequences under temporal and probabilistic constraints remains a complex computational challenge. This study investigates longitudinal deep learning for probabilistic modelling of mutation patterns in the SARS-CoV-2 Spike Protein. A stacked Long Short-Term Memory (LSTM) network is trained on temporally ordered amino acid sequences to estimate residue-level substitution probabilities and rank plausible future mutations. Unlike deterministic classification approaches, the proposed framework treats mutation prediction as a probabilistic ranking task, accounting for the inherent uncertainty of viral evolution. The model is evaluated using metrics suitable for imbalanced sequence data, including Top-K accuracy, precision, recall, F1-score, and ROC-AUC. Results indicate strong ranking performance, with a Top-3 accuracy of 94.6% and a ROC-AUC of 0.91. The overall accuracy (93.1%) is interpreted cautiously because conserved residues dominate the data. Error analysis shows that difficult predictions are concentrated in low-frequency, rapidly evolving residue positions. A comparison with a frequency-based baseline demonstrates that the LSTM captures temporal dependencies beyond static substitution patterns. Predicted mutation distributions align with known functional regions of the Spike Protein reported in the established literature, providing qualitative biological validation. This study contributes a temporally structured and probabilistic framework for mutation modelling, emphasising ranking-based evaluation and biologically contextualised interpretation. The findings demonstrate the feasibility of probabilistic mutation forecasting under controlled experimental conditions and provide a methodological foundation for future research on AI-assisted genomic surveillance.
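The ranking-based evaluation described in the abstract can be illustrated with a minimal sketch of Top-K accuracy. This is not the authors' implementation: the function names, the 20-letter amino acid ordering, and the toy probability values are all assumptions made for illustration. A prediction counts as a hit when the observed residue appears among the K most probable residues at that position.

```python
from typing import Dict, List, Sequence

# Assumed ordering of the 20 standard amino acids (illustrative only).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def make_distribution(weights: Dict[str, float]) -> List[float]:
    """Expand a sparse {residue: probability} mapping to a 20-entry vector."""
    return [weights.get(aa, 0.0) for aa in AMINO_ACIDS]


def top_k_residues(probs: Sequence[float], k: int) -> List[str]:
    """Return the k residues with the highest predicted probability."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [AMINO_ACIDS[i] for i in ranked[:k]]


def top_k_accuracy(
    prob_rows: Sequence[Sequence[float]], true_residues: Sequence[str], k: int
) -> float:
    """Fraction of positions whose observed residue is among the top-k predictions."""
    hits = sum(
        true in top_k_residues(row, k)
        for row, true in zip(prob_rows, true_residues)
    )
    return hits / len(true_residues)


# Hypothetical predicted distributions for three residue positions.
rows = [
    make_distribution({"D": 0.6, "G": 0.3, "N": 0.1}),
    make_distribution({"N": 0.7, "Y": 0.2, "K": 0.1}),
    make_distribution({"E": 0.5, "K": 0.3, "Q": 0.15, "A": 0.05}),
]
# Hypothetical residues actually observed at those positions.
truth = ["G", "N", "A"]

top1 = top_k_accuracy(rows, truth, k=1)  # only the second position is a hit
top3 = top_k_accuracy(rows, truth, k=3)  # first and second positions are hits
```

In this toy example Top-1 accuracy is 1/3 and Top-3 accuracy is 2/3, which shows why a ranking metric can credit a model that assigns substantial probability to the observed substitution without placing it first.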
License
Copyright (c) 2026 Robert Selemani, Belinda Ndlovu, Amazing Maphosa

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).