The Cheminformatics and Nutrition research group of the Universitat Rovira i Virgili (URV) has designed a machine learning system that predicts recurrent mutations in the SARS-CoV-2 coronavirus, information that could help in the development of new drugs
Viruses are infectious agents that need the living cells of a host to reproduce. When they infect a cell, they hijack its machinery to copy the virus’s own genetic material. In the case of SARS-CoV-2, the instructions needed for reproduction are stored in its genome in the form of ribonucleic acid (RNA). Whereas human DNA has a double-helix structure, the RNA of SARS-CoV-2 is a single strand that encodes information using four bases: adenine, guanine, cytosine and uracil. Mutations are changes in the sequence in which these four bases appear, for example as a result of errors in the replication process. Although these changes in the RNA strand were thought to be completely random, research has found that some are more frequent than others. In particular, some host enzymes (proteins that catalyze chemical reactions) tend to convert cytosine in the virus’s RNA into uracil.
In this context, the URV’s Cheminformatics and Nutrition research group, led by researchers Gerard Pujadas and Santi Garcia, has designed a machine learning system based on an artificial neural network that can predict the virus mutations that result when the viral genome comes into contact with certain host enzymes.
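The cytosine-to-uracil bias described above can be illustrated with a short sketch that lists every cytosine in an RNA sequence as a candidate C→U substitution, written in the common reference-position-alternative notation (for example C4U). This is a toy illustration under stated assumptions, not part of the URV system: the helper names c_to_u_candidates and apply_mutation and the example sequence are hypothetical.

```python
# Minimal sketch: list the positions where an enzyme that converts
# cytosine (C) into uracil (U) could act on an RNA sequence.
# ASSUMPTIONS: helper names and the toy sequence are hypothetical,
# not real SARS-CoV-2 RNA and not the published method.

RNA_BASES = set("ACGU")


def c_to_u_candidates(rna: str) -> list[str]:
    """Return every possible C->U substitution as 'C<position>U' (1-based)."""
    rna = rna.upper()
    if not set(rna) <= RNA_BASES:
        raise ValueError("sequence must contain only A, C, G and U")
    return [f"C{i}U" for i, base in enumerate(rna, start=1) if base == "C"]


def apply_mutation(rna: str, mutation: str) -> str:
    """Apply a substitution such as 'C4U' and return the mutated sequence."""
    ref, pos, alt = mutation[0], int(mutation[1:-1]), mutation[-1]
    if rna[pos - 1] != ref:
        raise ValueError(f"position {pos} is {rna[pos - 1]}, not {ref}")
    return rna[: pos - 1] + alt + rna[pos:]


if __name__ == "__main__":
    toy = "AUGCCAGUCA"
    sites = c_to_u_candidates(toy)
    print(sites)                          # ['C4U', 'C5U', 'C9U']
    print(apply_mutation(toy, sites[0]))  # AUGUCAGUCA
```

Not every candidate site mutates equally often in practice; a predictive model is needed to rank them, which is the role the article assigns to the neural network.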
Once the evolution of the virus had been analyzed in terms of its mutations, URV PhD student Bryan Saldivar “trained” an artificial neural network with data from more than 800,000 virus genomes so that it could predict which recurrent mutations are likely to emerge in the future. An artificial neural network is a machine learning system made up of many interconnected nodes, called artificial neurons, which, once trained for a particular task, work together to process large volumes of data. Rather than following rules written by researchers, these systems learn from examples, adjusting their internal parameters until their outputs match the results the researchers are looking for.
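To make the idea of “training” concrete, here is a minimal pure-Python sketch of a neural network with one hidden layer that scores a genome position from the base at that position and its two neighbors. Everything specific in it is an assumption for illustration only: the three-base window, the four hidden neurons, the learning rate, the TinyNet and train_toy names, and the toy rule that labels a window positive when its center base is C. It is not the architecture or the data used in the study.

```python
# Minimal sketch of a one-hidden-layer neural network in pure Python.
# ASSUMPTIONS (illustrative only, not the published model): a 3-base input
# window, 4 hidden neurons, learning rate 0.5, and a toy label that is 1
# when the center base is C.
import math
import random
from itertools import product

BASES = "ACGU"


def one_hot(window: str) -> list[float]:
    """Encode each base of the window as four 0/1 inputs (A, C, G, U)."""
    return [1.0 if b == base else 0.0 for b in window for base in BASES]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TinyNet:
    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)  # fixed seed so results are reproducible
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x: list[float]) -> tuple[list[float], float]:
        """Return the hidden activations and the output probability."""
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def train_step(self, x: list[float], target: float, lr: float) -> None:
        """One gradient-descent step on the binary cross-entropy loss."""
        h, y = self.forward(x)
        d_out = y - target  # gradient of the loss w.r.t. the output logit
        for j, hj in enumerate(h):
            # Back-propagate through hidden neuron j using its old output weight.
            d_hidden = d_out * self.w2[j] * hj * (1.0 - hj)
            self.w2[j] -= lr * d_out * hj
            self.b1[j] -= lr * d_hidden
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d_hidden * xi
        self.b2 -= lr * d_out

    def predict(self, window: str) -> float:
        return self.forward(one_hot(window))[1]


def train_toy(epochs: int = 300, lr: float = 0.5) -> TinyNet:
    """Train on all 64 three-base windows with the toy 'center is C' label."""
    data = [("".join(w), 1.0 if w[1] == "C" else 0.0) for w in product(BASES, repeat=3)]
    net = TinyNet(n_in=4 * 3, n_hidden=4)
    for _ in range(epochs):
        for window, label in data:
            net.train_step(one_hot(window), label, lr)
    return net


if __name__ == "__main__":
    net = train_toy()
    print(f"ACG: {net.predict('ACG'):.2f}  AGG: {net.predict('AGG'):.2f}")
```

Each base is turned into four numeric inputs, and after every example the weights are nudged in the direction that reduces the prediction error; after training on the toy data, windows centered on C should score above 0.5 and all others below it.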
The usual procedure is to train the network on part of the data, in this case part of the genome, and to hold back a sufficiently large part so that the network can be tested on data it has not seen and adjusted if necessary. Here, the team held back four genes, one of which encodes the spike protein, which the virus uses to enter and infect cells.
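Holding back whole genes, rather than random positions, means the network is scored on regions of the genome it never saw during training. The sketch below shows such a gene-level split. The Site record and split_by_gene function are hypothetical, the positions are made up, and because this article names only the spike gene (S) among the four reserved genes, the example holds out S alone.

```python
# Minimal sketch of a gene-level hold-out split.
# ASSUMPTIONS: the Site record, its fields and the toy data are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    gene: str        # gene the position belongs to, e.g. "S" for spike
    position: int    # nucleotide position (made-up values in the example)
    recurrent: bool  # label: was a recurrent mutation observed at this site?


def split_by_gene(sites: list[Site], held_out_genes: set[str]) -> tuple[list[Site], list[Site]]:
    """Put every site of a held-out gene in the test set and the rest in training."""
    train = [s for s in sites if s.gene not in held_out_genes]
    test = [s for s in sites if s.gene in held_out_genes]
    return train, test


if __name__ == "__main__":
    toy_sites = [
        Site("N", 10, True),
        Site("M", 20, False),
        Site("S", 30, True),
        Site("S", 40, False),
    ]
    train, test = split_by_gene(toy_sites, held_out_genes={"S"})
    print(len(train), len(test))  # 2 2
```

Because no gene appears in both lists, the test score is meant to reflect how well the network generalizes to genes it has never seen, rather than how well it recalls positions close to ones it was trained on.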
This approach, which had not previously been used to predict virus mutations, has enabled the researchers to anticipate recurrent changes in the virus that are catalyzed by the human body’s own enzymes. The system also identifies the parts of the virus that cannot change, because any change there would leave the virus unable to reproduce.
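This article does not describe how the system identifies the regions that cannot change. As a generic point of comparison only, the sketch below counts, for each position of a set of aligned genomes, how many genomes differ from a reference, and reports the positions that never vary. The function names mutation_counts and invariant_positions and the toy sequences are hypothetical, and the genomes are assumed to be already aligned to the reference.

```python
# Minimal sketch: flag positions that never differ from the reference across
# a set of aligned genomes. ASSUMPTIONS: sequences are already aligned and of
# equal length; the toy sequences are made up; this is a generic conservation
# count, not the method described in the paper.


def mutation_counts(reference: str, genomes: list[str]) -> list[int]:
    """Number of genomes that differ from the reference at each position."""
    for g in genomes:
        if len(g) != len(reference):
            raise ValueError("all genomes must be aligned to the reference length")
    return [sum(g[i] != ref_base for g in genomes) for i, ref_base in enumerate(reference)]


def invariant_positions(reference: str, genomes: list[str]) -> list[int]:
    """1-based positions where no genome differs from the reference."""
    counts = mutation_counts(reference, genomes)
    return [i for i, c in enumerate(counts, start=1) if c == 0]


if __name__ == "__main__":
    ref = "AUGCCAGUCA"
    genomes = ["AUGUCAGUCA", "AUGCCAGUUA", "AUGUCAGUCA"]
    print(mutation_counts(ref, genomes))      # [0, 0, 0, 2, 0, 0, 0, 0, 1, 0]
    print(invariant_positions(ref, genomes))  # [1, 2, 3, 5, 6, 7, 8, 10]
```

A position that never varies in a sample is only a candidate: the absence of mutations may reflect limited sampling rather than a genuine constraint on the virus.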
This information could help researchers design more effective drugs against the virus, since the weak points detected could be targeted to make it harder for the virus to reproduce. “This research provides important information for the scientific community, and it has been made available for consultation,” explains Santi Garcia. He also believes that the methodology could be applied again in future pandemics, especially if they are caused by a coronavirus or a new variant of SARS-CoV-2.
Reference: Saldivar-Espinoza B, Macip G, Garcia-Segura P, Mestres-Truyol J, Puigbò P, Cereto-Massagué A, Pujadas G, Garcia-Vallve S. Prediction of Recurrent Mutations in SARS-CoV-2 Using Artificial Neural Networks. Int J Mol Sci. 2022 Nov 24;23(23):14683. doi: 10.3390/ijms232314683. https://www.mdpi.com/1422-0067/23/23/14683