Can an AI predict the language of viral mutation?

Viruses lead a rather repetitive existence. They enter a cell, hijack its machinery to turn it into a viral copy machine, and those copies head to other cells armed with instructions to do the same. So it goes, over and over again. But fairly often, in the midst of this repeated copying and pasting, things get mixed up. Mutations arise in the copies. Sometimes a mutation means that an amino acid is not produced and a vital protein does not fold, so that viral version goes into the dustbin of evolutionary history. Sometimes the mutation does nothing at all, because different sequences that encode the same proteins compensate for the error. But every once in a while, mutations go perfectly right. The changes do not affect the virus's ability to exist; instead, they produce a useful change, such as making the virus unrecognizable to a person’s immune defenses. When this allows the virus to evade antibodies generated by previous infections or by a vaccine, the mutant version of the virus is said to have “escaped.”

Scientists are always looking for signs of potential escape. This is true for SARS-CoV-2, as new strains emerge and scientists investigate what the genetic changes could mean for a long-lasting vaccine. (So far, things are looking good.) It is also what confounds researchers studying influenza and HIV, viruses that routinely evade our immune defenses. So, in an effort to see what might be coming, researchers create hypothetical mutants in the lab and test whether they can evade antibodies taken from recent patients or vaccine recipients. But the genetic code offers too many possibilities to test every evolutionary branch the virus could take over time. It’s a matter of keeping up.

Last winter, Brian Hie, a computer scientist at MIT and a fan of John Donne’s lyric poetry, was thinking about this problem when he hit upon an analogy: What if we thought of viral sequences the way we think of written language? He reasoned that every viral sequence has a kind of grammar, a set of rules it has to follow in order to be that particular virus. When mutations violate this grammar, the virus reaches an evolutionary dead end. In virological terms, it lacks “fitness.” Also like language, from the perspective of the immune system, the sequence can be said to have a kind of semantics. There are some sequences that the immune system can interpret, and thus stop the virus with antibodies and other defenses, and some that it cannot. So a viral escape could be seen as a change that preserves the grammar of the sequence but changes its meaning.
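To make the analogy concrete, here is a minimal sketch of how one might rank hypothetical mutants if each already had two scores: how well it still obeys the viral "grammar," and how much its "meaning" has shifted for the immune system. The `Mutant` class, the `escape_score` function, and all the numbers are invented for illustration; the scoring rule (a simple product) is an assumption, not a description of Hie's actual method.

```python
from dataclasses import dataclass


@dataclass
class Mutant:
    name: str
    grammaticality: float   # hypothetical score in [0, 1]: does the sequence still "work"?
    semantic_change: float  # hypothetical score in [0, 1]: how different does it look to antibodies?


def escape_score(m: Mutant) -> float:
    # Assumed rule: the product is high only when BOTH scores are high,
    # i.e. the grammar is preserved and the meaning has changed.
    return m.grammaticality * m.semantic_change


# Made-up examples matching the three outcomes described in the article.
mutants = [
    Mutant("broken", grammaticality=0.05, semantic_change=0.90),  # protein no longer folds
    Mutant("silent", grammaticality=0.95, semantic_change=0.02),  # mutation does nothing
    Mutant("escape", grammaticality=0.90, semantic_change=0.85),  # viable and unrecognized
]

ranked = [m.name for m in sorted(mutants, key=escape_score, reverse=True)]
print(ranked)
```

In this toy setup the mutant that is both viable and unfamiliar to the immune system ranks first, while the broken and silent mutants fall behind it.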

The analogy had a simple, almost too simple, elegance. But for Hie it was also practical. In recent years, AI systems have become very good at modeling the principles of grammar and semantics in human language. They do this by training a system on data sets of billions of words, arranged in sentences and paragraphs, from which the system derives patterns. In this way, without being told any specific rules, the system learns where the commas should go and how to structure a clause. It can also be said to intuit the meaning of certain sequences, words and phrases, based on the many contexts in which they appear throughout the data set. It is patterns, all the way down. This is how the most advanced language models, such as OpenAI's GPT-3, can learn to produce perfectly grammatical prose that manages to stay reasonably on topic.

One advantage of this idea is that it is generalizable. To a machine learning model, a sequence is a sequence, whether it is arranged in sonnets or amino acids. According to Jeremy Howard, an AI researcher at the University of San Francisco and an expert in language modeling, applying such models to biological sequences can be fruitful. With enough data from, say, genetic sequences of viruses that are known to be infectious, the model will implicitly learn something about how infectious viruses are structured. “This model will have a lot of sophisticated and complex knowledge,” he says. Hie knew that to be the case. His graduate advisor, computer scientist Bonnie Berger, had previously done similar work with another member of her lab, using AI to predict patterns of protein folding.
