In light of the ongoing pandemic, researchers have stepped up efforts to identify new zoonotic threats—viruses that could make the leap from animals to humans and cause future epidemics. Studies aiming to identify such viruses often focus on a novel virus’ similarity to known zoonotic pathogens. But homing in on viruses that show signs of having adapted to human-like hosts might be a better approach, a recent study suggests.
Using machine learning, infectious disease ecologist Daniel Streicker and his colleagues at the University of Glasgow examined more than 800 viral genomes to understand what sequences best predicted a virus’ ability to infect people. While many previous studies have focused on sequencing novel viruses, there are few tools to apply that information to predict zoonotic risk. “There’s been this disconnect between viral discovery in animals and risk assessment to prevent outbreaks,” says biologist Colin Carlson of Georgetown University, who was not involved with the work. “I think this study solves a very big problem.”
Prior to the COVID-19 pandemic, there were two instances of zoonotic coronaviruses, MERS-CoV and SARS-CoV, that caused global public health crises. Even so, it’s difficult to discern which viruses pose an immediate risk to human health based only on the genomes or traits of viruses in bats or other animals. Relying on genome sequences to determine relatedness also has serious drawbacks: Several branches on viral family trees are still undiscovered or incomplete. “There are huge gaps in our knowledge of the diversity of viruses out there,” Streicker says.
To find a solution, Streicker and his colleagues pursued a different line of inquiry, seeking out genomic sequences that could help reveal whether viruses bore signatures that resembled those of human proteins.
The team began with one member of each of 861 viral species. They split this dataset into two: one part to develop and train a machine-learning model, and a second part to test that model. When training the model, the researchers added information on viruses’ relatedness to one another, as well as data on whether they were known to infect humans. Relatedness among viruses did not predict a novel virus’ ability to infect humans, at least in part because of the gaps in data on viral diversity, Streicker says.
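To make the split concrete, here is a minimal sketch of dividing a labeled set of viruses into training and test portions while keeping the share of human-infecting viruses similar in both. The article does not say how the team divided its data, so the 30% test fraction, the `stratified_split` helper, and the placeholder records are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(records, test_fraction=0.3, seed=0):
    """Split (virus_id, infects_humans) records into train and test lists,
    keeping the proportion of each label roughly equal in both parts."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_label = defaultdict(list)
    for record in records:
        by_label[record[1]].append(record)

    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_test = int(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Hypothetical toy data: 861 placeholder species, every fourth one
# arbitrarily labeled as human-infecting. Not the study's data.
records = [(f"virus_{i}", i % 4 == 0) for i in range(861)]
train, test = stratified_split(records)
```

Holding out the test portion before any training is what lets the later accuracy figures say something about viruses the model has not seen.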
So the team trained the model to also seek out sequences in viral genomes that were similar to human sequences. Since immune cells are on alert to spot unfamiliar genetic material, one way for a virus to evade those defenses, and thus infect a new host, is by mimicking host genetic patterns and hence appearing harmless to surveilling cells. Viruses that already possess such patterns could have greater zoonotic potential, the authors suspected.
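One simple way to picture "similarity to human sequences" is to compare how often each pair of adjacent nucleotides appears in two sequences. The sketch below does that with dinucleotide frequencies. It is an illustrative stand-in, not the features the study used, which the article does not describe in detail. The function names and the two toy sequences are invented for the example.

```python
from collections import Counter
from itertools import product

# All 16 possible pairs of adjacent nucleotides.
DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]


def dinucleotide_frequencies(sequence):
    """Return the relative frequency of each dinucleotide in a DNA sequence."""
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + 2] for i in range(len(sequence) - 1))
    total = sum(counts[d] for d in DINUCLEOTIDES)
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T dinucleotides")
    return {d: counts[d] / total for d in DINUCLEOTIDES}


def composition_similarity(seq_a, seq_b):
    """Score from 0.0 (no dinucleotides in common) to 1.0 (identical profiles),
    computed as 1 minus half the L1 distance between the two profiles."""
    freq_a = dinucleotide_frequencies(seq_a)
    freq_b = dinucleotide_frequencies(seq_b)
    return 1.0 - 0.5 * sum(abs(freq_a[d] - freq_b[d]) for d in DINUCLEOTIDES)


# Made-up toy sequences, used only to exercise the functions.
host_fragment = "ATGGCCAAGCTGACCTGAGGCTTA"
viral_fragment = "ATGGCAAAGTTGACATGAGGATTC"
score = composition_similarity(host_fragment, viral_fragment)
```

A score like this could serve as one input feature among many. On its own it says nothing about whether a virus can actually infect people.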
Adding this information vastly improved the model’s ability to spot potential pathogens. The model correctly identified 92% of viruses known to infect humans. It also flagged as potential human pathogens 18 viruses that are not known to be zoonotic, although some studies suggest 3 of those 18 may have once infected humans. “I was surprised that this approach worked so well, that there’s enough information in a viral genome that we can make these predictions,” Streicker says.
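The 92% figure is a sensitivity: of the viruses already known to infect humans, the fraction the model flagged. A minimal sketch of that arithmetic follows. The `sensitivity` helper and the counts (23 of 25) are hypothetical, chosen only so the toy example lands on the same percentage, and are not the study's numbers.

```python
def sensitivity(flagged, known_positive):
    """Fraction of known human-infecting viruses that the model flagged."""
    known = set(known_positive)
    if not known:
        raise ValueError("need at least one known human-infecting virus")
    return len(known & set(flagged)) / len(known)


# Hypothetical toy example: 25 known human-infecting viruses, 23 of them flagged,
# plus two flagged candidates with no known human infections.
known_human_viruses = {f"known_{i}" for i in range(25)}
flagged = {f"known_{i}" for i in range(23)} | {"candidate_x", "candidate_y"}
rate = sensitivity(flagged, known_human_viruses)
```

Note that the two extra flagged candidates do not change the sensitivity. They are the kind of prediction that would need follow-up study, like the 18 viruses mentioned above.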
The new model could augment ongoing viral surveillance efforts by helping to parse and prioritize which viruses need further study, says disease ecologist Barbara Han of the Cary Institute of Ecosystem Studies in Millbrook, New York, who was not involved with the work. “Advances like the one presented in this paper are really necessary to make sense of the sequences that are collected from those surveillance campaigns,” she says.
Still, the model does not account for many real-world variables, Streicker notes, such as the odds of humans encountering a virus with high zoonotic potential, the virus’ ability to transmit from one human to another (as opposed to simply jumping from nonhuman to human hosts), or even its ability to cause disease. “These are the things we’d need to know to design interventions at public health scale,” he says.
Future studies could add this sort of information to refine the model and extend its applications. For now, the new work helps solve a “needle in a haystack” type of problem: Researchers could apply the model to prioritize specific viral strains for experimental studies or identify strains that warrant closer surveillance. Such efforts are often time-consuming and expensive, so a tool that uses genomic research to inform those studies is extremely valuable, Carlson says. “What’s so great about a model like this is it tells you, for instance, which dozen or so viruses you need to look at in the next year,” he says. “It makes use of huge investments that have been made in basic research.”
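In practice, that kind of prioritization amounts to ranking viruses by a model's predicted score and keeping the top few. The sketch below assumes each virus already has a score between 0 and 1. The `prioritize` helper, the virus names, and the scores are invented for illustration and do not come from the study.

```python
def prioritize(scores, top_n=12):
    """Return the names of the top_n highest-scoring viruses, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:top_n]]


# Hypothetical predicted scores for four placeholder viruses.
scores = {"virus_A": 0.81, "virus_B": 0.12, "virus_C": 0.67, "virus_D": 0.40}
shortlist = prioritize(scores, top_n=2)
```

The default of 12 simply echoes the "dozen or so" in Carlson's remark. A real shortlist length would depend on lab capacity and budget.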