A genome – an organism’s genetic material – is essentially its instruction manual, containing all the information needed to make and maintain it. Human genomes are made of double-stranded DNA and are written in its special code of four nucleotide base ‘letters’. Human genomes are over 3 billion base letters long. In contrast, a virus genome can be made of either DNA or its close cousin RNA, and is tiny by comparison. Coronaviruses are RNA viruses, and the newly-discovered virus SARS-CoV-2 has a single short RNA strand that is only about 30,000 letters long. These letters can be ‘read’ one by one, using a technique called sequencing.
Genomes identify viruses
Finding the new coronavirus’s sequence in a patient’s sample (usually taken from the nose or mouth) confirms that the virus is present and makes it likely that the patient’s symptoms are those of COVID-19.
Virus genomes constantly change (mutate), a few letters at a time, as the virus replicates and spreads by infecting more people. These changes can be exploited to track the spread of the virus by sequencing, recording and analysing genomes.
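To make the idea of ‘changing a few letters at a time’ concrete, here is a minimal sketch in Python that lists the positions at which two genome sequences differ. It assumes the two sequences have already been aligned to the same length. The function name find_differences and the short sequences are made up for illustration; real analyses compare genomes of about 30,000 letters using specialised alignment software.

```python
def find_differences(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """List (position, letter_a, letter_b) for every site where two
    aligned sequences differ. Positions are counted from 1."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, a, b)
        for position, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


# Made-up toy RNA sequences (hypothetical, not real virus data).
reference = "AUGGCUAGCUUAGGCA"
sample = "AUGGCUAGUUUAGGCG"

print(find_differences(reference, sample))
# [(9, 'C', 'U'), (16, 'A', 'G')]
```

Each entry in the output is one letter change. Genomes that share the same set of changes are more likely to be closely related.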
If virus genome sequencing is undertaken rapidly and on a large scale, it can help epidemiologists and public health authorities to understand how the virus is spreading and to evaluate how effective their interventions have been. It can also help to establish whether new variants are associated with particular patterns of symptoms or severity of disease. In the longer term, tracking new variants is likely to be extremely important to ensure that vaccines, when these are developed, can be kept ‘up to date’ with the strains of virus that are currently circulating.
Local transmission versus imported cases
In the initial stages of an epidemic, sequencing can be used to find out how many new cases are imported and how many come from local transmission. Global databases of virus genomes enable researchers to compare genomes so that an accurate assessment of local transmission in each country can be made.
Mathematical models of how viruses evolve during an epidemic – developed from extensive analysis of past outbreaks – allow epidemic growth rates and other measurements of transmission and infection to be estimated from virus genome sequences. Compared to estimates from other sources of data, insights from virus genetics are most useful for the prediction of longer-term, larger-scale trends. Importantly, they provide independent validation of estimates of the size and growth rate of an epidemic. This is useful especially when cases are under-reported, for instance because many people who are infected do not have symptoms.
Spread in different places or groups
Widespread sampling and genome sequencing of the new coronavirus allow the spread of the virus in different places or groups of people to be reconstructed. This provides information about what is driving the spread of the virus both locally and nationally. This work can be made more precise if virus genomes are combined with information about where, how and when people travel locally and internationally.
Virus genome sequences can also identify unique genetic changes shared by all those infected in a single virus transmission chain. This can be used to work out whether two clusters of cases in the same area arose because one cluster seeded the other, or because there were two distinct and independent chains of transmission with separate, earlier origins. Virus genomes can therefore add to the information provided by patient contact tracing, which is important for tracking outbreaks in communities, hospitals and other care settings.
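As a rough illustration of how shared genetic changes can separate transmission chains, the Python sketch below compares the changes carried by every genome in each cluster. It is a deliberately simplified assumption, not the method used in practice, which relies on reconstructing family trees of virus genomes. The names shared_mutations and compare_clusters and the mutation labels (original letter, position, new letter) are all hypothetical.

```python
def shared_mutations(cluster: list[set[str]]) -> set[str]:
    """Return the mutations carried by every genome in a cluster."""
    return set.intersection(*cluster)


def compare_clusters(cluster_a: list[set[str]], cluster_b: list[set[str]]) -> str:
    """Simplified rule of thumb: if the changes shared across one cluster
    are all found among those shared across the other, the two clusters
    could belong to a single chain of transmission."""
    a = shared_mutations(cluster_a)
    b = shared_mutations(cluster_b)
    if a and b and (a <= b or b <= a):
        return "consistent with one chain"
    return "consistent with separate chains"


# Made-up mutation sets, one set per sequenced patient sample.
ward = [{"A100G", "C2500U"}, {"A100G", "C2500U", "G7000A"}]
care_home = [{"A100G", "C2500U", "U9000C"}, {"A100G", "C2500U", "U9000C"}]
school = [{"G400A"}, {"G400A", "C12000U"}]

print(compare_clusters(ward, care_home))  # consistent with one chain
print(compare_clusters(ward, school))     # consistent with separate chains
```

In this toy example the care home genomes carry both changes shared by the ward genomes plus one extra, which is what would be expected if the ward cluster seeded the care home cluster. The school genomes share none of those changes, pointing to a separate origin.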
Many genetic changes that occur in the genome of the virus will have no significant effect on the course of infection or disease, or on the impact of control measures. However, a few of the changes might be important, and these need to be identified and tracked through time. In viruses such as influenza, we know that genetic changes can alter how the immune system recognises the virus, how resistant the virus is to antiviral drugs, and how severe the disease is. These discoveries have yet to be made for the new coronavirus.
Rapid, large-scale virus genome sequencing is a new stream of information that can contribute to the tracking of epidemics and the development of new methods of control. Its application to the new coronavirus is only just beginning.