How do scientists detect new variants of the virus that causes COVID-19? The answer is a process called DNA sequencing.
Researchers sequence DNA to determine the order of the four chemical building blocks, or nucleotides, that make it up: adenine, thymine, cytosine and guanine. Millions to billions of these building blocks, paired up across the two strands of the DNA double helix, make up a genome that contains all the genetic information an organism needs to survive.
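The pairing rule is simple: in double-stranded DNA, adenine always pairs with thymine and cytosine with guanine, so reading one strand tells you the other. The short Python sketch below illustrates that rule and counts each of the four building blocks. The sequence is a hypothetical stretch chosen for illustration, not a real gene.

```python
# Complementary base pairs in double-stranded DNA: A-T and C-G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_strand(strand: str) -> str:
    """Return the partner strand. It runs in the opposite direction,
    so the input is reversed before each base is swapped for its pair."""
    return "".join(PAIRS[base] for base in reversed(strand.upper()))


def base_counts(strand: str) -> dict[str, int]:
    """Count how many times each of the four nucleotides appears."""
    counts = {base: 0 for base in "ACGT"}
    for base in strand.upper():
        counts[base] += 1
    return counts


if __name__ == "__main__":
    strand = "ATGCGTAC"  # hypothetical short stretch, not a real gene
    print(complement_strand(strand))  # GTACGCAT
    print(base_counts(strand))        # {'A': 2, 'C': 2, 'G': 2, 'T': 2}
```

Applying `complement_strand` twice returns the original sequence, which is why a genome can be stored and reported as a single string of letters.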
When an organism replicates, it makes a copy of its entire genome to pass on to its offspring. Errors in this copying process can lead to mutations, in which one or more building blocks are swapped, deleted or inserted. Mutations may alter genes, the instruction sheets for the proteins that allow an organism to function, and can ultimately affect the physical characteristics of that organism. In humans, for example, eye and hair color are the result of genetic variations that can arise from mutations. In SARS-CoV-2, the virus that causes COVID-19, mutations can change how well it spreads, causes infection or even evades the immune system.
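The three kinds of copying error can be illustrated with plain string operations. This is a minimal sketch on a hypothetical nine-letter sequence; the helper names are our own. Because genes are read three nucleotides (one codon) at a time, the `codons` helper also shows why a single deletion or insertion can scramble every codon after it, while a swap changes only one.

```python
def substitute(seq: str, pos: int, new_base: str) -> str:
    """Swap the nucleotide at 0-based position `pos` for `new_base`."""
    return seq[:pos] + new_base + seq[pos + 1:]


def delete(seq: str, pos: int, length: int = 1) -> str:
    """Remove `length` nucleotides starting at 0-based position `pos`."""
    return seq[:pos] + seq[pos + length:]


def insert(seq: str, pos: int, new_bases: str) -> str:
    """Insert `new_bases` just before 0-based position `pos`."""
    return seq[:pos] + new_bases + seq[pos:]


def codons(seq: str) -> list[str]:
    """Split a sequence into complete three-letter codons, dropping leftovers."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


if __name__ == "__main__":
    ref = "ATGGCTTAA"  # hypothetical sequence
    print(codons(ref))                      # ['ATG', 'GCT', 'TAA']
    print(codons(substitute(ref, 4, "A")))  # ['ATG', 'GAT', 'TAA']
    print(codons(delete(ref, 3)))           # ['ATG', 'CTT']  (frame shifted)
    print(insert(ref, 3, "C"))              # ATGCGCTTAA
```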
We are both biochemists and microbiologists who teach about and study the genomes of bacteria. We both use DNA sequencing in our research to understand how mutations affect antibiotic resistance. The tools we use to sequence DNA in our work are the same ones scientists are using right now to study SARS-CoV-2, whose RNA genome is typically copied into DNA before sequencing.
How are genomes sequenced?
One of the earliest methods scientists used in the 1970s and 1980s was Sanger sequencing, which copies a stretch of DNA into many fragments of different lengths, each ending in a nucleotide that carries a radioactive or fluorescent tag identifying it. The fragments are then put through an electric sieve that sorts them by size, so reading the tags from the shortest fragment to the longest spells out the sequence. Compared with newer methods, Sanger sequencing is slow and can read only relatively short stretches of DNA at a time. Despite these limitations, it provides highly accurate data, and some researchers are still actively using this method to sequence SARS-CoV-2 samples.
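The logic of reading a Sanger run can be sketched in a few lines. In the chain-termination form of the method, copying stops at a different position in each fragment and the last nucleotide carries the tag, so sorting fragments by length and reading the final tag of each recovers the sequence. This is a simplified model under our own assumptions: exactly one fragment per length, and fragments written as the sequence being read (ignoring that the newly made strand is complementary to the template).

```python
import random


def chain_terminated_fragments(sequence: str, seed: int = 0) -> list[str]:
    """Model the reaction mixture: one fragment ending at every position,
    each ending in a tagged nucleotide. Shuffled, because fragments come
    out of the reaction in no particular order."""
    fragments = [sequence[:stop] for stop in range(1, len(sequence) + 1)]
    random.Random(seed).shuffle(fragments)
    return fragments


def read_sanger(fragments: list[str]) -> str:
    """Sort fragments by size (the 'electric sieve' step) and read the
    tag on the final nucleotide of each, from shortest to longest."""
    ordered = sorted(fragments, key=len)
    return "".join(fragment[-1] for fragment in ordered)


if __name__ == "__main__":
    mixture = chain_terminated_fragments("GATTACA")
    print(read_sanger(mixture))  # GATTACA
```

The need to separate every length individually is one reason the method handles only short stretches per run.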
Since the late 1990s, next-generation sequencing has revolutionized how researchers collect data on and understand genomes. These technologies, known as NGS, can process far more DNA fragments at the same time, significantly reducing the time it takes to sequence a genome.
There are two main types of NGS platforms: second-generation and third-generation sequencers.
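Second-generation technologies read millions of short DNA fragments at the same time. After DNA is cut into fragments, short stretches of genetic material called adapters are attached to their ends so each fragment can be anchored in the sequencer and copied. The machine then builds a matching strand for each fragment one nucleotide at a time, using nucleotides tagged with fluorescent dyes so it can tell the four types apart by color. For example, adenine might glow blue and cytosine red. Finally, a computer reassembles these short sequences, called reads, into the entire genomic sequence.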
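The reassembly step can be illustrated with a toy greedy assembler that repeatedly merges the two reads with the longest end-to-start overlap. This is a teaching simplification we chose, not what production assemblers do: real tools use more robust graph-based methods, and for SARS-CoV-2 the reads are often lined up against a known reference genome instead. The four reads below are made-up examples.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`,
    or 0 if no overlap of at least `min_len` exists."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair until one piece remains or no
    overlaps are left. Returns the resulting pieces (contigs)."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # nothing overlaps; leave the pieces separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
    print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```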
Third-generation technologies like the Oxford Nanopore MinION sequence DNA directly by threading a long strand of DNA through a tiny pore, called a nanopore, that carries an electrical current. Because different nucleotides disrupt the current in characteristic ways, the sequencer can read these changes and send them straight to a computer. The MinION is small and portable, which allows clinicians to sequence samples at point-of-care clinical and treatment facilities. However, it sequences smaller volumes of DNA than other NGS platforms.
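The idea of reading a sequence from changes in an electrical current can be shown with a deliberately simplified "basecaller". Everything here is an assumption for illustration: the current levels are invented, each base is given one fixed level and a fixed number of samples, whereas real nanopore signals depend on several neighboring nucleotides at once, strands move at an uneven speed, and decoding is done by trained models.

```python
# Hypothetical mean current level (picoamperes) per base; invented numbers.
LEVELS_PA = {"A": 80.0, "C": 95.0, "G": 110.0, "T": 65.0}


def segment_means(signal: list[float], samples_per_base: int) -> list[float]:
    """Average the raw samples in fixed-size windows, one window per base."""
    return [
        sum(signal[i:i + samples_per_base]) / samples_per_base
        for i in range(0, len(signal) - samples_per_base + 1, samples_per_base)
    ]


def call_bases(signal: list[float], samples_per_base: int = 4) -> str:
    """Assign each window to the base whose reference level is closest."""
    calls = []
    for mean in segment_means(signal, samples_per_base):
        calls.append(min(LEVELS_PA, key=lambda base: abs(LEVELS_PA[base] - mean)))
    return "".join(calls)


if __name__ == "__main__":
    raw = [81, 79, 80, 82, 64, 66, 65, 63, 109, 111, 110, 112, 96, 94, 95, 93]
    print(call_bases(raw))  # ATGC
```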
Though each class of sequencer processes DNA in a different way, they can all report the millions or billions of building blocks that make up genomes in a short time – from a few hours to a few days. For example, the Illumina NovaSeq can sequence roughly 150 billion nucleotides, the equivalent of 48 human genomes, in just three days.
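The "48 human genomes" comparison can be checked with quick arithmetic. The sketch assumes one human genome is about 3.1 billion nucleotides (a single copy of each chromosome); that size is our assumption, not a figure from the text, and a cell carrying two copies would hold about twice as much DNA. The 150 billion nucleotides and three days are the figures quoted above.

```python
HUMAN_GENOME_NT = 3.1e9   # assumed size of one (haploid) human genome
SECONDS_PER_DAY = 24 * 60 * 60


def genome_equivalents(total_nt: float, genome_nt: float = HUMAN_GENOME_NT) -> float:
    """How many genomes' worth of nucleotides a run produces."""
    return total_nt / genome_nt


def nucleotides_per_second(total_nt: float, days: float) -> float:
    """Average sequencing rate over the whole run."""
    return total_nt / (days * SECONDS_PER_DAY)


if __name__ == "__main__":
    print(round(genome_equivalents(150e9)))          # 48
    print(round(nucleotides_per_second(150e9, 3)))   # 578704
```

At that pace the machine reports more than half a million nucleotides every second of the run.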
Using sequencing data to fight coronavirus
So why is genomic sequencing such an important tool in combating the spread of SARS-CoV-2?
Rapid public health responses to SARS-CoV-2 require detailed knowledge of how the virus is changing over time. Scientists have been using genome sequencing to track SARS-CoV-2 almost in real time since the start of the pandemic. Millions of individual SARS-CoV-2 genomes have been sequenced and deposited in repositories such as the Global Initiative on Sharing All Influenza Data, known as GISAID, and the National Center for Biotechnology Information.
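One basic surveillance calculation is the share of each week's sequenced samples assigned to each variant, which shows when a new variant starts to take over. The sketch below assumes each sample has already been labeled with a collection week and a variant; the records and the labels "X" and "Y" are hypothetical, not real surveillance data, and real analyses must also account for how samples were chosen for sequencing.

```python
from collections import Counter, defaultdict


def lineage_share_by_week(records: list[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """Given (week, variant) pairs, one per sequenced sample, return the
    fraction of that week's samples assigned to each variant."""
    counts = defaultdict(Counter)
    for week, variant in records:
        counts[week][variant] += 1
    shares = {}
    for week in sorted(counts):
        total = sum(counts[week].values())
        shares[week] = {v: n / total for v, n in counts[week].items()}
    return shares


if __name__ == "__main__":
    # Hypothetical sample metadata for illustration only.
    records = [
        ("W1", "X"), ("W1", "X"), ("W1", "X"), ("W1", "Y"),
        ("W2", "X"), ("W2", "Y"),
    ]
    for week, share in lineage_share_by_week(records).items():
        print(week, share)
    # W1 {'X': 0.75, 'Y': 0.25}
    # W2 {'X': 0.5, 'Y': 0.5}
```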
Genomic surveillance has guided public health decisions as each new variant has emerged. For example, sequencing the genome of the omicron variant allowed researchers to detect more than 30 mutations in the spike protein, which the virus uses to bind to cells in the human body. Because some of these mutations are known to help the virus spread, omicron was classified as a variant of concern. Researchers are still learning how these mutations might affect the severity of the infections omicron causes, and how well it evades current vaccines.
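Mutations like those in the spike protein are usually written as the original amino acid, its position and the new amino acid, as in N501Y. The sketch below lists such substitutions by comparing two protein sequences position by position. It assumes the sequences are already aligned to the same length and does not treat deletions or insertions specially; the ten-letter sequences are hypothetical, not real spike fragments.

```python
def substitutions(reference: str, variant: str) -> list[str]:
    """List amino acid substitutions between two aligned protein sequences
    in 'N501Y' style: reference residue, 1-based position, variant residue."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref_aa}{pos}{var_aa}"
        for pos, (ref_aa, var_aa) in enumerate(zip(reference, variant), start=1)
        if ref_aa != var_aa
    ]


if __name__ == "__main__":
    ref = "ACDEFGHIKL"  # hypothetical reference fragment
    var = "ACDEYGHIKM"  # hypothetical variant fragment
    print(substitutions(ref, var))  # ['F5Y', 'L10M']
```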
Sequencing also has helped researchers identify variants that spread to new regions. Upon receiving a SARS-CoV-2 sample collected from a traveler who returned from South Africa on Nov. 22, 2021, researchers at the University of California, San Francisco, were able to detect omicron’s presence in five hours and had nearly the entire genome sequenced in eight. Since then, the Centers for Disease Control and Prevention has been monitoring omicron’s spread and advising the government on ways to prevent widespread community transmission.
The rapid detection of omicron worldwide highlights the power of robust genomic surveillance and the value of sharing genomic data across the globe. Understanding the genetic makeup of the virus and its variants gives researchers and public health officials insight into how best to update public health guidelines and allocate resources for vaccine and drug development. By providing essential information on how to curb the spread of new variants, genomic sequencing has saved and will continue to save countless lives over the course of the pandemic.
Andre Hudson, Professor and Head of the Thomas H. Gosnell School of Life Sciences, Rochester Institute of Technology and Crista Wadsworth, Assistant Professor in the Thomas H. Gosnell School of Life Sciences, Rochester Institute of Technology