It was not long after the commercialization of NGS (a little more than ten years ago) that scientists went beyond the basics and got creative with the new technology, studying much more than just the sequence of DNA. In this article we highlight some of the different NGS technologies and methods available.
Inside the cell, DNA is decorated with a huge variety of proteins that sit on it at predefined locations, altering its activity. Using highly specialized antibodies against a protein of interest, you can virtually handpick the regions of the genome that are bound to it, and then identify these regions through sequencing. This process is called ChIP-seq. There are many other NGS applications that address questions about the interaction of DNA with proteins, such as DNase-seq, MNase-seq and FAIRE-seq. A thorough discussion of these can be found here.
There are often situations where you have a sample containing many individual species that you wish to identify. To identify the different species making up the sample, a specific genomic region is selectively amplified and sequenced. For example, a common approach to identifying the microbial composition of an environmental sample (if de novo sequencing of unknown microbial species is not required) is to determine exactly which “flavors” of rRNA (ribosomal RNA) sequences are present in it, with each specific variant serving as a species’ fingerprint.
Another method is Barcode Analysis by Sequencing (Bar-seq). This method uses a specific gene (usually the mitochondrial cytochrome c oxidase 1 gene) whose sequence is known in different species to identify individual species from a mixed sample. Alternatively, the method can be used to screen barcode-tagged libraries of mutant laboratory strains. Barcode-tagged strains have had a specific short DNA sequence inserted, and this approach is common with mutant yeast strain libraries.
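As a rough illustration of the barcode-screening idea, the sketch below counts how often each known barcode appears in a set of reads. The barcode sequences, the strain names, and the assumption that the barcode occupies the first six bases of every read are all invented for this example; real Bar-seq analyses also have to deal with sequencing errors and flanking primer sequences.

```python
from collections import Counter

# Hypothetical barcode-to-strain table (sequences invented for illustration).
BARCODES = {
    "ACGTAC": "strain_A",
    "TTGCAA": "strain_B",
    "GGATCC": "strain_C",
}
BARCODE_LENGTH = 6


def count_barcodes(reads):
    """Count reads per strain, assuming the barcode is the first 6 bases of a read."""
    counts = Counter()
    for read in reads:
        strain = BARCODES.get(read[:BARCODE_LENGTH].upper())
        if strain is None:
            # No exact match to a known barcode.
            counts["unassigned"] += 1
        else:
            counts[strain] += 1
    return counts


reads = ["ACGTACGGGT", "ACGTACTTTA", "TTGCAACCCA", "NNNNNNAAAA"]
barcode_counts = count_barcodes(reads)
```

The relative counts per strain are what a screen would compare between conditions; exact matching is used here only to keep the sketch short.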
Studying Chromatin Conformation
One of the most astonishing but also notoriously difficult applications of NGS is Hi-C. In this method, a series of chemical reactions on the sample transforms the DNA sequence beyond recognition. With the aid of powerful analysis algorithms, Hi-C identifies the exact positions on the DNA where chromosomes make contact in the three-dimensional space inside the nucleus. This application offers a level of information that would have been impossible to predict by simply looking up the definition of NGS. Hi-C is one of the most sophisticated methods for studying chromosome conformation, but other methods are available, including 4C and 5C (both of which are limited to a specific location). The journal Cell has published a wonderful snapshot showing the various methods for analysing chromatin conformation.
Together with the four bases, DNA contains a naturally occurring modification, methylation, that is undetectable by simple DNA-seq. Methylation happens non-randomly and affects the function of DNA in ways we don’t yet fully understand. Again using chemistry to transform the DNA sequence in predictable ways depending on where methylation is present, we are able to precisely map the specific bases on the DNA that are methylated. The method is called bisulfite sequencing. It comes in two flavors: Whole Genome Bisulfite Sequencing (WGBS) looks at methylation across the entire genome, whereas Reduced Representation Bisulfite Sequencing (RRBS) targets only CpG-rich regions, which is where the majority of DNA methylation occurs.
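The chemistry behind this is that bisulfite treatment converts unmethylated cytosines to uracil, which is read as T after amplification, while methylated cytosines are protected and still read as C. A minimal sketch of that logic follows, assuming a single strand, complete conversion, and a read already aligned to the reference without mismatches or gaps. The function names and the toy sequence are made up for illustration.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Simulate bisulfite treatment plus amplification on one strand.

    Unmethylated C is read as T; methylated C stays C.
    """
    converted = []
    for index, base in enumerate(sequence):
        if base == "C" and index not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylation(reference, converted_read):
    """Return positions where a reference C is still read as C."""
    return [
        index
        for index, (ref_base, read_base) in enumerate(zip(reference, converted_read))
        if ref_base == "C" and read_base == "C"
    ]


reference = "ACGTCCGA"
read = bisulfite_convert(reference, methylated_positions={1})
methylated = call_methylation(reference, read)
```

In real data, the methylation level at a position is estimated from many reads covering it, and incomplete conversion shows up as false positives.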
The impeccable single-base resolution of bisulfite-seq is hard-earned, given the difficulty of the method. The process is lengthy, renders most of the DNA sample unusable (over-treatment is a necessary evil to avoid false positives), and is very computationally intensive.
All the above reasons have led to the wide adoption of a variety of alternative, lower-resolution methods to probe DNA methylation.
The most commonly used alternative methods are based on the selective enrichment of methylated DNA fragments, in a manner very similar to ChIP-seq. MeDIP-seq employs antibodies against DNA methylation, while MAP-seq uses a naturally occurring protein with a comparable affinity for DNA methylation. The resolution of both methods is limited by the average fragment size, which generally does not exceed 150-300 nucleotides.
Turning the limitations of fragmentation into a useful read-out, HELP-seq uses fragment size information to deduce the location of methylation sites in the genome. The method cleverly uses two restriction enzymes that cut the DNA at exactly the same locations; one is inhibited when methylation is present while the other is not. By comparing the two digestion patterns, we can infer the methylation status of any specific position where the enzymes should cut. In this sense, HELP-seq is a single-nucleotide resolution assay. It is restricted, however, to methylation hotspots that also happen to be recognized by the restriction enzymes.
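The comparison of the two digestion patterns can be sketched in a few lines. The HELP assay uses the enzyme pair HpaII (blocked by methylation) and MspI (not blocked), both of which recognize CCGG; this detail comes from general knowledge of the assay rather than from the text above. The sketch assumes the cut sites for each enzyme have already been determined from the sequencing data, and the sequence and the observed cuts are toy values.

```python
import re

SITE = "CCGG"  # recognition site shared by HpaII and MspI


def find_sites(sequence):
    """Return the start position of every CCGG site in the sequence."""
    return [match.start() for match in re.finditer(SITE, sequence)]


def infer_methylation(insensitive_cuts, sensitive_cuts):
    """Map each site to True if inferred methylated.

    A site cut by the methylation-insensitive enzyme but not by the
    methylation-sensitive one is inferred to be methylated.
    """
    sensitive = set(sensitive_cuts)
    return {site: site not in sensitive for site in insensitive_cuts}


sequence = "AACCGGTTTTCCGGAACCGGTT"
all_sites = find_sites(sequence)
# Toy observation: the sensitive enzyme failed to cut the middle site.
sensitive_cuts = [all_sites[0], all_sites[2]]
status = infer_methylation(all_sites, sensitive_cuts)
```

Positions without a CCGG site never appear in the result, which is exactly the restriction of the method noted above.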
Although current NGS technology is exclusive to DNA, a simple reverse transcription step at the beginning of the procedure (turning RNA into its complementary DNA, or cDNA) instantly transforms the technology into a tool for RNA research too, generally referred to as RNA-seq.
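In sequence terms, reverse transcription produces a DNA strand that is the reverse complement of the RNA template, with T in place of U. A minimal sketch, assuming an RNA sequence written 5' to 3' and containing only A, C, G and U (the function name is made up for illustration):

```python
# Base-pairing rules for copying RNA into DNA.
COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna):
    """Return the first-strand cDNA (5' to 3') for an RNA given 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


cdna = reverse_transcribe("AUGGCU")
```

Because the first-strand cDNA is antisense to the RNA, library protocols that preserve strand information are what make directional RNA-seq possible.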
RNA-seq has been used extensively for expression profiling, where the levels of the sequenced RNA are also quantified (for a more in-depth analysis of RNA level quantification see my previous article). If directional, RNA-seq allows the orientation of transcription to be determined. Significant adaptations to the method allow finer measurements of the portion of total RNA that is being actively transcribed or translated (referred to as GRO-seq and Ribo-seq, respectively).
Mirroring de novo genome sequencing, de novo transcriptome sequencing is a process in which only the transcribed portion of an unknown genome is identified. This is helpful for finding genomic regions that correspond to genes (gene finding) but don’t carry the characteristic marks of an active gene.
Conveniently, genes are under high evolutionary pressure and remain, for the most part, unchanged between closely related species. It is therefore not uncommon to use the sequenced genome of a closely related organism as a scaffold for these experiments. This is a practical solution that, when applicable (not all species have a close relative with a sequenced genome), greatly reduces the computational power required to analyze the results.
Like DNA, RNA molecules are bound by a wide variety of proteins. These proteins can be enriched with antibodies and the bound RNA sequenced, in a process analogous to ChIP-seq called RNA-Immunoprecipitation sequencing (RIP-seq). Even more targeted, Photoactivatable-Ribonucleoside-Enhanced Crosslinking and Immunoprecipitation (PAR-CLIP) is a powerful method that determines the exact positions on the RNA where the protein was making contact.
I will close this overview of available NGS applications with the one linked to a Nobel Prize. With the discovery of RNA interference (for which Fire and Mello received the Nobel Prize in Physiology or Medicine in 2006) and of various types of short and other non-coding RNAs, NGS was the right technology at the right time. It offered freedom from the constraints of not knowing where these new types of genes are located in the genome, or in which direction they are transcribed. The short reads obtained by directional RNA-seq were exactly the assay needed to study these “exotic” RNA species en masse. Enrichment for small RNAs allows a more specific look at microRNAs in a method known as miRNA-seq. It is now known that there are multiple classes of small RNAs (including siRNA and piRNA), and this enrichment also allows analysis of these non-coding RNAs.
The goal of this article has not been to give an exhaustive account of all possible uses for NGS. In any case, such a pursuit would have been futile, as new applications of the technology are constantly under development. NGS is a buzzing hub for innovation that doesn’t show any signs of slowing down. In the last year researchers reported repurposing an Illumina GA IIx instrument to measure RNA-protein binding events directly. I am convinced that a few years from now sequencing instrumentation will be available in a Raspberry Pi version.