Next-generation sequencing (NGS) is not a three-headed monster. However, it can be a difficult concept to grasp, especially when you are getting started. There is a lot of new terminology, and a whole new world to discover: both at the lab bench and in interpreting your results.
It helps to start somewhere. So, let’s start!
Depth of Coverage
Depth of coverage is the number of reads that cover a given nucleotide in an experiment. Most NGS protocols start by randomly fragmenting the genome into short pieces. These fragments are then sequenced and aligned. The alignment tiles the short sequences together into a longer contiguous sequence. For tiling to succeed, different reads must overlap substantially, so that they can be aligned with confidence. Note the keyword: random. Because the fragmentation process is random, a large number of fragments is needed to make sure that enough of them overlap on their flanking regions to be tiled together. It’s almost like putting together a sequence puzzle.
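To make the definition concrete, here is a minimal Python sketch that counts how many reads cover each position of a short reference. The function name `per_base_depth` and the toy read coordinates are invented for illustration; real pipelines compute depth from alignment files with dedicated tools.

```python
def per_base_depth(reads, reference_length):
    """Count how many reads cover each reference position.

    `reads` is a list of (start, end) alignments, 0-based and end-exclusive.
    This coordinate convention is an assumption made for the example.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    delta = [0] * (reference_length + 1)
    for start, end in reads:
        delta[start] += 1
        delta[end] -= 1

    # A running sum of the changes gives the depth at each position.
    depth = []
    running = 0
    for change in delta[:reference_length]:
        running += change
        depth.append(running)
    return depth


# Four hypothetical overlapping reads tiled across a 10-base reference.
reads = [(0, 5), (2, 7), (4, 9), (6, 10)]
depth = per_base_depth(reads, 10)
mean_depth = sum(depth) / len(depth)
print(depth)
print(mean_depth)
```

The mean of the per-base values is the average depth of coverage, which equals the total number of sequenced bases divided by the length of the reference.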
Therefore, the greater the depth of coverage, the more overlaps we have to align our sequence correctly. This gives us more robust results, with better mapping quality.
High average read depth is also important for accuracy and confidence. Small sequencing errors occur, but with good coverage they are easy to discount: at any given position, correct reads outnumber the individual errors and make them statistically irrelevant.
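The idea that correct reads outvote isolated errors can be sketched as a simple majority vote over the bases observed at one position. This is only an illustration: the function `consensus_base` and the example pileup are made up, and real variant callers also weigh base and mapping quality.

```python
from collections import Counter


def consensus_base(pileup):
    """Return the most frequent base at a position and its share of the reads."""
    counts = Counter(pileup)
    base, count = counts.most_common(1)[0]
    return base, count / len(pileup)


# Hypothetical pileup: 20 reads at one position, two of them with errors.
pileup = "A" * 18 + "T" + "C"
base, support = consensus_base(pileup)
print(base, support)
```

With only two or three reads at the position, a single error would carry far more weight, which is why low coverage makes calls less reliable.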
Which brings us to our next topic…
Deep Sequencing
Deep sequencing takes the concept of depth of coverage one step further. In some experiments, you need very high read depth to be confident of the sequence. This is especially important for heterogeneous samples, such as tumor samples or mosaics. By increasing the coverage, we are far more likely to call a variant, even if it is present in only a small percentage of the cells in our sample. We can also distinguish such variants from sequencing errors, as we have more reads on which to base the distinction.
Let’s imagine we are analyzing a tumor sample. Normal cell contamination is common in cancer samples, so we assume that we have a population of cells with no mutations (normal cells) and a population of cells with mutations (tumor cells). We do not know the ratio of the two populations in our sample, so maximum accuracy is very important. With deep sequencing, we can call a variant in a population of cells comprising as little as 1% of the original sample.
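A rough way to see why depth matters here is to ask how likely we are to observe enough variant-supporting reads at a given depth. The sketch below treats each read as an independent draw (a binomial model) and ignores sequencing errors; the threshold of 5 supporting reads, the depths, and the function name `prob_at_least` are all assumptions chosen for illustration, not values from any particular variant caller.

```python
from math import comb  # available in Python 3.8+


def prob_at_least(k, depth, fraction):
    """P(X >= k) for X ~ Binomial(depth, fraction)."""
    below = sum(
        comb(depth, i) * fraction**i * (1 - fraction) ** (depth - i)
        for i in range(k)
    )
    return 1 - below


# Assumed fraction of reads carrying the variant. For a heterozygous variant
# in diploid cells, this is roughly half the fraction of mutated cells.
variant_read_fraction = 0.01
min_supporting_reads = 5  # hypothetical calling threshold

for depth in (30, 100, 1000, 5000):
    p = prob_at_least(min_supporting_reads, depth, variant_read_fraction)
    print(f"depth {depth}: P(at least {min_supporting_reads} variant reads) = {p:.4f}")
```

At modest depths a 1% variant will usually produce too few supporting reads to stand out, while at depths in the thousands the expected number of supporting reads is comfortably above the threshold.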
With the high depth of coverage associated with deep sequencing, bioinformatic tools can also detect insertions and deletions (even larger ones that are not detected by Sanger sequencing, for example) by examining the reads and comparing coverage between regions. If a region has far fewer reads than expected, it may contain a deletion. Far more reads than expected, on the other hand, may signify a duplication or an insertion.
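The coverage comparison described above can be sketched by dividing each region's depth by a baseline and flagging large deviations. The window depths, the cutoffs of 0.5 and 1.5, and the function name `flag_depth_outliers` are illustrative assumptions; dedicated copy-number tools also correct for GC content, mappability, and other biases.

```python
from statistics import median


def flag_depth_outliers(window_depths, low=0.5, high=1.5):
    """Label each window by its depth relative to the median window depth."""
    baseline = median(window_depths)
    if baseline == 0:
        raise ValueError("median depth is zero; cannot compute ratios")

    calls = []
    for depth in window_depths:
        ratio = depth / baseline
        if ratio < low:
            calls.append("possible loss")   # e.g. a deletion
        elif ratio > high:
            calls.append("possible gain")   # e.g. a duplication or insertion
        else:
            calls.append("normal")
    return calls


# Hypothetical mean depths for nine consecutive windows.
window_depths = [100, 98, 103, 45, 48, 101, 99, 210, 97]
calls = flag_depth_outliers(window_depths)
for depth, call in zip(window_depths, calls):
    print(depth, call)
```

Using the median as the baseline keeps a few abnormal windows from distorting what counts as normal depth.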
Deep sequencing is a powerful tool, both in research and diagnostics, and it is essential to understand how valuable it can be, especially when analyzing very heterogeneous samples.