A Quick-Fire Guide to Shotgun Sequencing (and Assembly)
“Making the simple complicated is commonplace; making the complicated simple, awesomely simple, that’s creativity.”
– Charles Mingus
Next Generation Sequencing (NGS) technology has boomed in recent years, allowing researchers to probe further into the workings of the genome. Part of its appeal is that it rests on simple principles, which make the technology modular and therefore scalable: a hub for innovation.
But what exactly is NGS and how can it help your research? In the next few articles we will discuss what NGS is, the different techniques available, and how they can help you in your own research. In this one we will cover the basics of genome sequencing.
In its most basic definition, NGS (also known as DNA-seq) allows the determination of the base sequence of a DNA sample. In shotgun sequencing, the DNA must first be broken up into small pieces, usually around 500–1000 bases long, in a process called fragmentation, before being sequenced. Once sequenced, these fragments then have to be assembled.
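To make the idea of fragmentation concrete, here is a minimal Python sketch that "shreds" a made-up genome into random pieces. It is a toy simulation under stated assumptions, not how a real library prep or sequencer behaves: the function name shotgun_fragments, the random genome, and the default lengths (taken from the 500–1000 base range mentioned above) are all illustrative.

```python
import random

def shotgun_fragments(genome, n_fragments, min_len=500, max_len=1000, seed=0):
    """Sample random substrings of `genome`, mimicking random fragmentation.

    Hypothetical helper for illustration only; real fragmentation is a
    physical or enzymatic process, not uniform random sampling.
    """
    rng = random.Random(seed)
    # Clamp the length range so it never exceeds the genome itself.
    max_len = min(max_len, len(genome))
    min_len = min(min_len, max_len)
    fragments = []
    for _ in range(n_fragments):
        length = rng.randint(min_len, max_len)
        start = rng.randint(0, len(genome) - length)
        fragments.append(genome[start:start + length])
    return fragments

# A made-up 5,000-base "genome" (assumption: uniformly random bases).
base_rng = random.Random(42)
genome = "".join(base_rng.choice("ACGT") for _ in range(5000))

fragments = shotgun_fragments(genome, n_fragments=50)
print(len(fragments), "fragments, e.g. one of length", len(fragments[0]))
```

Because the start positions are random, fragments overlap one another, and it is exactly this overlap that assembly relies on later.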
If the sample at hand consists of the never before sequenced genome of a new species, then we refer to the process as a de novo DNA sequencing experiment. If the goal, on the other hand, is to only detect changes in a previously sequenced genome (for example the causative mutations in people with a hereditary disease), the application is called whole genome re-sequencing.
In cases of genome re-sequencing, the already known sequence is used as a guide to build the genome up from the individually sequenced fragments, in a process of matching each sequenced base with its correct position in the genome, known as alignment. For de novo experiments computationally demanding algorithms are used to help assemble the fragments.
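As a rough illustration of alignment in a re-sequencing setting, the sketch below slides a read along a reference and keeps the placement with the fewest mismatches, then reports where the read differs from the reference. This is a deliberately naive, ungapped search on invented sequences; real aligners use indexed, gap-aware algorithms, and the names align_read and find_mismatches are hypothetical.

```python
def align_read(reference, read):
    """Return (position, mismatches) for the best ungapped placement of `read`.

    Naive approach: try every offset and count mismatching bases.
    Ties are resolved in favour of the leftmost position.
    """
    best_pos, best_mismatches = -1, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(1 for a, b in zip(window, read) if a != b)
        if mismatches < best_mismatches:
            best_pos, best_mismatches = pos, mismatches
    return best_pos, best_mismatches

def find_mismatches(reference, read, pos):
    """List (reference_position, reference_base, read_base) for each difference."""
    return [
        (pos + i, reference[pos + i], base)
        for i, base in enumerate(read)
        if reference[pos + i] != base
    ]

# Invented example data: the read carries a single base change.
reference = "ACGTTGCATGCCGATAGCTA"
read = "GCATGCAGAT"

pos, mismatches = align_read(reference, read)
variants = find_mismatches(reference, read, pos)
print(pos, mismatches, variants)  # 5 1 [(11, 'C', 'A')]
```

In a real experiment a single mismatching read would not be enough to call a mutation; many overlapping reads covering the same position would be compared.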
Contigs and Scaffolds
Much like a jigsaw puzzle without a picture to use as a guide, de novo sequencing experiments (or certain environmental microbial samples) have no reference genome on which to base an alignment. With the help of specialized algorithms, we are able to perform this type of de novo assembly by using the overlapping portions shared between the shredded DNA fragments to reconstruct longer stretches of DNA, commonly referred to as contigs. Contigs are continuous sequences of DNA with no gaps present, and they can be further assembled into scaffolds. Unlike contigs, where the bases are known to a high level of certainty, scaffolds can include regions of unknown bases between two contigs.
Ideally, contigs will be assembled into a single scaffold representing one chromosome, but in reality gaps often remain between scaffolds, and ambiguous repetitive patterns of DNA in the genome prevent further reconstruction of entire chromosomes. Another specialized assay, mate-pair sequencing, is designed to address that problem.
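The overlap idea can be sketched in a few lines of Python. The example below is a toy greedy assembler: it repeatedly merges the two sequences with the longest suffix-to-prefix overlap until no overlap of at least min_overlap bases remains. It assumes error-free reads from a single strand and invented data; production assemblers use far more sophisticated graph-based methods, and the function names here are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_overlap=3):
    """Greedily merge reads into contigs (toy version, error-free reads only)."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are fully contained in another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)

# Three overlapping reads from an invented 20-base sequence, in shuffled order.
reads = ["ATCCGTTAGC", "ATGGCGTGCA", "GTGCAATCCG"]
contigs = greedy_assemble(reads)
print(contigs)  # ['ATGGCGTGCAATCCGTTAGC']
```

If the reads did not overlap enough, the function would return several separate contigs rather than one, which is the situation scaffolding tries to improve on.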
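To show how a scaffold differs from a contig, here is a small Python sketch that joins ordered contigs with runs of N (the usual symbol for an unknown base). It assumes that the contig order and the approximate gap sizes have already been estimated elsewhere, for instance from mate-pair data; the function name build_scaffold and the example values are made up for illustration.

```python
def build_scaffold(contigs, gap_sizes):
    """Join ordered contigs into one scaffold, padding each gap with 'N'.

    `gap_sizes[i]` is the estimated number of unknown bases between
    contigs[i] and contigs[i + 1] (assumed to be known in advance).
    """
    if not contigs:
        return ""
    if len(gap_sizes) != len(contigs) - 1:
        raise ValueError("need exactly one gap size between each pair of contigs")
    parts = [contigs[0]]
    for gap, contig in zip(gap_sizes, contigs[1:]):
        parts.append("N" * gap)  # unknown bases between the two contigs
        parts.append(contig)
    return "".join(parts)

# Invented contigs and gap estimates.
scaffold = build_scaffold(["ATGGCG", "TTAGC", "CCGTA"], gap_sizes=[4, 2])
print(scaffold)  # ATGGCGNNNNTTAGCNNCCGTA
```

The bases inside each contig are known, while the N stretches only record that something of roughly that length sits in between.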
Exome Sequencing
Exome sequencing refers to the process of sequencing a DNA sample that has been previously enriched for protein-coding genes. In humans, only about 1–2% of the genome codes for proteins, and sequencing just this small portion of the genome is a cost-effective way to detect rare mutations in genes. Like whole genome sequences, exome sequences can be assembled de novo, although it is much easier and more common for exomes to be assembled against a reference genome.
This is just a very brief introduction to the basics of genome sequencing, but there is much, much more to NGS than this. Check out my next article where I detail some of the additional NGS techniques available that could help you in your own research.