A paradigm shift by the Big Three
As we learned last week, the Human Genome Project was accomplished using the improved Sanger method and technology from Applied Biosystems (ABI). Despite the significant technical improvements to this ‘first-generation’ technology, sequencing multiple human genomes was never going to be easy without a paradigm shift.
Over the last five or six years at least three companies have delivered exactly that shift, and between them they provide the most widely used NGS systems: Illumina, Roche and Life Technologies. Other technologies have already fallen by the wayside (such as Helicos) and new systems are being developed for the next big leap forward. At the same time, there has also been significant progress in sample-prep technology for Whole Genome Sequencing (WGS), exome and amplicon studies.
But first, some basics
Almost all whole genome or exome sequencing is accomplished by first producing a fragment library, similar to shotgun cloning (but without the cloning part!). The different NGS platforms have slightly different library preparation methods, the major difference being ‘cluster generation’ for Illumina, and ‘emulsion PCR’ for Roche and Life Technologies.
The structure of a data set is determined by the question being asked
The format of a sequencing run is generally chosen before library preparation and there are a few commonly accepted defaults (e.g. single-end 36 bp for ChIP-seq and paired-end 100 bp for cancer genomes). ChIP-seq studies require high numbers of short reads, so these are generated on short, fast runs (see our previous article here). WGS requires as much actual genome coverage as possible, so long reads are used. RNA-seq and Structural-Variation-seq can use varying read lengths, but both use paired-end data for most studies. Almost any DNA source can be used as a template, including cDNA, meaning the number of methods is very large (almost 50).
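To make the link between run format and genome coverage concrete, mean coverage is commonly estimated as the total number of sequenced bases divided by the genome size. The following is a minimal sketch of that arithmetic; the function name, the approximate genome size and the example read count are illustrative assumptions, not figures from any particular run.

```python
def mean_coverage(n_fragments: int, read_length: int, genome_size: int,
                  paired: bool = False) -> float:
    """Estimate mean depth of coverage as total sequenced bases / genome size."""
    # A paired-end run reads each fragment from both ends.
    bases_per_fragment = read_length * (2 if paired else 1)
    return n_fragments * bases_per_fragment / genome_size


# Approximate haploid human genome size in bp (assumed round figure).
HUMAN_GENOME_BP = 3_100_000_000

# Hypothetical run: 500 million fragments sequenced as paired-end 100 bp.
depth = mean_coverage(500_000_000, 100, HUMAN_GENOME_BP, paired=True)
print(f"Estimated mean coverage: {depth:.1f}x")
```

The same read count in a single-end 36 bp format would give far less coverage, which is why short, fast runs suit counting applications such as ChIP-seq rather than WGS.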
NGS methods to suit every budget
Whole genomes: Tens of thousands of human genomes have now been sequenced, with the majority of these completed on the Illumina platform. The cost of WGS is now well under $10,000, and even as low as $2,500 at some labs. Analysis methods are continually improving, but it still takes many hundreds or thousands of hours of computation to complete primary and secondary analysis.
Exomes: It is possible to sequence just the exons, which reduces the time and cost of experiments and allows a significant increase in sample numbers. Data analysis is potentially easier as well. Most exome sequencing is performed using in-solution capture. In this method biotinylated-oligonucleotide baits are mixed with sequencing libraries to pull down only the exon regions for sequencing.
Amplicons: For many research questions, simply sequencing one or two exons in hundreds of samples is faster, uses little DNA, and the data analysis is simple. PCR amplification is well understood and has high specificity and sensitivity. Most users can simply design their own assays and it is theoretically possible to generate sequence data in one week. As potentially hundreds of samples can be multiplexed into a single NGS run, the cost per sample or per amplicon can be very low, less than $1 each.
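The effect of multiplexing on cost is simple arithmetic: the cost of one run is shared across every sample and amplicon loaded onto it. A small sketch follows; the run cost and sample numbers are hypothetical values chosen for illustration, not quoted prices.

```python
def cost_per_amplicon(run_cost: float, n_samples: int,
                      amplicons_per_sample: int) -> float:
    """Share the cost of a single run across all multiplexed amplicons."""
    if n_samples <= 0 or amplicons_per_sample <= 0:
        raise ValueError("sample and amplicon counts must be positive")
    return run_cost / (n_samples * amplicons_per_sample)


# Hypothetical example: a $1,000 run carrying 384 samples with 4 amplicons each.
print(f"${cost_per_amplicon(1000.0, 384, 4):.2f} per amplicon")
```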
A brief introduction to the main three technologies
(1) Roche: 454 Life Sciences Corp developed the first NGS technology and fundamentally changed perceptions of what might be achieved with sequencing. Roche acquired the technology and have increased the read-length and capacity with the ‘GS FLX+’ and also released a low-throughput personal genome sequencer, the ‘GS Junior’. Libraries are prepared by DNA fragmentation and each fragment is amplified and bound to a bead which is coated with a pair of oligos. The beads are centrifuged onto ‘picotitre plates’. These plates can contain more than 1 million wells and each well is so small (44 microns diameter) that only one bead settles in it.
A reaction called ‘pyrosequencing’ then takes place.
‘Pyrosequencing’ is performed by cyclical addition of an individual nucleotide, ATP sulphurylase and luciferase. Each time a nucleotide is incorporated into the growing strand, a pyrophosphate molecule is released. The pyrophosphate is converted to ATP by the ATP sulphurylase, and this ATP acts as the substrate for luciferase to generate a light signal. The signal is proportional to the number of pyrophosphate molecules released and therefore to the number of nucleotides incorporated. Each light signal is processed and converted to a DNA sequence by software on the instrument.
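Because the light signal scales with the number of nucleotides incorporated in each flow, converting signals to sequence amounts to rounding each signal to a whole number of bases and emitting the flowed nucleotide that many times. The sketch below illustrates the idea only; the repeating flow order ‘TACG’, the function name and the signal values are assumptions, and the instrument software additionally handles noise and signal decay.

```python
def decode_flowgram(signals, flow_order="TACG"):
    """Convert normalised flow signals into a base sequence.

    Each signal is rounded to the nearest whole number, which is taken as
    the number of bases incorporated during that nucleotide flow.
    """
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        # Round half up; noisy negative signals are treated as zero.
        n_incorporated = max(0, int(signal + 0.5))
        bases.append(nucleotide * n_incorporated)
    return "".join(bases)


# Hypothetical signals for eight flows (T, A, C, G, T, A, C, G).
flows = [1.02, 0.05, 0.0, 2.1, 0.1, 0.97, 1.05, 0.03]
print(decode_flowgram(flows))  # TGGAC
```

The signal of about 2 in the fourth flow is read as two G bases, which is how a homopolymer run appears in this kind of data.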
In 2005, the early Roche/454 sequencers produced reads of 100 bp and around 500,000 sequences (or 25 Mb) of data from a run.
In 2011, read length had increased to 700 bp (using ‘Titanium Chemistry’ and the current GS-FLX system), with a run generating 1 M sequences (or 3 Gb) of data.
(2) Illumina technology was originally developed by a company called Solexa. Illumina made significant improvements to the basic technology and released the ‘HiSeq 2000’ in 2010. This has been the most widely adopted NGS instrument across the world, mainly due to the simplicity of sample preparation and ease of method development.
DNA libraries are sequenced on a flowcell which has a lawn of two oligos complementary to the different adapter sequences. Cyclical reactions produce a ‘colony’ or ‘cluster’ of around 1000 copies of the original library molecule. Clusters are made single-stranded by cleaving of the adapter sequence. Hybridization of a sequencing primer is then followed by addition of fluorescent terminators in a cyclical reaction. Nucleotides are incorporated by DNA polymerase into the growing DNA strand. The flowcell is imaged to determine which nucleotide has been incorporated into each individual cluster. The terminator is removed by chemical cleavage ready for the next round of incorporation, imaging and cleavage.
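The imaging step can be pictured as measuring four fluorescence intensities per cluster in each cycle and calling the base whose dye is brightest. This is a deliberately simplified sketch: the channel order, function name and intensity values are assumptions, and production base callers also correct for effects such as dye crosstalk and clusters falling out of phase.

```python
CHANNELS = "ACGT"  # assumed order of the four intensity values


def call_bases(cycles):
    """Call one base per cycle for a single cluster.

    `cycles` is a list of (A, C, G, T) intensity tuples, one per cycle.
    The base with the highest intensity in each cycle is reported.
    """
    read = []
    for intensities in cycles:
        if len(intensities) != len(CHANNELS):
            raise ValueError("expected four intensities per cycle")
        brightest = max(range(len(CHANNELS)), key=lambda i: intensities[i])
        read.append(CHANNELS[brightest])
    return "".join(read)


# Hypothetical intensities for one cluster over three cycles.
cluster = [
    (0.90, 0.10, 0.05, 0.10),
    (0.10, 0.10, 0.80, 0.20),
    (0.00, 0.10, 0.10, 0.95),
]
print(call_bases(cluster))  # AGT
```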
In 2007, the early Solexa-based sequencers produced reads of 35 bp and around 30 M sequences (or 1 Gb) of data from a flowcell.
In 2011, read length had increased to 100 bp (on the ‘HiSeq 2000’ system), with a run generating 2.4 B sequences (or 300 Gb) of data.
(3) Life Technologies acquired Ion Torrent in 2010 and now sell the ‘PGM’ and ‘Proton’ sequencers.
The original ‘SOLiD System’ (Sequencing by Oligonucleotide Ligation and Detection) used emulsion PCR to generate template beads for sequencing-by-ligation. Beads were deposited onto a slide and a four-color sequencing chemistry was flowed over the surface. A sequencing-primer was hybridized to the adapter sequences (the 3’ end of the DNA fragment), followed by addition of 16 four-color 5’-blocked probes, one of which was ligated to the universal sequencing-primer and imaged. The probe was then chemically cleaved ready for the next round of probe hybridization and ligation. This cycle was repeated multiple times to generate the DNA sequence read of up to 75 bp.
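Sixteen probes sharing four colors means that each color reports a pair of adjacent bases rather than a single base. The sketch below uses the commonly described two-base encoding, in which the color is the XOR of the two bases coded as A=0, C=1, G=2, T=3; this mapping and the function names are stated here as assumptions rather than taken from the text above.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def encode_colors(sequence):
    """Encode a base sequence as one color (0-3) per adjacent base pair."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(sequence, sequence[1:])]


def decode_colors(first_base, colors):
    """Rebuild the base sequence from a known first base and the colors."""
    bases = [first_base]
    for color in colors:
        # The previous base plus the color uniquely determines the next base.
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ color])
    return "".join(bases)


colors = encode_colors("ATGGC")
print(colors)                      # [3, 1, 0, 3]
print(decode_colors("A", colors))  # ATGGC
```

Under this scheme each of the four colors corresponds to four of the 16 possible dinucleotides, and decoding needs one known starting base.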
The newer Ion Torrent system uses a very similar approach to the original 454 ‘pyrosequencing’. The sequencing is performed in the wells of a semiconductor chip into which individual emulsion PCR beads can be loaded. Sequencing is performed in the same cyclical manner but there are no additional enzymes and natural, rather than fluorescently modified, nucleotides are used. As each nucleotide is incorporated, hydrogen ions are released, which change the pH of the solution in the well. The change in pH is detected by the chip, which has an ion sensor at the bottom of each well, reading out data in ‘flowgram’ format similar to the Roche/454 ‘pyrogram’.
Each of the above technologies will feature in its own expanded article in the coming weeks here on the NGS Channel.
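A flowgram can be pictured as one value per nucleotide flow, recording how many bases were incorporated (and so how large the pH change was) when that nucleotide was washed over the well. The following sketch simulates an idealised, noise-free flowgram for a known sequence; the function name and the simple repeating ‘TACG’ flow order are illustrative assumptions and not the order used on any particular instrument.

```python
def simulate_flowgram(synthesised, flow_order="TACG", n_flows=8):
    """Return the number of bases incorporated during each nucleotide flow.

    `synthesised` is the strand being built, read in the direction of
    synthesis. A flow incorporates every consecutive matching base, so a
    homopolymer run yields a single larger value.
    """
    position = 0
    flowgram = []
    for flow in range(n_flows):
        nucleotide = flow_order[flow % len(flow_order)]
        count = 0
        while position < len(synthesised) and synthesised[position] == nucleotide:
            count += 1
            position += 1
        flowgram.append(count)
    return flowgram


print(simulate_flowgram("TGGAC"))  # [1, 0, 0, 2, 0, 1, 1, 0]
```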