A revolution in 2005
The start of the NGS revolution was clearly marked in 2005, when 454 Life Sciences Corporation published the complete genome sequences of two bacteria (Mycoplasma genitalium and Streptococcus pneumoniae), obtained in one run of their Genome Sequencer with 96% coverage at 99.96% accuracy (Margulies et al. 2005). The 454 team followed up with a number of high-profile publications in collaboration with leading genomics scientists, including the complete genome sequence of James Watson (Wheeler et al. 2008) and one million bases of genomic DNA sequence from fossilized Neanderthals (Green et al. 2006).
The initial Genome Sequencer product (‘GS20’) was offered commercially in 2005. It produced approximately 25 million bases of high-quality DNA sequence per run, with reads 80-120 bases long. In 2012, an upgrade to the ‘GS’ system known as ‘FLX+’ increased the average read length to 700 bases for ~1 million reads, for a total sequence yield of about 0.7 Gb per run. The cost per run remains about $8,000. Most investigators use a multiplex strategy that combines barcodes, which identify individual samples, with a set of gaskets that divide the surface of the sequencing plate into sub-sections.
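The yield figures above follow from simple arithmetic, sketched below. The function names are hypothetical, the mean GS20 read length of 100 bases is an assumption (the midpoint of the 80-120 range quoted above), and the $8,000 run cost is applied only to the FLX+ run.

```python
def run_yield_bases(n_reads: int, mean_read_length: int) -> int:
    """Total bases per run = number of reads x mean read length."""
    return n_reads * mean_read_length


def cost_per_megabase(cost_per_run: float, yield_bases: int) -> float:
    """Run cost divided by the yield expressed in megabases."""
    return cost_per_run / (yield_bases / 1_000_000)


# FLX+ figures quoted in the text: ~1 million reads of ~700 bases each.
flx_plus_yield = run_yield_bases(1_000_000, 700)
print(flx_plus_yield / 1e9)                      # yield in Gb: 0.7
print(cost_per_megabase(8000, flx_plus_yield))   # roughly 11.4 dollars per Mb

# GS20: 25 million bases per run; assuming a mean read length of 100 bases
# (an assumption, see lead-in) this implies about 250,000 reads per run.
gs20_reads = 25_000_000 // 100
print(gs20_reads)                                # 250000
```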
Start with a shotgun approach
Sample preparation for the 454 system follows the shotgun strategy: the genomic DNA is randomly sheared, adapter sequences are added to the ends of the fragments, and the fragments are then combined with Sepharose beads (diameter ~28 µm) that have been coated with oligonucleotides complementary to the adapters. The DNA is mixed with an excess of beads so that most beads bind only a single template molecule. The beads with the bound DNA are subjected to emulsion PCR, which amplifies the DNA templates from a single copy to approximately 10 million copies on each bead. Subsequently, the enriched, template-carrying beads are deposited into open wells arranged along one face of a 60 × 60 mm fibre-optic slide (or picotiter plate). The wells are sized to fit only a single bead, and each plate contains approximately two million wells. Reagents are supplied to the picotiter plate for sequential rounds of sequencing by synthesis using a modification of the pyrosequencing method (Ronaghi et al. 1996).
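The reason for mixing the DNA with an excess of beads can be illustrated with a simple Poisson loading model. This model and the example ratios are assumptions made for illustration, not part of the 454 protocol as described here: if templates land on beads independently with a mean of `lam` templates per bead, a low `lam` leaves many beads empty but makes nearly all loaded beads carry exactly one template.

```python
import math


def bead_loading(lam: float) -> dict:
    """Poisson model of templates per bead (illustrative assumption).

    lam is the mean number of template molecules per bead.
    """
    p_empty = math.exp(-lam)            # P(0 templates)
    p_single = lam * math.exp(-lam)     # P(exactly 1 template)
    p_multi = 1.0 - p_empty - p_single  # P(2 or more templates)
    return {
        "empty": p_empty,
        "single": p_single,
        "multi": p_multi,
        # Among beads that carry any DNA, the fraction with one template.
        "clonal_fraction": p_single / (1.0 - p_empty),
    }


# Hypothetical template-to-bead ratios, chosen only to show the trend.
for lam in (0.1, 0.5, 1.0):
    result = bead_loading(lam)
    print(lam, round(result["clonal_fraction"], 3))
```

Under this model, lowering the template-to-bead ratio raises the share of loaded beads that are clonal, at the cost of more empty beads, which is consistent with the enrichment step for template-carrying beads mentioned above.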
How does pyrosequencing work?
Pyrosequencing uses DNA polymerase to synthesize a strand complementary to a single-stranded template, but it provides only one type of deoxynucleotide triphosphate (dNTP) in each cycle of the reaction. Each addition of a new nucleotide to the growing copy strand is accompanied by the release of pyrophosphate, which is converted into an emission of light by a reaction involving ATP sulfurylase, luciferase and luciferin. The chemiluminescent event is detected by a camera. Because each template molecule sits in its own well of the 454 picotiter plate, every light signal can be assigned to a specific template, so the bases are recorded and the sequences of all templates are built up computationally in parallel. The 454 sequencer is equipped with an integrated computer that contains a six-million-gate FPGA co-processor, which allows for signal processing in real time. An FPGA (‘Field Programmable Gate Array’) is a chip that can be programmed to perform almost any digital function.
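The one-nucleotide-per-cycle logic can be sketched as a small simulation of ideal, noise-free signals for a single well. The function names and the repeating flow order "TACG" are assumptions made for this example; the text above does not specify the order in which nucleotides are supplied.

```python
from typing import List, Tuple


def simulate_flows(copy_strand: str,
                   flow_order: str = "TACG",
                   max_flows: int = 400) -> List[Tuple[str, int]]:
    """Return (nucleotide, bases incorporated) for each flow.

    copy_strand is the sequence that the polymerase will synthesize.
    The count stands in for the light signal: 0 means no light, and
    a homopolymer of length n gives a signal of n in a single flow.
    """
    pos = 0
    signals = []
    for i in range(max_flows):
        if pos >= len(copy_strand):
            break  # the whole strand has been synthesized
        nuc = flow_order[i % len(flow_order)]
        run = 0
        # Incorporate every consecutive base that matches this flow.
        while pos < len(copy_strand) and copy_strand[pos] == nuc:
            run += 1
            pos += 1
        signals.append((nuc, run))
    return signals


def decode(signals: List[Tuple[str, int]]) -> str:
    """Rebuild the sequence from the per-flow signals."""
    return "".join(nuc * run for nuc, run in signals)


flows = simulate_flows("TTACGGGA")
print(flows)
# [('T', 2), ('A', 1), ('C', 1), ('G', 3), ('T', 0), ('A', 1)]
print(decode(flows))  # TTACGGGA
```

A flow with a count of zero corresponds to a cycle in which the supplied nucleotide does not match the next template position, so no pyrophosphate is released and no light is emitted.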
A low rate of errors, but some drawbacks…
Since only one type of nucleotide base is added during a cycle of DNA synthesis, the pyrosequencing chemistry has a very low rate of base substitution errors. However, the use of pyrosequencing chemistry also creates one of the key drawbacks of the 454 sequencing method. When a template molecule contains multiple consecutive bases of the same type, such as AAAA (a ‘homopolymer’), then multiple bases are synthesized onto the copy strand all at once, creating a larger emission of light. It is difficult for the system to accurately count the number of bases in homopolymers longer than eight or nine bases. Since many different template molecules are sequenced simultaneously in different wells of the picotiter plate, and homopolymers of various lengths occur randomly, the lengths of the newly synthesized DNA copy strands will differ. After the same number of cycles, DNA templates with many homopolymers will have produced longer copy strands than templates in which each base occurs singly. As a result, the 454 sequencing process produces a set of sequence reads with a distribution of different sizes.
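Both effects described above can be illustrated with a toy model. It rests on stated assumptions: the light signal is taken to be proportional to homopolymer length, the measurement is off by a fixed relative error (the 6% and 3% values are purely illustrative, not measured 454 figures), and nucleotides are flowed in a repeating "TACG" order. All function names are hypothetical.

```python
def call_run_length(signal: float, unit_signal: float = 1.0) -> int:
    """Call homopolymer length by rounding to the nearest unit signal."""
    return int(signal / unit_signal + 0.5)


def longest_reliable_run(relative_error: float, max_len: int = 30) -> int:
    """Longest homopolymer still called correctly when every signal
    is inflated by the given relative error (toy noise model)."""
    for n in range(1, max_len + 1):
        observed = n * (1.0 + relative_error)
        if call_run_length(observed) != n:
            return n - 1
    return max_len


def read_length(copy_strand: str, n_flows: int,
                flow_order: str = "TACG") -> int:
    """Bases synthesized after a fixed number of nucleotide flows."""
    pos = 0
    for i in range(n_flows):
        nuc = flow_order[i % len(flow_order)]
        while pos < len(copy_strand) and copy_strand[pos] == nuc:
            pos += 1
    return pos


# With a 6% relative error, a run of 9 gives a signal of 9.54,
# which rounds to 10, so calls are reliable only up to length 8.
print(longest_reliable_run(0.06))            # 8

# Same number of flows, very different read lengths.
print(read_length("TTTTAAAACCCCGGGG", 4))    # 16 (homopolymer-rich)
print(read_length("TACGTACGTACGTACG", 4))    # 4  (single bases only)
```

The absolute error grows with the length of the run while the gap between neighbouring calls stays at one unit, which is why long homopolymers are the first to be miscounted in this model.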
The 454 system is now most frequently used for amplicon studies, in which a defined region of DNA is sequenced in many samples or very deeply in one sample. For example, in metagenomic studies, a single portion of the ribosomal RNA gene (rDNA) can be sequenced from all bacteria using a set of universal primers. A long read of at least 300 bp is required in order to obtain sufficient taxonomic information to identify each type of bacterium present in a mixed sample (which might be from an environmental or medical source). Another example would be a blood sample from an individual patient with AIDS: the HIV genome can be sequenced very deeply to search for rare variants that could lead to the development of drug resistance. Long 454 reads are also sometimes used in combination with shorter reads from other NGS systems for the de novo assembly of complete genomes.
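Why rare variants call for very deep sequencing can be sketched with a binomial calculation. This is a simplified illustration under stated assumptions: reads are treated as independent draws from the viral population, sequencing errors are ignored, and the depths, variant frequency and function name are hypothetical.

```python
import math


def prob_at_least(depth: int, variant_freq: float, min_reads: int) -> float:
    """Probability that at least min_reads of depth reads carry a
    variant present at variant_freq (binomial model, no errors)."""
    p_fewer = sum(
        math.comb(depth, k)
        * variant_freq ** k
        * (1.0 - variant_freq) ** (depth - k)
        for k in range(min_reads)
    )
    return 1.0 - p_fewer


# A variant present in 1% of viral genomes, seen in at least one read.
for depth in (100, 1000, 10000):
    print(depth, prob_at_least(depth, 0.01, 1))

# Requiring several supporting reads (here 5) demands more depth.
print(prob_at_least(1000, 0.01, 5))
```

In practice a threshold of several supporting reads is a common way to guard against sequencing errors, which this sketch does not model.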
Overview of the Roche 454 sequencing system. DNA is sheared into small fragments to which adapters are ligated; the fragments are attached to beads, mixed into an emulsion, amplified by emulsion PCR and deposited in the wells of a picotiter plate, and their sequences are then determined by pyrosequencing.
Margulies et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 376-380.
Wheeler et al. (2008) The complete genome of an individual by massively parallel DNA sequencing. Nature 452, 872-876.
Green et al. (2006) Analysis of one million base pairs of Neanderthal DNA. Nature 444, 330-336.