A Short History of Sequencing Part 1: from the first proteins to the Human Genome

Written by: James Hadfield

last updated: April 2, 2020

It all started with proteins

The earliest sequencing methods were developed for proteins. In 1950, Pehr Edman published a paper demonstrating a label-cleavage method for protein sequencing, which was later termed “Edman degradation”. Around the same time, Fred Sanger was developing his own labelling and separation method, which led to the sequencing of insulin. For this work, Sanger was awarded the 1958 Nobel Prize in Chemistry. Leap forward ten years and Fred Sanger was sequencing once again. This time it was RNA, in work that demonstrated the first version of electrophoretic sequencing as we know it today (Brownlee et al., 1968).

Plus and minus in the 1970s

Fast-forward once again to the 1970s and we find Fred Sanger still at the forefront of nucleic acid sequencing. In 1975, whilst at the Laboratory of Molecular Biology in Cambridge, Fred Sanger developed the “plus and minus” method for DNA sequencing (Sanger and Coulson, 1975). Again there was competition in the field, with Maxam and Gilbert working on degradation sequencing (Maxam and Gilbert, 1977). However, their method was ultimately to falter in the face of the ease and quality of the Sanger method.

Got any spare ddTTP?!

After the plus and minus technique had been refined, the seminal 1977 PNAS article was published (Sanger et al., 1977). The research was kick-started by a conversation with Klaus Geider from the Max-Planck Institute, who had some ddTTP he was willing to share! The ddTTP (which can terminate an elongating DNA molecule) worked so startlingly well that Fred Sanger and Alan Coulson had to produce their own supply of the other three ddNTPs. The results back then were essentially the same as the Sanger sequencing we are familiar with. Even so, the Sanger sequencing we use today would not have been possible without many other developments and improvements: shotgun cloning, PCR, ‘Phred’ (used to identify a sequence from fluorescence data), simple DNA extraction methods, and so on.
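For readers who like to see an idea in code, here is a minimal sketch of the logic behind chain termination, not a model of the real chemistry. It assumes an idealised experiment in which each of the four ddNTP reactions yields exactly one fragment for every position where that base occurs, and that the gel resolves fragments perfectly by length. The function names and the example sequence are hypothetical, chosen only for illustration.

```python
def chain_termination_fragments(synthesized: str) -> dict[str, list[int]]:
    """Return, for each ddNTP reaction, the lengths of fragments ending in that base.

    Idealised assumption: every occurrence of a base produces exactly one
    terminated fragment in the matching reaction.
    """
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read the sequence from shortest to longest fragment across the four lanes."""
    base_by_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_by_length[length] for length in sorted(base_by_length))


lanes = chain_termination_fragments("GATTACA")
print(lanes["T"])       # fragment lengths seen in the ddTTP lane
print(read_gel(lanes))  # sequence recovered by reading the gel bottom to top
```

Reading the four lanes in order of fragment length recovers the synthesized strand, which is why one reaction per ddNTP was enough to determine a sequence.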

Happy birthday to you

For his work on DNA sequencing, Fred Sanger was awarded the Gairdner Foundation Annual Award (1971), the G.W. Wheland Award (1978), the Louisa Gross Horwitz Prize (1979), the Albert Lasker Basic Medical Research Award (1979), the Biochemical Analysis Prize of the German Society for Clinical Chemistry and the Nobel Prize in Chemistry (1980). Fred Sanger turned 94 last month; let’s hope that in 2018 we can celebrate his 100th birthday and DNA’s 65th!

The Human Genome Project (HGP)

In the years following the 1977 Sanger paper, DNA sequencing with terminating nucleotides was improved. Back then, DNA sequencing was performed in four separate tubes, the products were radioactively labelled, and each gel had four tracks from which the DNA sequence was literally read off to some kind (and no doubt bored) person in your lab. Fortunately, Sanger sequencing benefitted enormously from commercial development by Applied Biosystems (ABI) and others. Four-color fluorescent labelling (Smith et al., 1985) meant single-tube reactions could be performed, thereby simplifying the sample prep. Automated DNA sequencers (Smith et al., 1986) such as the ‘373’ ran slab gels with 32 samples per run. A series of instruments followed: ABI moved to 96 samples on the ‘377’, capillary sequencing on the ‘3700’ (though not a great instrument) and, finally, high-throughput 96-lane capillary sequencing on the ‘3730XL’, where sequences are produced automatically with quality scores. Today this is the workhorse of DNA sequencing providers.

Facing a gargantuan task

In the mid-1980s, scientists began to talk about the possibility of sequencing the complete human genome. It seemed a gargantuan task, perhaps even an impossible one. The HGP formally began in 1990 and was completed in 2003. In 1998, Craig Venter launched Celera Genomics and the race was on to complete the human genome. During the whole project, costs per base sequenced dropped over 100-fold. The estimated cost of the HGP was $300M-$3B. The bulk of the actual sequencing cost around $300M, but there was also a huge investment in genome sciences, which underpinned the development of sequencing methods. Meanwhile, in 1996, Mostafa Ronaghi published his paper on pyrosequencing (Ronaghi et al., 1996), a new method very different from previous ones. In Part 2, we look at the first of the next, in a manner of speaking!

References

Brownlee, Sanger and Barrell (1968) The sequence of 5S ribosomal ribonucleic acid. J Mol Biol. 34:379-412. https://www.ncbi.nlm.nih.gov/pubmed/4938553

Sanger and Coulson (1975) A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. J Mol Biol. 94:441-446. https://www.sciencedirect.com/science/article/pii/0022283675902132

Maxam and Gilbert (1977) A new method for sequencing DNA. PNAS 74:560-564. https://www.pnas.org/content/74/2/560.abstract

Sanger, Nicklen and Coulson (1977) DNA sequencing with chain-terminating inhibitors. PNAS 74:5463-5467. https://www.ncbi.nlm.nih.gov/pubmed/271968

Smith et al. (1985) The synthesis of oligonucleotides containing an aliphatic amino group at the 5′ terminus: synthesis of fluorescent DNA primers for use in DNA sequence analysis. NAR 13:2399-2412. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC341163/

Smith et al. (1986) Fluorescence detection in automated DNA sequence analysis. Nature 321:674-679. https://www.ncbi.nlm.nih.gov/pubmed/3713851

Ronaghi et al. (1996) Real-time DNA sequencing using detection of pyrophosphate release. Anal. Biochem. 242:84-89. https://www.sciencedirect.com/science/article/pii/S0003269796904327

James has a PhD in Genomics from the University of East Anglia.
