Sanger Sequencing: How the Genome Was Won

I’ve never run a sequencing gel in my life, but people around me did, and they spent a lot of time getting it just right. Although the principle described by Sanger in 1975 sounds straightforward (1), sequencing gels are very long and very thin – less than a millimeter thick! They were easy to break at any stage of manipulation. The run speed of the gel also had to be just right for optimum separation of the DNA fragments. Even then, the first ten or so nucleotides were always hard to read because the shortest fragments run close to each other.

Fortunately, it’s a technique that can be (and was!) automated. Now instead of reading sequencing gels we read electropherograms (Figure 1). So know that every time you send your PCR products or plasmids for sequencing and receive a nice wavy diagram of your sequence, you are looking at the results of a modified Sanger sequencing method.

Classical Sanger Sequencing (CSS)

The classical method starts as so many PCR reactions do, with a single-stranded DNA template, a primer, DNA polymerase – and, of course, nucleotides. However, for Sanger sequencing you do not use just regular nucleotides; you also need modified, radioactively labelled nucleotides called ddNTPs. Unlike the usual dNTPs used in PCR reactions, ddNTPs lack the 3′-OH group needed to form the phosphodiester bond required for chain elongation, so the chain stops growing wherever a ddNTP is incorporated.

The infamous sequencing gel usually has four lanes, one for each ddNTP reaction: ddATP, ddCTP, ddGTP, and ddTTP. Electrophoresis of the resulting DNA products produces a ladder of radioactive fragments on the gel – each fragment representing the result of a ddNTP terminating the chain at a certain nucleotide.

After the gel is dried, it is exposed to an X-ray film and developed. The sequence is read from the shortest fragments (close to the end of the gel, as they run faster) to the longest, skipping between the A, T, G and C lanes to determine your DNA sequence.
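The lane-by-lane reading described above can be sketched in a few lines of Python. This is only an idealized toy model, not a lab protocol: it assumes a termination product exists at every position, and the function names `sanger_lanes` and `read_gel` are made up for illustration.

```python
def sanger_lanes(new_strand):
    """Idealized model: for each ddNTP lane, list the fragment lengths
    that terminate in that base (assumes every position is represented)."""
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read the ladder from the shortest fragment to the longest,
    noting which lane each band sits in."""
    bands = [(length, base) for base, lengths in lanes.items() for length in lengths]
    return "".join(base for _, base in sorted(bands))


lanes = sanger_lanes("GATTACA")
print(lanes)            # {'A': [2, 5, 7], 'C': [6], 'G': [1], 'T': [3, 4]}
print(read_gel(lanes))  # GATTACA
```

Note that what you read off the gel is the newly synthesized strand, which is complementary to your template.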

A good CSS run will allow you to read about 300 nucleotides. But this assumes your gel runs perfectly and you do not have any other problems. One such problem is looping of the single-stranded DNA, which can cause the read to jump over part of the sequence. Stretches of the same nucleotide and sequence repeats are also difficult to interpret on the gel with this method.

But despite all these troubles this is how the first DNA sequences were determined. “Back when boats were made of wood and men of steel.” Eh?

Terminal Dye Sanger Sequencing

In the last 20 years CSS using radioactive nucleotides has been mostly replaced by sequencing with fluorescent nucleotides. However, the principle of DNA extension and termination is the same. The main components – labelled nucleotides and the gel – have just been replaced by the next technological step.

Now, instead of a radioactive label, each ddNTP carries a fluorescent dye molecule. What’s more, a different dye is used for each of the four ddNTPs. This allows the reaction to be run in one tube instead of four separate reactions (3).

Terminal Dye Sequencing reactions are run in a capillary electrophoresis apparatus (no gels, hooray!), and an automatic system is used to read the fluorescent signal. The computer then converts all the signals into the electropherogram that you are used to seeing (see Figure 1). These automated systems are more robust than CSS in many ways. For example, this method allows you to get up to 1000 nucleotides per run.
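To make the "one dye per base" idea concrete, here is a minimal Python sketch of how a signal could be turned into letters. It assumes the trace has already been reduced to one set of four intensities per peak (real base-calling software does much more), and the `call_bases` name and the intensity values are invented for illustration.

```python
def call_bases(peaks):
    """Call one base per peak: the dye channel with the strongest signal wins.

    `peaks` is a list of dicts mapping base -> fluorescence intensity
    (a simplified stand-in for a real four-channel trace).
    """
    return "".join(max(peak, key=peak.get) for peak in peaks)


# Made-up intensities for three peaks
trace = [
    {"A": 4, "C": 95, "G": 6, "T": 3},
    {"A": 88, "C": 5, "G": 2, "T": 7},
    {"A": 3, "C": 4, "G": 5, "T": 91},
]
print(call_bases(trace))  # CAT
```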

But you don’t want to over-rely on the automatic reading of the sequence. You should still always manually look at your trace. Here are some common sequencing errors you will run across if you do it enough.

  • Skipped Nucleotides. This happens when two identical nucleotides next to each other are accidentally read as only one nucleotide. Therefore it is important to visually scan ALL of your electropherograms for missed nucleotides.
  • Dye Blobs. Sometimes it is just the opposite. You can have several peaks overlapping at one nucleotide space. In our lab we used to call these “dye blobs” – dunno if they have a real name. (Leave a comment if you know!) When this happens the software will just read “error” or “N” where the overlap occurs. Sometimes you can visually parse out what the sequence should be, but more often than not you will need to re-sequence this area.
  • Polymorphisms. If you re-sequence a region of overlapping peaks and still get overlapping peaks, read as “N”, then you, my friend, are dealing with a polymorphism. Check the literature to see if it has been reported before. And rejoice: you may have discovered something new.
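If you want help deciding where to look first, the checks above can be roughly sketched in Python. This is a hedged illustration, not how any particular sequencer's software works: it assumes one set of four intensities per peak, the 0.5 ratio is an arbitrary cut-off picked for the example, and `flag_overlaps` and `persistent_overlaps` are hypothetical names.

```python
def flag_overlaps(peaks, ratio=0.5):
    """Return 1-based positions where the second-strongest dye signal is at
    least `ratio` times the strongest, i.e. where peaks overlap.
    The default ratio is an arbitrary illustrative threshold."""
    flagged = []
    for pos, peak in enumerate(peaks, start=1):
        top, second = sorted(peak.values(), reverse=True)[:2]
        if top > 0 and second / top >= ratio:
            flagged.append(pos)
    return flagged


def persistent_overlaps(first_run, second_run):
    """Positions flagged in both the original run and the re-sequencing run:
    candidates for a real polymorphism rather than a one-off artifact."""
    return sorted(set(first_run) & set(second_run))


# Made-up intensities: position 2 has overlapping A and G peaks
run1 = [
    {"A": 90, "C": 4, "G": 3, "T": 2},
    {"A": 55, "C": 3, "G": 50, "T": 4},
    {"A": 2, "C": 5, "G": 3, "T": 85},
]
print(flag_overlaps(run1))                               # [2]
print(persistent_overlaps(flag_overlaps(run1), [2, 7]))  # [2]
```

A list like this only tells you where to look; you still need to open the trace and judge those positions by eye.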

Figure 1. Electropherogram for DNA sequence analysis. Author: Sjef.

Other Clever Sanger Sequencing Applications

CSS has acquired other uses too. DNA foot-printing, where the DNA template has a DNA-binding protein attached to it, was invented based on the Sanger sequencing principle. In this method the sequencing reaction is terminated not at the end of the template, but where the protein sits on the DNA, allowing scientists to pin-point the binding site (4). A spin-off of that method is RNA toe-printing, where RNA plays the role of the template and ribosomes that of the bound protein (5).

Do you have a horror or funny story associated with Sanger sequencing? Please share it with us.

Literature:

  1. Sanger F, Coulson AR. A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. J. Mol. Biol. 1975 94 (3): 441–8.
  2. Fleischmann R, Adams M, White O, Clayton R, Kirkness E, Kerlavage A, Bult C, Tomb J, Dougherty B, Merrick J, et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 1995 269 (5223): 496–512.
  3. Smith LM, Sanders JZ, Kaiser RJ, et al. Fluorescence detection in automated DNA sequence analysis. Nature 1986 321 (6071): 674–9.
  4. Tullius TD. Physical studies of protein-DNA complexes by footprinting. Annu. Rev. Biophys. Biophys. Chem. 1989 18: 213–37.
  5. Pisarev A, Hellen CUT, Pestova TV. Recycling of Eukaryotic Posttermination Ribosomal Complexes. Cell 2007 131 (2): 286–299.

1 Comment

  1. Gary Chew on May 11, 2016 at 4:01 am

    The commercial laboratories I work with provide Sanger sequencing services (though I am not on the sequencing team). For the single peak you mention as a dye blob, we don’t call that a dye blob. For us, a dye blob is another case, where the post-sequencing clean-up is not done properly and you have a large peak over the nucleotide signals, which always appears around 80 bp and 125 bp. As for the multiple-peak sequence, we call it peak under peak. It may be caused by multiple templates or high background.

    You can get more info (Form of error, caused and solution) from our support site. I am sorry if i sound like advertisement :P. I solely seek to share info from our support page so to save many ppl’s a lot of struggle like what I go through during my post grad school time.

    http://www.base-asia.com/dna-sequencing-services/support/technical-support
