For those of us who work with Mus musculus or Homo sapiens, to name a couple of species, a few clicks on the UCSC Genome Bioinformatics Site or Ensembl get you the full and precise DNA sequence for any annotated gene in the genome. This luxury is not available for all species, however, and many remain unsequenced. So how do we uncover the DNA sequence of our favourite model organism at our gene or region of interest?
Degenerating the sequence
Most of the time, help comes from closely related species in which the DNA sequence, or the protein sequence, of the gene we are interested in is already known. Homologous genes or amino acid sequences from multiple species can be aligned to identify conserved and non-conserved regions between species.
If we know the amino acid sequence in related organisms, we can work backwards to design a number of ‘degenerate’ primers, often over several iterations, until we find a set that is complementary to our ‘unknown’ sequence and gives an amplifiable product.
Definition of degenerate primers
A degenerate primer is defined as:
“A mix of oligonucleotide sequences in which some positions contain a number of possible bases, giving a population of primers with similar sequences that cover all possible nucleotide combinations for a given protein sequence” (Iserte et al. 2013). For example:
ATCGTT[GC]AAGT[AGC]ATC
refers to a series of primers in which the seventh and twelfth nucleotides are degenerate. The degeneracy of a primer is the number of distinct sequences in the mix, which is the product of the number of possible bases at each degenerate position. The above example therefore has a degeneracy of six (2 × 3).
Using the International Union of Pure and Applied Chemistry (IUPAC) codes for degenerate bases (Table 1), we can insert a single letter to represent each degenerate position. In the above example we would use ‘S’ in place of GC and ‘V’ in place of AGC, giving ATCGTTSAAGTVATC.
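To make the notation concrete, here is a minimal Python sketch (our own illustration, not taken from Iserte et al.) that expands an IUPAC-coded primer into every sequence in the mix and computes its degeneracy as the product of the number of bases allowed at each position. The helper names `degeneracy` and `expand` are hypothetical.

```python
from itertools import product

# IUPAC nucleotide codes mapped to the bases each one stands for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "GC", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences in the mix: product of options per position."""
    total = 1
    for base in primer.upper():
        total *= len(IUPAC[base])
    return total


def expand(primer: str) -> list[str]:
    """Every concrete sequence encoded by a degenerate primer."""
    options = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*options)]


primer = "ATCGTTSAAGTVATC"  # ATCGTT[GC]AAGT[AGC]ATC written in IUPAC code
print(degeneracy(primer))  # 6
for seq in expand(primer):
    print(seq)
```

Enumerating the mix like this gets unwieldy quickly (each N multiplies the count by four), which is why the degeneracy number itself is the more useful design metric.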
Designing degenerate primers
To design your own set of degenerate primers, follow some basic guidelines:
1) Align multiple amino acid sequences using free online software such as EBI Clustal Omega.
2) Target an area approximately 200-500 base pairs in length for optimal PCR amplification.
3) Position the forward and reverse primers in the more conserved regions; the less degenerate the primers, the further apart they can be.
4) Include 6 to 7 amino acids in each primer, equating to roughly 18-21 base pairs (three bases per codon).
5) Try to include methionine and tryptophan, which are each encoded by a single codon (a three-nucleotide sequence), and avoid leucine, serine and arginine, which are each encoded by six different codons (Table 2).
6) For subsequent cloning, and to increase primer length (and therefore annealing temperature), add a 5’ tail (6-9 base pairs) containing a restriction enzyme site.
7) If a position is completely degenerate (no match among any of the species compared), consider using the base inosine (structurally similar to guanine), as it can pair with any of the four bases, although it pairs preferentially with cytosine. Alternatively, insert N (for aNy base) to specify an equimolar mix of all four bases at that position in your primer mix.
8) Avoid degeneracy at the 3’ terminus (this would not be a good place to insert inosine).
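The arithmetic behind guidelines 3 to 5 can be sketched in Python. This is a minimal illustration under stated assumptions, not how iCODEHOP or any other tool works: it uses the standard genetic code, scores every window of k amino acids in a conserved peptide by the product of codon counts, and back-translates the least degenerate window into IUPAC notation. The peptide `LSRKMWHDCIE` and all function names are hypothetical.

```python
from collections import defaultdict
from math import prod

# Standard genetic code: codons in TCAG order (TTT, TTC, TTA, TTG, TCT, ...),
# one amino acid letter per codon, '*' marking stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]

# Amino acid -> codons encoding it (M and W have 1; L, S and R have 6)
SYNONYMS = defaultdict(list)
for codon, aa in zip(CODONS, AMINO):
    SYNONYMS[aa].append(codon)

# IUPAC code for every non-empty set of bases, and the reverse lookup
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "S": "GC", "W": "AT", "K": "GT", "M": "AC", "B": "CGT",
         "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}
CODE_FOR = {frozenset(bases): code for code, bases in IUPAC.items()}


def codon_degeneracy(peptide: str) -> int:
    """Number of distinct DNA sequences that could encode the peptide."""
    return prod(len(SYNONYMS[aa]) for aa in peptide)


def back_translate(peptide: str) -> str:
    """Collapse each codon position to one IUPAC letter."""
    letters = []
    for aa in peptide:
        for pos in range(3):
            bases = frozenset(codon[pos] for codon in SYNONYMS[aa])
            letters.append(CODE_FOR[bases])
    return "".join(letters)


def primer_degeneracy(primer: str) -> int:
    """Number of sequences in the mix actually ordered as this IUPAC primer."""
    return prod(len(IUPAC[base]) for base in primer)


def best_window(peptide: str, k: int = 6) -> tuple[int, str]:
    """Start and sequence of the k-residue window with the lowest degeneracy.

    k = 6 amino acids gives an 18-base primer (three bases per codon).
    Ties go to the leftmost window.
    """
    starts = range(len(peptide) - k + 1)
    start = min(starts, key=lambda i: codon_degeneracy(peptide[i:i + k]))
    return start, peptide[start:start + k]


conserved = "LSRKMWHDCIE"  # hypothetical conserved stretch from an alignment
start, window = best_window(conserved, k=6)
primer = back_translate(window)
print(start, window, codon_degeneracy(window))  # 3 KMWHDC 16
print(primer, primer_degeneracy(primer))  # AARATGTGGCAYGAYTGY 16
```

One design note: writing leucine, serine or arginine as a single IUPAC string inflates the mix beyond their six codons (`back_translate` gives YTN, WSN and MGN, i.e. 8, 16 and 8 sequences), which is one more reason guideline 5 steers away from them.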
Depending on the goal, the trade-off between primer specificity and efficiency can be tuned by altering the degeneracy of the primer. The more degenerate the primers, the less specific annealing will be, but the greater the chance of amplifying unknown variants; decreasing degeneracy improves specificity at the cost of that coverage.
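One way to see the efficiency side of this trade-off: if every sequence in the mix is synthesized in equal amounts, each one makes up only 1/degeneracy of the total primer, so the exact match for your template is diluted as degeneracy rises. The sketch below assumes equimolar synthesis and a hypothetical total primer concentration of 0.5 µM; the second primer is a hypothetical variant of the earlier example with N at both degenerate positions.

```python
# How many bases each ambiguous IUPAC letter allows; A, C, G and T count as 1
ALLOWED = {"R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
           "B": 3, "D": 3, "H": 3, "V": 3, "N": 4}

TOTAL_UM = 0.5  # hypothetical total primer concentration (µM)


def degeneracy(primer: str) -> int:
    """Product of the number of allowed bases at each position."""
    d = 1
    for base in primer.upper():
        d *= ALLOWED.get(base, 1)
    return d


def share_per_sequence(primer: str, total_um: float = TOTAL_UM) -> float:
    """Concentration of each individual sequence, assuming equimolar synthesis."""
    return total_um / degeneracy(primer)


for p in ("ATCGTTSAAGTVATC", "ATCGTTNAAGTNATC"):
    print(f"{p}: degeneracy {degeneracy(p)}, "
          f"~{share_per_sequence(p):.3f} uM of each sequence")
```

Going from S and V to N at the same two positions raises the degeneracy from 6 to 16, so the perfectly matching primer falls to well under half its previous share of the mix.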
Help is out there
Thankfully, as with basic primer design, there is help out there for designing degenerate primers. A number of online and downloadable programs are available to aid design; based on your input sequences, they aim to generate the minimum number of degenerate primers while maintaining optimal PCR requirements. Some commonly used programs to try out include iCODEHOP (COnsensus-DEgenerate Hybrid Oligonucleotide Primers), NCBI Primer-BLAST and HYDEN (HighlY DEgeNerate primers).
Other uses for degenerate primers
There are quite a few applications for degenerate primers, aside from sequence determination. These include:
- Distinguishing different alleles (parental copies) using a single base mismatch at the 3’ terminus:
– distinguishing wild-type from mutant alleles
– determining parent-of-origin alleles (genotyping)
- Covering an unavoidable single nucleotide polymorphism (SNP) within the primer site. For example, an A/G SNP could be replaced with an R.
- Covering a CpG site (CG dinucleotide) in bisulfite-converted DNA. The site may or may not be methylated, so after conversion the C residue could read as either C or T; replace it with a Y.
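The last two substitutions are easy to automate. The minimal Python sketch below, with hypothetical function names, writes a forward primer matching the bisulfite-converted top strand and swaps a known SNP for its IUPAC code. It assumes complete conversion and no non-CpG methylation (so non-CpG cytosines become T and CpG cytosines become Y), and it only inspects the sequence it is given, so a 3’-terminal C that is followed by G in the genome would be missed.

```python
# IUPAC codes for two-base ambiguities (enough for a biallelic SNP)
PAIR_CODE = {frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("GC"): "S",
             frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M"}


def bisulfite_forward_primer(top_strand: str) -> str:
    """Sketch: primer matching the bisulfite-converted top strand.

    Assumes complete conversion and no non-CpG methylation:
    non-CpG C -> T; CpG C -> Y (C if methylated, T if not).
    A C at the end of `top_strand` is treated as non-CpG, because
    the next genomic base is not known to this function.
    """
    seq = top_strand.upper()
    out = []
    for i, base in enumerate(seq):
        if base == "C":
            in_cpg = i + 1 < len(seq) and seq[i + 1] == "G"
            out.append("Y" if in_cpg else "T")
        else:
            out.append(base)
    return "".join(out)


def mark_snp(primer: str, pos: int, alleles: str) -> str:
    """Replace the base at 0-based `pos` with the IUPAC code for a biallelic SNP."""
    alleles = alleles.upper()
    if primer[pos].upper() not in alleles:
        raise ValueError("primer base is not one of the SNP alleles")
    return primer[:pos] + PAIR_CODE[frozenset(alleles)] + primer[pos + 1:]


print(bisulfite_forward_primer("ACGTTCAGCGA"))  # AYGTTTAGYGA
print(mark_snp("ATCGTTAAAGT", 6, "AG"))  # ATCGTTRAAGT
```

A reverse primer, or one for the bottom strand, would need its own conversion rules; this sketch covers only the top-strand forward case.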
What do you use your degenerate primers for?
References
Iserte, J.A., Stephan, B.I., Goni, S.E., Borio, C.S., Ghiringhelli, P.D., Lozano, M.E. (2013) Family-specific degenerate primer design: a tool to design consensus degenerated oligonucleotides. Biotechnol Res Int 2013:38364