Genetic Notation: Crack the Code!

Pop Quiz Time: You get a new bacterial strain from a culture collection, but you’re not quite sure what the genetic notation (i.e., all the letters and symbols) means. Do you:

A. Cry?

B. Ask around to see what your lab mates think?

C. Cross your fingers that your friends at Bitesize Bio can help you out?

Well, I hope you chose answer “C”, because that’s exactly what this article is all about!

Thankfully, there is a standard nomenclature for bacterial genes dating back to 1966 [1]. However, there are a lot (and I do mean A LOT) of rules for naming (and thus reading) mutations and their phenotypes. This makes sense given all the possible gene and locus alterations scientists can introduce! For the sake of simplicity, this article will focus on the most common types of genetic notation that we meet as biologists. You will also get some useful tips to help you decipher what you are reading. So let’s roll up our sleeves and dig in!

Tip #1: Let the Basics Be Your Guide

If you come across a strange 3-letter abbreviation in your strain name, have no fear! This notation simply designates a gene of interest (i.e., one that has been mutated or inserted during the generation of your strain). Each gene is assigned a lowercase 3-letter designation that is usually an abbreviation for the pathway affected or the phenotype resulting from the mutation/insertion. If you’re ever confused about what an abbreviation means, check out the HUGO Gene Nomenclature Committee website (for human genes) or a database dedicated to your organism. To start you off, we have listed some common examples in Table 1 below [2]:

Table 1. Common gene abbreviations

Biosynthetic genes
ala alanine
arg arginine
asn asparagine
gua guanine
pur purines
pyr pyrimidine
thy thymine
bio biotin
nad NAD
pan pantothenic acid
Catabolic genes
ara arabinose
gal galactose
lac lactose
mal maltose
man mannose
mel melibiose
rha rhamnose
xyl xylose

 

  • If different genes affect the same pathway, they are distinguished by a capital letter following the 3-letter designation. For example, mutations affecting pyrimidine biosynthesis are designated pyr; the pyrC gene encodes the enzyme dihydroorotase and the pyrD gene encodes the enzyme dihydroorotate dehydrogenase.
  • If several mutations are introduced into a pathway, each is consecutively assigned a unique allele number. For example, pyrC19 refers to a particular pyr mutation that affects the pyrC gene. In order to distinguish each mutation, no other pyr mutation, regardless of the gene affected, will be assigned the allele number 19. A separate series of allele numbers is used for each 3-letter locus designation.
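To see how the pieces of a designation fit together, here is a minimal Python sketch that splits a name such as pyrC19 into its 3-letter locus, gene letter, and allele number. The function name parse_designation and the exact pattern are illustrative assumptions, not part of any official standard, and real strain names contain many variations this sketch does not handle.

```python
import re

# Assumed shape: three lowercase letters, an optional capital gene letter,
# and an optional allele number (e.g. "pyr", "pyrC", "pyrC19").
DESIGNATION = re.compile(r"^(?P<locus>[a-z]{3})(?P<gene>[A-Z])?(?P<allele>\d+)?$")


def parse_designation(text):
    """Split a designation into locus, gene letter, and allele number.

    Missing parts are returned as None. This is a hypothetical helper
    written for illustration only.
    """
    match = DESIGNATION.match(text)
    if match is None:
        raise ValueError(f"not a recognised designation: {text!r}")
    allele = match.group("allele")
    return {
        "locus": match.group("locus"),
        "gene": match.group("gene"),
        "allele": int(allele) if allele is not None else None,
    }


print(parse_designation("pyrC19"))
```

Running it on pyrC19 separates the pathway (pyr), the specific gene (C), and the allele number (19), which is exactly the order in which you would read the name by eye.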

Tip #2: Amino Acid Mutations Are a Thing

As shown in Table 1, genes involved in amino acid biosynthesis are common targets of mutation. Mutations can also change the amino acids within a protein itself, which makes complete sense when you’re trying to alter a specific phenotype, given that amino acids are the building blocks of proteins. Now, remember when your biochemistry professor had you memorize the single-letter abbreviations for all the amino acids? This is when you get to use that information! Let’s say that a point mutation results in alanine (A) at position 235 where threonine (T) used to be. This would simply be noted as “T235A”. Easy peasy.
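If you would rather not rely on your memory of the single-letter codes, the reading rule (original amino acid, position, new amino acid) can be sketched in a few lines of Python. The function name describe_substitution is a hypothetical helper, and the sketch assumes the simple “letter, number, letter” form only.

```python
import re

# One-letter to three-letter codes for the 20 standard amino acids.
AMINO_ACIDS = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

# Assumed shape: original residue, position, replacement residue.
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def describe_substitution(text):
    """Spell out a substitution such as "T235A" in words (illustrative only)."""
    match = SUBSTITUTION.match(text)
    if match is None:
        raise ValueError(f"not a simple substitution: {text!r}")
    original, position, replacement = match.groups()
    if original not in AMINO_ACIDS or replacement not in AMINO_ACIDS:
        raise ValueError(f"unknown amino acid code in {text!r}")
    return (
        f"{AMINO_ACIDS[original]} at position {position} "
        f"replaced by {AMINO_ACIDS[replacement]}"
    )


print(describe_substitution("T235A"))
```

For “T235A” this prints that the threonine at position 235 has been replaced by alanine, matching the example above.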

Tip #3: Go with the Most Obvious Answer

Every so often biologists come across naming schemes that actually make sense! Specifically, this can occur when the protein that a gene encodes is known and can thus become part of the gene name. For example:

  • rpoA encodes the α-subunit of RNA polymerase
  • rpoB encodes the β-subunit of RNA polymerase
  • polA encodes DNA polymerase I
  • polC encodes DNA polymerase III
  • rpsL encodes small-subunit ribosomal protein S12

Seems pretty straightforward, doesn’t it? Great! Now, what about if your mutation is actually the result of an insertion? Well, reading that naming scheme can get a little tricky depending on the exact location of the insertion. Read on!

Tip #4: Break out the Rosetta Stone

In addition to the 3-letter abbreviations, one of the most confusing aspects of genetic notation is the symbols. Some make perfect sense, like “+” for wild type and “-” for mutation; others not so much. For now, Table 2 can serve as a cheat sheet for the most commonly used genetic notation symbols.

Table 2. Common symbols used in genetic notation

+ wildtype
= Identical to reference sequence (no change, wild type sequence)
? Unknown
/ Mosaic cases; separator between the different nucleotides, transcripts, and proteins generated from one allele
// Chimeric cases; separator between different nucleotides, transcripts, and proteins generated from a mix of four alleles
( ) Indicates uncertainty in the description of a change
0 (zero) Indicates no product/nothing
- Mutated gene
* Translation termination (stop) codon
_ Nucleotide numbering, used to indicate a range
Δ Deletion
Fusion
: Fusion
:: Insertion
Ω Genetic construct introduced by a two-point cross-over
> Substitution (for bases)
Range
; Separator between different changes in one allele or between two alleles
, Separator between different transcripts or proteins generated from one allele
am Amber mutation
con Conversion
cs Cold sensitive
del Deletion
dup Duplication
ext Extension
fsX Frame shift
ins Insertion
inv Inversion
o Opposite strand
oc Ochre mutation
R Resistant
sup Suppressor
t Translocation
ts Temperature sensitive
um Umber (opal) mutation
X Stop codon

Tip #5: Chromosome Rearrangements Are Goofy

Chromosome rearrangements refer to deletions, duplications, and inversions of genes (Table 2). You will recognize these guys as a 3-letter notation, indicating which type of rearrangement you’re dealing with, followed by the corresponding genes in parentheses, and then the allele number:

  • Deletions = DEL(genes)allele number
  • Inversions = INV(join point gene #1 – join point gene #2)allele number
  • Duplications = DUP(gene #1*join point*gene #2)allele number
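The three patterns above share one shape: a 3-letter rearrangement type, the genes in parentheses, then the allele number. Here is a minimal Python sketch that pulls those three parts apart. The function name parse_rearrangement and the example string “DEL(lac)7” are made up for illustration, and the contents of the parentheses are returned as plain text rather than interpreted.

```python
import re

REARRANGEMENT_TYPES = {
    "DEL": "deletion",
    "INV": "inversion",
    "DUP": "duplication",
}

# Assumed shape: TYPE(genes)allele-number, e.g. the invented "DEL(lac)7".
REARRANGEMENT = re.compile(r"^(DEL|INV|DUP)\(([^()]+)\)(\d+)$")


def parse_rearrangement(text):
    """Split a rearrangement name into type, genes, and allele number."""
    match = REARRANGEMENT.match(text)
    if match is None:
        raise ValueError(f"not a recognised rearrangement: {text!r}")
    kind, genes, allele = match.groups()
    return {
        "type": REARRANGEMENT_TYPES[kind],
        "genes": genes,  # left as written; join points are not interpreted
        "allele": int(allele),
    }


print(parse_rearrangement("DEL(lac)7"))  # hypothetical strain notation
```

Because the parenthesised part differs between deletions, inversions, and duplications, the sketch deliberately stops at splitting the name and leaves the join points for you to read.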

Tip #6: Biologists Use a Lot of Antibiotics

If you are already familiar with the designation of antibiotic resistance or sensitivity, great! If not, Table 3 below will help you out with abbreviations for the most commonly used antibiotics for developing sensitive or resistant strains.

Table 3. Common antibiotic resistance designations and related terms

Abbreviation Antibiotic
amp Ampicillin
azi Azide
bla Beta-lactam
cam/cat Chloramphenicol
gen Gentamicin
kan Kanamycin
neo Neomycin
rif Rifampicin
spc Spectinomycin
str Streptomycin
tet Tetracycline
tonA Phage T1
zeo Zeocin
XG X-gal
XP X-phosphate
R Resistance
S Sensitivity
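As a quick illustration of how Table 3 is used, the Python sketch below decodes a marker written as an abbreviation followed directly by R or S (for example “kanR”). That written form is an assumption for this example, since many sources set the R or S as a superscript, and the lookup covers only a subset of the antibiotics in Table 3. The name decode_marker is a hypothetical helper.

```python
# A subset of the antibiotic abbreviations from Table 3.
ANTIBIOTICS = {
    "amp": "ampicillin",
    "cam": "chloramphenicol",
    "cat": "chloramphenicol",
    "gen": "gentamicin",
    "kan": "kanamycin",
    "neo": "neomycin",
    "rif": "rifampicin",
    "spc": "spectinomycin",
    "str": "streptomycin",
    "tet": "tetracycline",
}

STATUS = {"R": "resistant", "S": "sensitive"}


def decode_marker(marker):
    """Decode a marker such as "kanR" (assumed form: abbreviation + R or S)."""
    abbreviation, status = marker[:-1].lower(), marker[-1:]
    if abbreviation not in ANTIBIOTICS or status not in STATUS:
        raise ValueError(f"not a recognised marker: {marker!r}")
    return f"{ANTIBIOTICS[abbreviation]} {STATUS[status]}"


print(decode_marker("kanR"))
print(decode_marker("ampS"))
```

The abbreviation is lowercased before lookup so that “KanR” and “kanR” decode the same way, while the trailing R or S must stay capitalised.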

Whew! That was A LOT of information! And, as mentioned at the beginning, we have only covered the most common, tip-of-the-iceberg mutations, symbols, and nomenclature that you may come across in the genetic notation of any new strain. If you don’t see your strain’s notation here and you need more help, check out the list of references below. Go forth and decipher!

References and Further Reading

  1. Demerec M, Adelberg EA, Clark AJ, Hartman PE. A proposal for a uniform nomenclature in bacterial genetics. Genetics. 1966; 54(1):61-76.
  2. Birge EA. Bacterial and Bacteriophage Genetics. 2005. 5th Ed. Springer-Verlag New York. DOI: 10.1007/0-387-31489-X.
  3. International Committee on Standardized Genetic Nomenclature for Mice. Guidelines for Nomenclature of Genes, Genetic Markers, Alleles, and Mutations in Mouse and Rat.
  4. Rice University. Genetic nomenclature for Drosophila melanogaster.
  5. American Society for Microbiology. Journal of Bacteriology.
  6. The Arabidopsis Information Resource (TAIR). Arabidopsis Nomenclature.
  7. den Dunnen JT, Dalgleish R, Maglott DR, Hart RK, Greenblatt MS, McGowan-Jordan J, et al. HGVS Recommendations for the Description of Sequence Variants: 2016 Update. Hum Mutat. 2016; 37(6):564-9.
