Protein expression is an art. There are many routes to optimize a protein expression protocol, such as using different expression systems (e.g. E. coli, yeast cells, insect cells) or changing the expression vector or culture media for the expression host.
Fortunately, optimizing the parameters mentioned above often leads to improvements in your protein expression results. However, there is one other, less obvious factor that can have a significant impact on your experiments. Read on to find out what this is and how to approach it!
When the Universal Code Isn’t So Universal!
We’ve all had the “Universal” Genetic Code drummed into us as undergrads, learning which codons (triplets) code for which amino acid.
But did you know that not all organisms follow this system in the same way? Welcome to the phenomenon of codon bias. Let’s recall that the genetic code contains 64 codons, 61 of which encode amino acids while the remaining 3 are stop codons that specify termination of protein translation. The genetic code is degenerate: 61 sense codons encode only 20 amino acids, so most amino acids are specified by more than one codon.
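To make the numbers above concrete, here is a minimal Python sketch that builds the standard genetic code and counts sense codons, stop codons and the number of codons per amino acid. It is written in the DNA alphabet (T instead of U), and the names `STANDARD_CODE` and `degeneracy` are just illustrative choices.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# One-letter amino acids for the standard code, ordered to match codons
# generated as TTT, TTC, TTA, TTG, TCT, ... ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(c for c, aa in STANDARD_CODE.items() if aa == "*")
sense_codons = [c for c, aa in STANDARD_CODE.items() if aa != "*"]
# Number of codons that encode each amino acid (stops excluded).
degeneracy = Counter(aa for aa in STANDARD_CODE.values() if aa != "*")

print(len(STANDARD_CODE), len(sense_codons), stop_codons)
# 64 61 ['TAA', 'TAG', 'TGA']
print(degeneracy["L"], degeneracy["M"], degeneracy["W"])
# 6 1 1
```

Leucine, serine and arginine each have six codons, while methionine and tryptophan have only one, which is why codon choice is possible at most positions in a gene.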
Most researchers know that each amino acid can be encoded by more than one codon, but it is less well known that not all organisms use these codons evenly. In reality, organisms tend to have favorite codons that they use more often than others, and this ‘bias’ in codon usage often goes hand-in-hand with the availability of tRNA in the organism.
Unlike experimental bias (which is always a bad thing!), codon bias is not problematic for the organisms themselves, and in fact this phenomenon probably represents a balance between mutational biases and natural selection for optimization of protein translation.
Codon usage bias can, however, be a serious issue in heterologous protein expression. When expressing a foreign gene from a given organism in an expression host, the codon usage patterns of the source organism and the expression host may be rather different. If the foreign gene contains stretches of codons that are rarely used by the expression host, the host's supply of the matching tRNAs can become limiting, which slows or stalls translation and reduces or prevents expression of that protein.
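A quick way to spot this kind of problem is to scan the foreign gene for codons your host rarely uses. The sketch below is only a rough illustration: `RARE_IN_HOST` is a placeholder set of codons often cited as rare in E. coli, and the function names and the toy sequence are made up. For real work, take the rare codons from a codon usage table for your own host.

```python
# Placeholder set: codons often cited as rare in E. coli. This is an
# assumption for illustration; use a real usage table for your own host.
RARE_IN_HOST = {"AGA", "AGG", "ATA", "CTA", "CCC", "GGA"}


def split_codons(dna):
    """Split an in-frame coding sequence into codons, dropping a trailing partial codon."""
    dna = dna.upper().replace("U", "T")
    return [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]


def rare_codon_report(dna, rare=RARE_IN_HOST):
    """Return (fraction rare, 0-based codon positions, longest run of consecutive rare codons)."""
    codons = split_codons(dna)
    positions = [i for i, codon in enumerate(codons) if codon in rare]
    longest = current = 0
    for codon in codons:
        current = current + 1 if codon in rare else 0
        longest = max(longest, current)
    fraction = len(positions) / len(codons) if codons else 0.0
    return fraction, positions, longest


toy_gene = "ATGAGAAGGAGACTGCCCTAA"  # made-up sequence
fraction, positions, longest = rare_codon_report(toy_gene)
print(f"{fraction:.0%} rare codons at positions {positions}, longest run {longest}")
```

Runs of consecutive rare codons are reported separately because, as described above, it is stretches of rare codons that are most likely to cause trouble.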
Examples of Codon Bias
There are a few examples of organisms that use particular codons differently, i.e. for a different amino acid than other organisms. In extreme cases, a codon that acts as a stop codon in one organism encodes an amino acid in another. In heterologous expression setups this can lead to translational read-through or truncated protein expression, fates that are even worse than reduced protein expression.
Some notable examples of such codon reassignments:
- Mycoplasma capricolum (a goat pathogen) reads the stop codon UGA as tryptophan3.
- Euplotes spp. (protozoa) read the stop codon UGA as cysteine.
- In the yeast Candida albicans (and several other Candida spp.), CUG codes for serine rather than leucine1,2.
If you are trying to use the species listed above as host organisms for protein expression, or to express their genes in another host, these differences are likely to have a significant effect on the outcome of your experiment!
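To see what these reassignments do in practice, the following Python sketch translates one made-up sequence under the standard code and under three variant codes. Each variant is modelled as the standard code plus a single change: UGA to tryptophan for Mycoplasma, CUG to serine for Candida, and UGA to cysteine for Euplotes (the last taken from the euplotid nuclear code, NCBI translation table 10). The `translate` helper and the dictionary names are illustrative, not part of any library.

```python
from itertools import product

STANDARD = dict(zip(
    ("".join(c) for c in product("TCAG", repeat=3)),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
))

# Each variant is the standard code with one codon reassigned (DNA alphabet).
VARIANT_CODES = {
    "standard": STANDARD,
    "Mycoplasma capricolum": {**STANDARD, "TGA": "W"},
    "Euplotes": {**STANDARD, "TGA": "C"},
    "Candida albicans": {**STANDARD, "CTG": "S"},
}


def translate(dna, code):
    """Translate in frame from the first base and stop at the first stop codon."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = code.get(dna[i:i + 3], "X")  # 'X' for codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


toy_gene = "ATGCTGTGAAAATAA"  # made-up sequence: ATG CTG TGA AAA TAA
for name, code in VARIANT_CODES.items():
    print(f"{name:22s} {translate(toy_gene, code)}")
# standard               ML
# Mycoplasma capricolum  MLWK
# Euplotes               MLCK
# Candida albicans       MS
```

The same DNA gives a truncated product, a read-through product or a substituted residue depending on which code the translating organism uses.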
Is Codon Bias Responsible for Your Inactive Protein?
As well as affecting protein expression levels, codon usage differences can also lead to the expression of inactive proteins. This can occur if a codon is read differently by the host and the protein loses an amino acid that is critical to correct folding or regulation, e.g. a phosphorylation site.
To check for this:
- If you’ve had problems expressing an active protein, retrieve the DNA sequence of the gene that encodes that protein (e.g. if expressing a protein from a rare bacterial species, retrieve the exact sequence from that organism’s genome sequence).
- Translate this sequence: you can use a program called BioEdit to translate the DNA sequence. You can download this program as freeware, and it works on Windows systems. For a list of Mac-compatible programs, check out this article.
- Once you’ve done this, check the translated sequence against the predicted amino acid sequence from your rare bacterium’s database.
If there are no differences between the two amino acid sequences, your problems are probably not related to codon usage. If, however, there is a difference between the two, then you may have found the source of your inactive protein!
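If you prefer to script the comparison, the sketch below lists every codon that the source organism and the host would read differently. It assumes you know the source organism's code; here a Candida-style code (CUG as serine) is compared with the standard code, and the function name `codon_differences` and the toy sequence are invented for illustration.

```python
from itertools import product

STANDARD = dict(zip(
    ("".join(c) for c in product("TCAG", repeat=3)),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
))


def codon_differences(dna, source_code, host_code):
    """List (codon index, codon, source aa, host aa) wherever the two codes disagree."""
    dna = dna.upper().replace("U", "T")
    diffs = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3]
        src = source_code.get(codon, "X")
        host = host_code.get(codon, "X")
        if src != host:
            diffs.append((i // 3, codon, src, host))
    return diffs


candida_code = {**STANDARD, "CTG": "S"}  # assumed source code for this example
toy_gene = "ATGCTGGCTCTGTAA"             # made-up sequence
for index, codon, src, host in codon_differences(toy_gene, candida_code, STANDARD):
    print(f"codon {index + 1}: {codon} is {src} in the source but {host} in the host")
# codon 2: CTG is S in the source but L in the host
# codon 4: CTG is S in the source but L in the host
```

An empty list means the two codes agree on every codon in the gene, so the cause of an inactive protein probably lies elsewhere.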
Codon Optimization Approaches
- Try site-directed mutagenesis (SDM). This is a useful codon optimization tool if you only have a small number of problematic codons: change the problematic codon to another codon that you know will translate to the amino acid you require (based on your knowledge of the host’s codon usage patterns). For example, to express a C. albicans protein in another organism, change the CUG codons (CTG in the DNA sequence) in the C. albicans gene to TCC, which also codes for serine. Otherwise, the CUGs in your target gene will be translated as leucines in your host organism. There are a lot of online SDM protocols, and many commercial mutagenesis kits available to help you on your way.
- Alternatively, you could take advantage of modern synthetic biology and purchase a synthetic version of the gene. Use your knowledge of codon usage in your host organism to optimize the sequence for best results.
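Whichever route you take, it helps to design the recoded sequence on the computer first. Here is a minimal sketch that swaps in-frame CTG codons for TCC, as in the C. albicans example; the `recode` function and the toy sequence are hypothetical, and a real design would also need to consider host codon preferences, restriction sites and so on.

```python
def recode(dna, replacements):
    """Swap whole in-frame codons according to `replacements`; leave everything else untouched."""
    dna = dna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    # Any trailing partial codon is appended unchanged.
    return "".join(replacements.get(codon, codon) for codon in codons) + dna[usable:]


candida_gene = "ATGCTGGCTCTGTAA"  # made-up sequence: ATG CTG GCT CTG TAA
print(recode(candida_gene, {"CTG": "TCC"}))
# ATGTCCGCTTCCTAA
```

Working codon by codon matters here: a plain text search for "CTG" would also hit out-of-frame matches (such as the CTG spanning two codons in ACT GGA) and corrupt the sequence.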
Once you’ve got the “correct” sequence, ligate the gene into your expression vector of choice and get back on the protein expression horse!
It can be frustrating trying to express your favorite protein at high levels in its active form, but when you crack that code, there is no better feeling! Do share your experiences with us by writing in the comments section!
1. Moura GR et al. (2010) Development of the genetic code: insights from a fungal codon reassignment. FEBS Letters; 584(2), 334-341.
2. Santos MAS and Tuite MF (1995) The CUG codon is decoded in vivo as serine and not leucine in Candida albicans. Nucleic Acids Research; 23(9), 1481-1486.
3. Yamao F et al. (1985) UGA is read as tryptophan in Mycoplasma capricolum. PNAS; 82(8), 2306-2309.
Originally published in 2015. Updated and republished in 2017. Image credit: Science Museum London