Codon Optimization 101

Written by: Alex Chen

last updated: October 28, 2021

The intriguing thing about protein expression is that the number of three-letter codons, which transfer RNAs (tRNAs) translate into amino acids (aa), far exceeds the number of amino acids. Do the math: with 4 bases and a triplet code, there are 4 x 4 x 4 = 64 possible codons, yet the variety of aa used by life on earth is only 22 (the 20 standard ones plus the rare selenocysteine and pyrrolysine; don’t forget about those!). Therefore, the code is redundant: most amino acids are encoded by several different codons. For example, ATT, ATC, and ATA all code for isoleucine. The catch for recombinant protein expression? Different organisms favor different codons, and the abundance of the matching tRNAs varies accordingly. So let’s say you try to express a human protein in E. coli. The E. coli translational machinery can incorporate all of the standard aa, but it might have a low abundance of some of the tRNAs needed for the codons in the human gene. So what do you do? If you want high protein expression, you need to switch (“optimize”) the codon sequence to suit the host’s protein expression machinery. Today, I will touch upon some important points on how to optimize your protein expression.
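If you want to check the arithmetic yourself, here is a minimal Python sketch. It builds all 64 triplets and groups them by amino acid using the standard genetic code (NCBI translation table 1). The variable names are my own, and the table covers only the 20 standard amino acids plus stop codons; selenocysteine and pyrrolysine are incorporated by recoding stop codons and do not appear in it.

```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), listed in TCAG order.
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# All 4 x 4 x 4 = 64 triplets, in the same order as AMINO_ACIDS.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
codon_to_aa = dict(zip(codons, AMINO_ACIDS))

# Group synonymous codons by the amino acid (or stop signal) they encode.
synonyms = defaultdict(list)
for codon, aa in codon_to_aa.items():
    synonyms[aa].append(codon)

print(len(codons))        # 64
print(synonyms["I"])      # ['ATT', 'ATC', 'ATA']
print(synonyms["M"])      # ['ATG']  (methionine has no synonyms)
print(len(synonyms) - 1)  # 20 amino acids, not counting the stop signal
```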

Codon Adaptation Index (CAI)

Before you order any primers for your PCR experiment, find out how similar (or different) the codon usage is between the gene you want to express and the host. CAI is the most widely used technique for analyzing codon usage bias. It gives you an index of how well your “foreign” sequence is adapted to the host’s protein expression machinery. To calculate it, each codon in your gene is compared with the most frequently used synonymous codon in a set of highly expressed genes from the expression organism. This gives every codon a relative adaptiveness between 0 and 1, and the CAI of the gene is the geometric mean of these values. The CAI therefore also ranges from 0 to 1, with 1 meaning that your gene uses only the codons preferred in the reference genes. But be careful when interpreting the index, because some sequences can really skew the analysis. One thing you need to keep in mind is that CAI calculations consider only synonymous codons, that is, codons that compete to encode the same amino acid. Amino acids with a single codon (methionine and tryptophan) and stop codons do not contribute. This is why CAI is described as an index of synonymous codon usage.
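To make the calculation concrete, here is a rough Python sketch of a CAI computation: relative adaptiveness per codon, then the geometric mean over the gene. The function names and the toy reference counts are made up for illustration, and the 0.5 pseudocount for unused codons is a choice of this sketch. A real analysis would use counts from a set of highly expressed host genes.

```python
import math
from collections import defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def relative_adaptiveness(reference_counts):
    """Weight of each codon = its count / count of the top synonymous codon."""
    by_aa = defaultdict(dict)
    for codon, aa in CODON_TO_AA.items():
        by_aa[aa][codon] = reference_counts.get(codon, 0)
    weights = {}
    for aa, counts in by_aa.items():
        if aa == "*" or len(counts) == 1:
            continue  # stop codons, Met and Trp carry no synonymous choice
        top = max(counts.values())
        if top == 0:
            continue  # amino acid absent from the reference set
        for codon, n in counts.items():
            # Pseudocount (assumption of this sketch) keeps weights above 0.
            weights[codon] = max(n, 0.5) / top
    return weights


def cai(sequence, weights):
    """Geometric mean of the weights of all scorable codons in the gene."""
    sequence = sequence.upper().replace("U", "T")
    codons = [sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


# Toy reference counts for isoleucine and lysine only (made-up numbers).
toy_reference = {"ATT": 30, "ATC": 60, "ATA": 10, "AAA": 80, "AAG": 20}
weights = relative_adaptiveness(toy_reference)

print(round(cai("ATGATCAAAATAAAGTAA", weights), 3))  # 0.452
print(cai("ATCAAA", weights))                        # 1.0
```

In the first gene, ATG and TAA are skipped, and the rare ATA and AAG codons pull the score well below 1. The second gene uses only the top codons, so it scores exactly 1.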

Tools for Analyzing Codon Usage

Ha, but don’t worry! There are plenty of useful tools that do the work for you. For example, Bioinsilico has codon usage analysis software for Windows users called Acua. It has a visual interface and offers a number of parameters for nucleotide, statistical, and codon usage analysis. It’s definitely worth checking out. Another great site is the E-CAI server. This is a web server that calculates the CAI and compares it against randomly generated sequences with a GC content similar to that of your sequence of interest. On that website, you can tweak the codon usage table and choose different genetic codes. You should definitely give it a try!

Codon Usage Tables

Another very useful website is the codon usage database. If you want to find out the codon usage for different organisms, this is a great place to look. Just type in the scientific name of the organism and it will give you a table that shows how often each codon is used, reported as a frequency per thousand codons together with the raw count.
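If you would rather work with such a table in a script, the sketch below parses pasted text into a dictionary. It assumes each entry looks like `codon frequency(count)`, with the frequency per thousand codons and RNA-style codons (U instead of T). That layout is my assumption here, so check it against what you actually copied. The function name and the sample numbers are made up for illustration.

```python
import re

# One entry: codon, frequency per thousand, then the raw count in parentheses.
ENTRY = re.compile(r"([ACGTU]{3})\s+(\d+(?:\.\d+)?)\s*\(\s*(\d+)\s*\)")


def parse_codon_usage(text):
    """Return {DNA codon: (frequency per thousand, raw count)}."""
    table = {}
    for codon, per_thousand, count in ENTRY.findall(text.upper()):
        # Store codons in DNA letters so they match a cDNA sequence.
        table[codon.replace("U", "T")] = (float(per_thousand), int(count))
    return table


# Made-up sample in the assumed layout (not real organism data).
sample = """
AUU 30.0(   300)  AUC 60.0(   600)  AUA 10.0(   100)
AAA 80.0(   800)  AAG 20.0(   200)
"""

usage = parse_codon_usage(sample)
print(usage["ATC"])  # (60.0, 600)
print(len(usage))    # 5
```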

How & What to Do with CAI?

Here is a simple rundown on how to analyze your candidate gene for protein expression in a particular organism:

Step 1. Go to the E-CAI server and enter the CAI calculation panel.
Step 2. Copy and paste your cDNA sequence in FASTA format.
Step 3. Copy and paste the codon usage table, taken from the codon usage database, for the organism you want to express the cDNA in. You can also input an additional codon usage table from another organism for comparison purposes.
Step 4. Set the genetic code to either eubacterial or standard.
Step 5. Hit Submit.

What you will see is a table displaying the CAI of your gene based on the codon usage of the model organism you selected to express the cDNA in. Again, the score goes from 0 to 1. The general rule of thumb is that the closer the score is to 1, the higher the chance that the protein will express well in the model organism. In addition, you can go back to the main page and try the expected CAI function. This calculation gives you statistical confidence that what you observe is significant, based on a comparison against randomly generated sequences.
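The idea behind an expected CAI can be sketched in a few lines of Python. Note that this is not the E-CAI server’s algorithm: instead of matching GC content, this simplified version keeps the protein fixed and picks a random synonymous codon at each position, then asks how often a random version scores at least as high as your gene. The weights, the two-amino-acid codon map, and all function names are hypothetical.

```python
import math
import random

# Hypothetical relative adaptiveness values for two amino acids only.
WEIGHTS = {"ATT": 0.5, "ATC": 1.0, "ATA": 1 / 6, "AAA": 1.0, "AAG": 0.25}
SYNONYMS = {"I": ["ATT", "ATC", "ATA"], "K": ["AAA", "AAG"]}
CODON_TO_AA = {c: aa for aa, group in SYNONYMS.items() for c in group}


def cai(codons):
    """Geometric mean of the codon weights."""
    return math.exp(sum(math.log(WEIGHTS[c]) for c in codons) / len(codons))


def random_recoding(codons, rng):
    """Same protein, but each codon drawn at random from its synonyms."""
    return [rng.choice(SYNONYMS[CODON_TO_AA[c]]) for c in codons]


def compare_to_random(codons, n=1000, seed=0):
    """Return (observed CAI, mean random CAI, fraction of randoms >= observed)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = cai(codons)
    null = [cai(random_recoding(codons, rng)) for _ in range(n)]
    mean_random = sum(null) / n
    fraction = sum(x >= observed for x in null) / n
    return observed, mean_random, fraction


gene = ["ATC", "AAA", "ATC", "AAA", "ATT", "AAA"]
observed, mean_random, fraction = compare_to_random(gene)
print(round(observed, 3))  # 0.891
print(mean_random < observed, fraction)
```

A small fraction means that random codon choices rarely match your gene’s score, so its high CAI is unlikely to be a fluke of amino acid composition.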

Things to Note

This is just a short rundown on how to determine if your target cDNA will express in a model organism. First, take a look at codon biases to understand the magnitude of the project that you are undertaking. As a note of caution, a high CAI does not guarantee success, since there are many other factors involved in recombinant protein expression. That said, if you have a relatively high CAI score and a small protein, you might be able to tinker with just a few codons (mutagenesis) to make it right. However, if you are dealing with a low CAI score and a complicated protein, it is probably easier to synthesize an optimized cDNA from scratch. This could definitely save you a lot of money and headaches down the road. And if you do the homework, it will help you with the direction and choice of expression platform. Good luck to the future biologists!
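If you go the mutagenesis route, a quick script can point you to the codons worth changing. The sketch below flags codons whose relative adaptiveness falls below a cutoff and suggests the top synonymous codon instead. The weights, the 0.3 cutoff, and the function name are all assumptions for illustration, not established thresholds.

```python
# Hypothetical relative adaptiveness values for two amino acids only.
WEIGHTS = {"ATT": 0.5, "ATC": 1.0, "ATA": 1 / 6, "AAA": 1.0, "AAG": 0.25}
SYNONYMS = {"I": ["ATT", "ATC", "ATA"], "K": ["AAA", "AAG"]}
CODON_TO_AA = {c: aa for aa, group in SYNONYMS.items() for c in group}


def flag_rare_codons(sequence, threshold=0.3):
    """Return (codon position, rare codon, suggested codon) for each hit."""
    sequence = sequence.upper()
    hits = []
    for i in range(0, len(sequence) - 2, 3):
        codon = sequence[i:i + 3]
        weight = WEIGHTS.get(codon)
        if weight is not None and weight < threshold:
            # Suggest the synonymous codon with the highest weight.
            best = max(SYNONYMS[CODON_TO_AA[codon]], key=WEIGHTS.get)
            hits.append((i // 3 + 1, codon, best))
    return hits


print(flag_rare_codons("ATCAAAATAAAGATT"))
# [(3, 'ATA', 'ATC'), (4, 'AAG', 'AAA')]
```

A short list of hits suggests a few point mutations may be enough, while a long one is a hint that synthesizing the gene is the easier path.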

Alex has a PhD in Virology and Immunology from the University of Massachusetts Medical School.
