SAGE, or serial analysis of gene expression, is a technique that lets you digitally analyze the entire gene expression profile of a cell or a population of cells. Before SAGE, scientists were limited to studying the expression of a few genes at a time using a technique called the expressed sequence tag approach. The coolest part of SAGE is that you don’t even need to have sequenced the genes you want to analyze: the technique gives you both the identity of the expressed genes and their expression levels, a process called transcriptome analysis. I went over the details of the basic SAGE technique before. In this article, I will dive deeper and discuss new and improved SAGE techniques.
Quick Recap of SAGE Steps
Two main principles underlie SAGE. First, a short nucleotide tag of 9-10 base pairs can uniquely identify a transcript, provided it is isolated from a unique position within that transcript. Second, by linking several of these tags together (concatemerization), you can study the expression of many genes simultaneously. However, the first point has become less relevant nowadays, as we’ll see.
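To make the first principle concrete, here is a minimal Python sketch of pulling a tag from a defined position in a cDNA sequence. It assumes the anchoring site is CATG (the recognition site of NlaIII, a common anchoring enzyme that this article does not name) and a 10 bp tag. The function name `extract_tag` and both constants are illustrative, not part of any published pipeline.

```python
ANCHOR = "CATG"  # assumption: NlaIII-style anchoring site, not specified in the article
TAG_LEN = 10     # assumption: classic SAGE tag length (the article gives 9-10 bp)


def extract_tag(cdna, anchor=ANCHOR, tag_len=TAG_LEN):
    """Return the tag_len bases directly 3' of the 3'-most anchor site.

    Returns None when the sequence has no anchor site or when fewer than
    tag_len bases follow the last site.
    """
    pos = cdna.rfind(anchor)  # 3'-most occurrence = the "unique position"
    if pos == -1:
        return None
    start = pos + len(anchor)
    tag = cdna[start:start + tag_len]
    return tag if len(tag) == tag_len else None


if __name__ == "__main__":
    # Toy cDNA with two anchor sites; only the last one defines the tag.
    print(extract_tag("CATGTTTTTTTTTTCATGAAACCCGGGTAA"))
```

Because every transcript is sampled at the same relative position, two copies of the same mRNA should always give the same tag, which is what makes counting tags meaningful.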
SAGE begins with extracting mRNA and reverse transcribing it into cDNA. The cDNA is then processed in a series of steps that end with a solution of concatemers. These concatemers are transformed into bacterial cells; as the cells replicate, more and more copies of the concatemers are made. Alternatively, PCR can be used. Either way, the material is extracted and sequenced, and the tag counts are used to create gene expression profile graphs.
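The last step, going from sequenced concatemers to an expression profile, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a production parser: ditags are assumed to be separated by a CATG anchoring site, each ditag is assumed to hold two 10 bp tags with the second stored as a reverse complement, and any tag that itself contains CATG would be split incorrectly. All names here are hypothetical.

```python
from collections import Counter

ANCHOR = "CATG"  # assumption: anchoring-enzyme site separating ditags
TAG_LEN = 10     # assumption: classic SAGE tag length

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def tags_from_concatemer(concatemer):
    """Split a concatemer into ditags and return both tags from each ditag."""
    tags = []
    for ditag in concatemer.split(ANCHOR):
        if len(ditag) < 2 * TAG_LEN:
            continue  # skip empty flanks and truncated ditags
        tags.append(ditag[:TAG_LEN])                       # left tag, as read
        tags.append(reverse_complement(ditag[-TAG_LEN:]))  # right tag, flipped back
    return tags


def expression_profile(concatemers):
    """Map each tag to its fraction of all tags counted."""
    counts = Counter()
    for concatemer in concatemers:
        counts.update(tags_from_concatemer(concatemer))
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


if __name__ == "__main__":
    toy = "CATG" + "AAAAAAAAAA" + "GGGGGGGGGG" + "CATG" + "AAAAAAAAAA" + "TTTTTTTTTT" + "CATG"
    print(expression_profile([toy]))
```

The fractions returned by `expression_profile` are the digital readout: a tag seen three times as often as another is taken to come from a transcript roughly three times as abundant.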
Thus, the SAGE technique lets you visualize which genes are being expressed. It can help forecast which diseases a person may develop, aid the discovery of new genes and reveal more about the expression profile of cells.
Since SAGE was first described over 20 years ago, several variations have come out. Here, we’ll look at three of those: LongSAGE, Robust LongSAGE and SuperSAGE. Each is an improvement on the last. The main difference is the throughput rate. In their 1995 paper in Science, Velculescu and colleagues concluded that their technique, SAGE, would take several months to identify transcripts expressed at more than 100 mRNAs per cell (0.025%). At the time this speed was incredible, but over time it proved too slow, prompting the development of faster forms of the technique.
The original SAGE technique could take 5 µg of mRNA to create a library of hundreds of cDNA tags. In comparison, LongSAGE, published in 2002 in Nature Biotechnology, could use 20 µg of mRNA to create a library of thousands of cDNA tags.
In LongSAGE, 19-21 base pair tags are used to create concatemers. Because these gene snippets are longer than the 9-10 base pair tags of the original SAGE method, the group calculated the odds of a tag occurring only once in a genome the size of the human genome to be >99.8%. This increase in accuracy, as well as the ability to study larger segments, took SAGE to the next level.
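To get a feel for why tag length matters so much, here is a rough back-of-envelope sketch in Python. It assumes a random genome of 3 billion base pairs with all four bases equally likely, searched on one strand, and uses a Poisson approximation. Those are simplifying assumptions of mine, and this is not necessarily how the LongSAGE authors arrived at their figure.

```python
import math

HUMAN_GENOME_BP = 3_000_000_000  # assumption: rounded human genome size


def tag_space(length):
    """Number of distinct DNA sequences of the given length (4 bases per position)."""
    return 4 ** length


def expected_chance_matches(length, genome_bp=HUMAN_GENOME_BP):
    """Expected number of chance occurrences of one specific tag in a random genome."""
    return genome_bp / tag_space(length)


def prob_no_chance_match(length, genome_bp=HUMAN_GENOME_BP):
    """Poisson estimate of the probability that a tag does not recur by chance."""
    return math.exp(-expected_chance_matches(length, genome_bp))


if __name__ == "__main__":
    for n in (10, 21):
        print(n, tag_space(n), expected_chance_matches(n), prob_no_chance_match(n))
```

Under these assumptions a 10 bp tag has only 1,048,576 possible sequences, so any given tag is expected to turn up thousands of times in a human-sized genome, whereas a 21 bp tag comes out above 99.8% for not recurring by chance.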
One downside of this technique was that multiple restriction enzymes were required: not all genes had a restriction site for any one enzyme, so the technique needed to be repeated with several enzymes. There were also technical issues with cloning and purification that made the technique unreliable. This signaled the need for further improvements.
Robust LongSAGE, or RL-SAGE, was the next iteration of SAGE. The paper describing it came out in 2004 in Plant Physiology. In it, Gowda et al. described four major areas of improvement compared with LongSAGE:
- It requires a smaller amount of mRNA to build a library: 50 ng.
- It uses an enhanced cDNA adapter and forms ditags with a longer (overnight) ligation period.
- Only 20 ditag polymerase chain reactions are needed to obtain a complete library (up to a 90% reduction compared with the original protocols).
- Concatemers only have to be partially digested with a restriction enzyme before cloning into a vector, greatly improving cloning efficiency.
These improvements meant you could generate two to three libraries, each containing over 4.5 million tags, within one month! But scientists wanted to cut this time further. Also, like LongSAGE and SAGE, RL-SAGE used sticky ends to form the ditags, which introduces some bias because the association between tags isn’t random.
SuperSAGE, published in PNAS in 2003, went a step further in speeding up the process and increasing its reliability. Here, 26 bp tags were created. The increased tag length meant even higher precision (about 10,000 times the accuracy of LongSAGE), making it practically impossible for two different genes to yield the same tag. The key was a new restriction enzyme: the type III endonuclease EcoP15I from phage P1. This created blunt ends rather than sticky ends, ensuring the random association of two tags to form ditags.
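The "about 10,000 times" figure can be sanity-checked by comparing how many distinct sequences exist at each tag length. The snippet below is only a rough check that assumes all four bases are equally likely at every position; `fold_increase` is an illustrative name, and the SuperSAGE authors may have derived their number differently.

```python
def fold_increase(new_len, old_len):
    """How many times more distinct tags exist at new_len than at old_len."""
    return 4 ** (new_len - old_len)


if __name__ == "__main__":
    # Compare 26 bp SuperSAGE tags with the 19-21 bp LongSAGE range.
    for old_len in (19, 20, 21):
        print(old_len, fold_increase(26, old_len))
```

Going from 21 bp to 26 bp multiplies the number of possible tags by 1,024, and going from 19 bp to 26 bp multiplies it by 16,384, so a gain on the order of 10,000-fold is plausible at the shorter end of the LongSAGE range.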
The improved accuracy and increased speed meant that interacting organisms, such as infected cells and the pathogens infecting them, could be studied at the same time without fear of confounding results. Additionally, isoforms of sequences could be found.
This technique has since inspired another variation: high-throughput (HT) SuperSAGE, in which next-generation sequencing (NGS) is used to analyze up to millions of tags at once! With the introduction of bench-top NGS, HT SuperSAGE is helping to unravel some of the big questions in science, such as how viruses affect the transcriptome profile of their host cells, and to find new genes in species across the kingdoms.
Have you tried these techniques? What does your lab use SAGE for?