Vital for Soup, Vital for Labs: Serial Analysis of Gene Expression (SAGE)

Written by: Olwen Reina

last updated: June 29, 2026

Serial Analysis of Gene Expression (SAGE) allows you to digitally analyze gene expression patterns, not just for a few genes but for a cell’s complete gene expression profile. Before this technique, scientists were limited to studying the expression of a few genes at a time using the expressed sequence tag approach.

SAGE starts with mRNA and ends with neat graphs that allow you to compare the gene expression of normal, developing and diseased cells. I bet you can imagine plenty of times this would be useful.

So let’s take a look at how this technique works.


What is SAGE?

SAGE, or serial analysis of gene expression, is a technique that enables you to digitally analyze the entire gene expression profile of a cell or population of cells. It was first described by Velculescu et al. in 1995.

At the time, techniques like RNA blotting and expressed sequence tagging were used to study gene expression. However, these techniques were slow and very limited. The speed of SAGE, and its ability to study many genes at once using tags as short as 10-14 bp, was a huge step forward in genetics.

The coolest part of SAGE is you don’t even need to have sequenced the genes you want to analyze: this technique gives you both the identity of the genes expressed and the level of their expression, a process called transcriptome analysis.

How does SAGE work?

There are two main principles here. Firstly, a short nucleotide tag of 9-10 base pairs can be used to uniquely identify a transcript, provided it is isolated from a unique position within the transcript. Secondly, by linking several of these tags together (concatemerization), you can study the expression of many genes simultaneously. However, the first point has become less relevant nowadays, as we’ll see later on.
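To get a feel for the first principle, it helps to count how many different sequences a tag of a given length can take. The sketch below is plain arithmetic, not part of any SAGE software, and it assumes every base is equally likely at every position (real transcripts are not random, so treat the numbers as an upper bound).

```python
def possible_tags(length: int) -> int:
    """Number of distinct DNA sequences of the given length (4 bases per position)."""
    return 4 ** length


for n in (9, 10):
    # Prints the count with thousands separators, e.g. "262,144".
    print(f"{n} bp tag: {possible_tags(n):,} possible sequences")
```

A 9 bp tag gives 262,144 possible sequences and a 10 bp tag gives 1,048,576, which is comfortably more than the number of different transcripts you would expect in a cell.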

SAGE begins by extracting mRNA and reverse-transcribing it to create cDNA. The resulting cDNA is then processed in a series of steps, the end result of which is a solution of concatemers. These concatemers are transformed into bacterial cells. As the cells replicate, more and more concatemers are made. Alternatively, PCR can be used. Either way, the material is extracted, sequenced and used to create gene expression profile graphs.

Thus the SAGE technique enables you to visualize which genes are being expressed and can help forecast which diseases a person may develop, help in the discovery of new genes and help learn more about the expression profile of cells.

Let’s take a look at the steps:

First, you need to prep and purify your cDNA:

Step 1: Isolate your mRNA and perform reverse transcription using reverse transcriptase and biotinylated primers to generate the corresponding cDNA. Using biotinylation will allow you to isolate your cDNA fragments later on in the process.

Step 2: Mix your cDNA with streptavidin beads. These beads will bind to the biotin-cDNA complex. (You might recognize streptavidin-biotin interaction from western blotting and immunohistological staining techniques – streptavidin-biotin is a very strong bond useful in lots of techniques.)

Step 3: Next, cleave the cDNA using a restriction endonuclease, called the anchoring enzyme. If you remember your Biochemistry 101, restriction enzymes cut at specific sequences, called restriction sites, so the enzyme you choose will depend on where you would like it to cut. Chef’s choice. And since each cDNA fragment is different, each one will be cut at a different place, depending on where the restriction enzyme’s site is located on that fragment. The result of this cleavage is that the beads are bound to cDNA fragments of various lengths, all with the same sequence at their exposed end.
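Here is a minimal sketch of what Step 3 does to a single cDNA, treating the sequence as a plain string. It rests on a few assumptions that are not in the text above: the anchoring site is `CATG` (the site recognized by NlaIII, a commonly used anchoring enzyme), the bead is attached at the 3' end (the end of the string), and the exact cut position within the site is ignored. The function name is made up for illustration.

```python
from typing import Optional

# Assumption: CATG, the NlaIII recognition site. Swap in your own enzyme's site.
ANCHOR_SITE = "CATG"


def bead_bound_fragment(cdna: str, site: str = ANCHOR_SITE) -> Optional[str]:
    """Return the piece of cDNA that stays on the bead after digestion.

    The bead is assumed to sit at the 3' end (end of the string), so the
    retained piece runs from the 3'-most site to the end of the sequence.
    Returns None when the cDNA has no site and therefore is not cut.
    """
    position = cdna.rfind(site)  # index of the last (3'-most) occurrence
    if position == -1:
        return None
    return cdna[position:]


print(bead_bound_fragment("GGCATGTTTACATGAAACCC"))  # CATGAAACCC
print(bead_bound_fragment("GGGTTTAAA"))             # None
```

Every retained fragment starts with the same site, whatever its length, which is the "same sequence at their exposed end" described in Step 3.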

Step 4: Cleaved cDNA that is no longer bound to the beads is now removed by rinsing. And the remaining bound cDNA is divided into two solutions.

 

Image 1: SAGE workflow.

SAGE protocol

Second, you need to ligate your cDNA to tags:

Step 5: Next, an oligonucleotide adaptor, either A or B, is added to each solution. You can see oligos A and B being added in Image 1 after your sample is split in two. These A and B oligonucleotides have a few notable features: 1) An attachment site, or “sticky end”, matching the anchoring enzyme cut site, which lets the oligonucleotide bind the cleaved cDNA. 2) A recognition site for another type of restriction enzyme, called a tagging enzyme. 3) A short primer sequence specific to adaptor A or B (this will be used during the PCR step to follow). Adaptors A and B are then ligated to the cDNA.

Step 6: Now a tagging enzyme is used to cleave the cDNA. This releases the cDNA from the beads and creates a short “tag” of around 11 nucleotides (+4 nucleotides that correspond to the anchoring enzyme recognition site).
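The tagging step can be sketched as a fixed-length cut downstream of the anchoring site. This is a simplification under stated assumptions: the site is taken to be `CATG`, the tag length is set to the 11 nucleotides mentioned above, and the adaptor sequence is left out entirely. The function name `cut_tag` is hypothetical.

```python
ANCHOR_SITE = "CATG"  # assumption: 4-nt anchoring enzyme recognition site
TAG_LENGTH = 11       # from the text: a tag of around 11 nucleotides


def cut_tag(fragment: str, site: str = ANCHOR_SITE, tag_length: int = TAG_LENGTH) -> str:
    """Return the anchoring site plus the tag that follows it.

    `fragment` is a bead-bound cDNA piece that begins with the anchoring site.
    The tagging enzyme is modeled as cutting a fixed distance downstream.
    """
    if not fragment.startswith(site):
        raise ValueError("fragment must start with the anchoring site")
    end = len(site) + tag_length
    if len(fragment) < end:
        raise ValueError("fragment is too short to yield a full tag")
    return fragment[:end]


tag = cut_tag("CATGAAACCCGGGTTTAAAAAA")
print(tag, len(tag))  # CATGAAACCCGGGTT 15
```

The result is 15 nucleotides long: 4 from the anchoring site and 11 that are specific to the transcript.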

Step 7: These tags have sticky ends but are repaired using DNA polymerase (DNAP). This gives you blunt end fragments that are still bound to the adaptor primer-anchoring enzyme site-tagging enzyme site oligonucleotide.

Third, you ligate your tagged cDNAs together:

Step 8: Now it is time to ligate the blunt-end tags together to generate ditags with A and B adaptor ends. This string is then amplified by PCR using A and B primers.

Step 9: The anchoring enzyme is then used to cleave the ditags, removing the A and B oligonucleotides and allowing the ditags to join into long chains of cDNA, called cDNA concatemers, where each ditag is separated by an anchoring enzyme recognition site.
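Steps 8 and 9 can be pictured with strings. In this sketch, two tags are joined tail to tail to make a ditag (so the second tag appears as its reverse complement on the top strand), and ditags are then chained with the anchoring site between them. The site `CATG` and the function names are assumptions for illustration, and the short 5-nt tags are toy data.

```python
ANCHOR_SITE = "CATG"  # assumption: anchoring enzyme recognition site

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def make_ditag(tag_a: str, tag_b: str) -> str:
    """Join two blunt-ended tags tail to tail, as read along the top strand."""
    return tag_a + reverse_complement(tag_b)


def make_concatemer(ditags: list, site: str = ANCHOR_SITE) -> str:
    """Chain ditags so each one is flanked by the anchoring site."""
    return site + site.join(ditags) + site


ditags = [make_ditag("AAAAC", "GGGTT"), make_ditag("TTTTG", "CCCAA")]
print(make_concatemer(ditags))  # CATGAAAACAACCCCATGTTTTGTTGGGCATG
```

Because every ditag sits between two copies of the same site, the boundaries can be found again later by searching the sequenced concatemer for that site.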

Lastly, you transform, purify and sequence your ligated cDNAs:

Step 10: Then transform your concatemers into bacteria and allow the bacteria to replicate to form high quantities of your concatemers.

Step 11: The final step (Yay! You made it.) is to isolate your concatemers using your favorite protocol. Then use high-throughput DNA sequencing to quantify each individual tag and create a gene expression profile for your original sample of mRNA-containing cells.
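Once the concatemers are sequenced, turning reads into an expression profile is essentially a counting exercise. The sketch below is one way to do it, not the method of any particular SAGE package: it assumes the anchoring site is `CATG`, that every ditag holds two tags of a fixed length, and it skips refinements such as removing duplicate ditags or mapping tags to genes. All names are hypothetical.

```python
from collections import Counter

ANCHOR_SITE = "CATG"  # assumption: anchoring enzyme recognition site
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def tags_from_concatemer(read: str, tag_length: int, site: str = ANCHOR_SITE) -> list:
    """Split a concatemer read at the anchoring site and pull two tags per ditag."""
    tags = []
    for ditag in read.split(site):
        if len(ditag) < 2 * tag_length:
            continue  # skip empty pieces at the ends and truncated ditags
        tags.append(ditag[:tag_length])                        # left tag, as read
        tags.append(reverse_complement(ditag[-tag_length:]))   # right tag, flipped back
    return tags


def expression_profile(reads: list, tag_length: int) -> dict:
    """Return each tag's share of all tags counted across the reads."""
    counts = Counter()
    for read in reads:
        counts.update(tags_from_concatemer(read, tag_length))
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


reads = ["CATGAAAAAGGGGGCATGAAAAATTTTTCATG"]  # toy read with 5-nt tags
print(expression_profile(reads, tag_length=5))  # {'AAAAA': 0.75, 'CCCCC': 0.25}
```

Comparing two such profiles, for example from normal and diseased cells, is what produces the expression graphs mentioned at the start of the article.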

You are now the mage of SAGE! Hope that helps and keep your eye out for SAGE part 2, where I cover different types of SAGE and why they are useful.


Beyond SAGE

Since SAGE was first described over 20 years ago, several variations have come out. Here, we’ll look at three of those: LongSAGE, Robust LongSAGE and SuperSAGE. Each is an improvement on the last. The main difference is the throughput rate. In their 1995 paper in Science, Velculescu and colleagues concluded that their technique, SAGE, would take several months to determine transcripts expressed at greater than 100 mRNAs per cell (0.025%). At the time this speed was incredible, but over time it proved too slow, prompting the development of faster forms of the technique.

1. LongSAGE

The original SAGE technique could take 5 µg of mRNA to create a library of hundreds of cDNA tags. In comparison, LongSAGE, published in 2002 in Nature Biotech, could use 20 µg of mRNA to create a library of thousands of cDNA tags.

In LongSAGE, 19-21 base pair tags are used to create concatemers. Since these snippets of the genes are longer than the 9-10 base pair tags of the original SAGE method, the odds of a tag occurring only once in a genome the size of the human genome were calculated by the group to be >99.8%. This increase in accuracy, as well as the ability to study larger segments, took SAGE to the next level.
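A rough back-of-the-envelope calculation shows why the extra length matters so much. The sketch below estimates how many times a given tag would turn up by chance in a genome, assuming a random sequence with equal base frequencies and a human genome of about 3 billion base pairs (both assumptions, and neither is how the LongSAGE authors arrived at their >99.8% figure).

```python
GENOME_SIZE = 3_000_000_000  # assumption: approximate human genome size in bp


def expected_chance_matches(tag_length: int, genome_size: int = GENOME_SIZE) -> float:
    """Expected number of places a specific tag occurs by chance in a random genome."""
    return genome_size / 4 ** tag_length


for n in (10, 21):
    print(f"{n} bp tag: {expected_chance_matches(n):.6g} expected chance matches")
```

Under these assumptions, a 10 bp tag is expected to match thousands of positions in the genome by chance, while a 21 bp tag is expected to match far fewer than one.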

One downside of this technique was that multiple restriction enzymes were required. Not all genes had restriction sites for one enzyme, so the technique needed to be repeated with several enzymes. There were also technical issues with cloning and purification that made the technique unreliable. This signaled the need for further improvements.

2. Robust LongSAGE

Robust LongSAGE, or RL-SAGE, was the next iteration of SAGE. The paper describing it came out in 2004 in Plant Physiology. Here Gowda et al. described four major areas of improvement compared with LongSAGE:

  1. It requires a smaller amount of mRNA to build a library: 50 ng.
  2. It uses enhanced cDNA adapter and ditag formation, with a longer (overnight) ligation period.
  3. It needs only 20 ditag polymerase chain reactions to obtain a complete library (up to 90% reduction compared with the original protocols).
  4. Concatemers only have to be partially digested with a restriction enzyme before cloning into a vector, greatly improving cloning efficiency.

These improvements meant you could generate two to three libraries, each containing over 4.5 million tags, within one month! But scientists wanted to cut this further. Also, like LongSAGE and SAGE, RL-SAGE made use of sticky ends to form the ditags. This results in some bias, as the association isn’t random.

3. SuperSAGE

SuperSAGE, published in PNAS in 2003, went a step further in speeding up the process and increasing its reliability. Here, 26 bp tags were created. The increased length of the tags meant even higher precision (an increase of about 10,000 times the accuracy of LongSAGE), so that the chance of two different genes sharing the same tag was practically nil. The key here was a new restriction enzyme: the type III endonuclease EcoP15I of phage P1. This created blunt ends rather than sticky ends, thus ensuring the random association of two tags to form ditags.

The improved accuracy and increased speed meant infected cells and other interacting organism situations could be studied together at the same time without fear of confounding results. Additionally, isoforms of sequences could be found.

This technique has since inspired another variation: high-throughput (HT) SuperSAGE, where next generation sequencing (NGS) is employed to analyze up to millions of tags at once! With the introduction of bench-top NGS, HT SuperSAGE is helping to unravel some of the big questions in science, such as how viruses affect the transcriptome profile of their host cells, and to find new genes in species across the kingdoms.


I am a Clinical Research Coordinator at the U.S. Department of Veterans Affairs with a background in basic research, writing, mentoring and teaching. I studied Natural Science at Trinity College Dublin, Ireland, specializing in biochemistry with immunology and I am currently undergoing ACRP (Association of Clinical Research Professionals) certification. In my spare time, I enjoy studying HTML/CSS and SEO, doing acroyoga, making kombucha, salsa dancing, voluntary community projects and eating sushi. Feel free to send me a note with any writing opportunities or to say hello.
