RNA sequencing (Wang 2009) is rapidly replacing gene expression microarrays in many labs. mRNA (and other RNAs) are converted to cDNA that is used as the input to a next-generation sequencing library preparation. RNA-seq allows you to quantify, discover and profile RNAs. In this article, I’ll give a brief review of RNA-seq and introduce the major methods being used today.
Why is RNA-seq “better” than microarrays?
RNA-seq allows interrogation of much more than just differential gene expression. Although microarrays are available for exon-level and microRNA analysis, most users are still interested in basic, probably 3’ biased, differential gene expression. Microarrays are also biased, as we have to decide what content to place on the array. Since RNA-seq does not rely on predefined probes or gene-specific primers, the data suffer from much lower biases (although I do not mean to say RNA-seq has none). RNA-seq can be used to look at coding and non-coding RNA, at splicing and allele-specific expression, and possibly soon at full-length cDNA sequences, eliminating the need to infer or assemble isoforms. As RNA-seq provides digital data in the form of aligned read counts, it allows a very wide dynamic range, improving the sensitivity of detection for rare transcripts. It is also very cost-competitive with microarrays, as today between 6 and 30 samples can be multiplexed in a single Illumina sequencing lane. Lastly, an RNA-seq dataset can be reanalysed as more information about the transcriptome becomes available. If a paper is published showing an interesting splice variant in a system similar to the one you work on, you might want to go back and look at that splicing in your samples, and you’d already have the data to do so.
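To get a feel for what multiplexing 6 to 30 samples per lane means for sequencing depth, here is a minimal Python sketch. The lane yield of 180 million reads is a hypothetical round number chosen only for illustration (it does not come from any instrument specification), and the function name is my own.

```python
# Rough reads-per-sample estimate when multiplexing samples in one lane.
# ASSUMPTION: LANE_READS is a made-up round figure for illustration only;
# substitute the yield your sequencing provider actually reports.
LANE_READS = 180_000_000


def reads_per_sample(lane_reads: int, n_samples: int) -> int:
    """Evenly divide the reads from one lane across the multiplexed samples."""
    if n_samples <= 0:
        raise ValueError("n_samples must be a positive integer")
    return lane_reads // n_samples


for n in (6, 30):
    print(f"{n} samples per lane: about {reads_per_sample(LANE_READS, n):,} reads each")
```

In practice the split is never perfectly even, so treat the result as an upper-level planning figure rather than a guarantee.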
How does RNA-seq work?
There are many methods for performing an RNA-seq experiment. In fact, the techniques are evolving so rapidly that it can be difficult to decide which one to use. A basic choice is between 1) random-primed cDNA synthesis to produce double-stranded cDNA or 2) RNA-ligation methods (reviewed and compared in Levin 2010). Most people will use the first method and need to make a further choice between a strand-specific protocol and one that is not. The method used most in my lab is Illumina’s TruSeq RNA-seq, which is a random-primed cDNA synthesis, non-strand-specific protocol. Once you have a sequencing library, it is sequenced to a specified depth, which depends on what you want to do with the data. The resulting reads are aligned to the genome or transcriptome and can be counted to determine differential gene expression, or further analysed to determine splicing and isoform expression. Most people are sequencing RNA using paired-end 50-100 bp methods. The exception is microRNA sequencing, which in most cases only requires single-end 36 bp sequencing.
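To make the "count the aligned reads" step a little more concrete, here is a minimal Python sketch. It assumes you already have per-gene read counts for two samples; the gene names and counts are invented toy data, and the counts-per-million scaling with a pseudocount is just one simple way to compare samples of different depth, not the method used by any particular analysis package. A real differential expression analysis needs replicates and a proper statistical model.

```python
import math


def counts_per_million(counts: dict) -> dict:
    """Scale raw per-gene read counts to counts per million mapped reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no counted reads")
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


def log2_fold_change(cpm_a: dict, cpm_b: dict, pseudocount: float = 1.0) -> dict:
    """log2(B / A) for genes present in both samples.

    The pseudocount avoids division by zero for unexpressed genes.
    """
    return {
        gene: math.log2((cpm_b[gene] + pseudocount) / (cpm_a[gene] + pseudocount))
        for gene in cpm_a
        if gene in cpm_b
    }


# Toy data, invented for illustration only.
control = {"geneA": 500, "geneB": 1500, "geneC": 8000}
treated = {"geneA": 2000, "geneB": 1500, "geneC": 6500}

lfc = log2_fold_change(counts_per_million(control), counts_per_million(treated))
for gene, value in sorted(lfc.items()):
    print(f"{gene}: log2 fold change = {value:.2f}")
```

With these toy numbers geneA comes out at roughly a four-fold increase (log2 fold change close to 2), while geneB is unchanged.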
Our RNA-seq method
We use between 100 ng and 1 µg of total RNA as the input to an mRNA capture with oligo-dT coated magnetic beads. The mRNA is fragmented, and then a random-primed cDNA synthesis is performed. The resulting double-stranded cDNA is used as the input to a standard Illumina library prep: end-repair, adapter ligation and PCR amplification. This gives you a library that can be submitted to whoever is performing the sequencing for you.
Why bother with strand information?
There has been lots of discussion about anti-sense transcription and its biological relevance. If you are interested in simple differential gene expression, then strand information will not add much to your experiment, but will make your protocol more complex. Having said that, the most widely adopted method can be performed in most labs without too much extra effort. During 2nd strand cDNA synthesis, uracil is incorporated instead of thymine (dUTP replaces dTTP in the reaction). Illumina library prep continues as normal, but after adapter ligation and before PCR amplification, Uracil-DNA glycosylase is used to degrade the 2nd strand. As a result, all reads derive from the 1st strand and have the same orientation relative to the original transcript, so you can determine which strand was being transcribed in your sample.
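Here is a minimal Python sketch of how that orientation can be turned back into strand information after alignment. It assumes a dUTP-style library in which only the 1st strand cDNA survives, so read 1 aligns antisense to the transcript and read 2 (in paired-end data) aligns sense. The function name and arguments are my own for illustration; check the convention your library kit and aligner actually use before relying on this.

```python
def transcript_strand(read_strand: str, is_read1: bool = True) -> str:
    """Infer which genomic strand was transcribed from one aligned read.

    ASSUMPTION: dUTP-style stranded library, where read 1 is antisense
    to the original RNA and read 2 is sense.
    """
    if read_strand not in ("+", "-"):
        raise ValueError("read_strand must be '+' or '-'")
    opposite = {"+": "-", "-": "+"}
    # Read 1 comes from the surviving 1st strand cDNA, so flip it;
    # read 2 already points the same way as the transcript.
    return opposite[read_strand] if is_read1 else read_strand


# A read 1 aligned to the '+' strand implies a transcript on the '-' strand.
print(transcript_strand("+", is_read1=True))
# Its mate (read 2) aligns to '-' and points to the same '-' strand transcript.
print(transcript_strand("-", is_read1=False))
```

Counting tools generally expose a setting for this orientation, so in practice you tell the tool which convention your library follows rather than flipping strands yourself.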
What can you actually do with RNA-seq?
RNA-seq is a powerful and versatile tool that has been used in many published studies over the last few years. I have picked a few of my favourites (some from work performed in the core facility I manage) to illustrate what you can do with RNA-sequencing.
- Jabbari, et al. used RNA-seq to investigate psoriasis and find new genes for functional analysis. They compared their RNA-seq data to published array studies and found 1700 new candidates. These were validated by qPCR, and comparison to functional databases for psoriasis supported their role in pathogenesis.
- Kutter, et al. used RNA-seq in a study looking at the conservation of RNA Polymerase III binding in mammals to validate expression of genes occupied by Pol III as assayed by ChIP-seq.
- Mercer, et al. combined RNA-seq and microarray-based capture to identify and characterise rare transcripts, which are normally undetectable. The targeted transcripts increased in sequence read abundance from 0.21% pre-capture to 80% post capture. They found more than 200 previously unannotated isoforms for almost 50 protein-coding loci, including a new alternative isoform of TP53, which is a very well characterized gene. This suggests that there is still much complexity in the genome and transcriptome to be resolved.
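To put the capture result from Mercer, et al. in perspective, the two percentages quoted above can be turned into a fold enrichment. This is just arithmetic on the figures already given; the function name is my own.

```python
def fold_enrichment(pre_capture_pct: float, post_capture_pct: float) -> float:
    """Ratio of the targeted read fraction after capture to that before capture."""
    if pre_capture_pct <= 0:
        raise ValueError("pre-capture percentage must be positive")
    return post_capture_pct / pre_capture_pct


# 0.21% of reads before capture versus 80% after capture.
print(f"about {fold_enrichment(0.21, 80.0):.0f}-fold enrichment")  # about 381-fold
```

In other words, the same amount of sequencing yields several hundred times more reads from the targeted transcripts, which is why isoforms that are normally undetectable become visible.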
In summary, RNA-seq is still an evolving tool, but is preferable in most instances to microarrays. It is more sensitive, more robust and can be more cost-effective. What RNA-seq experiments are you now planning for your project?