An introduction to RNA-seq

by James Hadfield on 25th of May, 2012 in Next Generation Sequencing

RNA sequencing (Wang 2009) is rapidly replacing gene expression microarrays in many labs. mRNAs (and other RNAs) are converted to cDNA, which is used as the input to a next-generation sequencing library preparation. RNA-seq allows you to quantify, discover and profile RNAs. In this article, I’ll give a brief review of RNA-seq and introduce the major methods being used today.

Why is RNA-seq “better” than microarrays?

RNA-seq allows interrogation of much more than just differential gene expression. Although there are microarrays available for exon-level and microRNA analysis, most users are still interested in basic, probably 3’ biased, differential gene expression. Microarrays are also biased, as we have to decide what content to place on the array. Since RNA-seq does not use probes or primers, the data suffer from much less bias (although I do not mean to say RNA-seq has none). RNA-seq can be used to look at coding and non-coding RNA, at splicing and allele-specific expression, and possibly soon at full-length cDNA sequences, eliminating the need to infer or assemble isoforms. As RNA-seq provides digital data in the form of aligned read counts, it allows a very wide dynamic range, improving the sensitivity of detection for rare transcripts. It is also very cost-competitive with microarrays, as today between 6 and 30 samples can be multiplexed in a single Illumina sequencing lane. Lastly, an RNA-seq dataset can be reanalysed as more information about the transcriptome becomes available. If a paper is published showing an interesting splice variant in a system similar to the one you work on, then you might want to go back and look at that splicing in your samples, and you’d already have the data to do so.
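To get a feel for what multiplexing does to sequencing depth, here is a small Python sketch. The lane yield of 150 million reads is a hypothetical figure chosen purely for illustration; it is not a number from this article or from any instrument specification, so substitute the output of your own sequencer.

```python
def reads_per_sample(lane_reads, n_samples):
    """Average reads per sample if one lane is shared evenly."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return lane_reads / n_samples


# Hypothetical lane yield, for illustration only.
ASSUMED_LANE_READS = 150_000_000

for n in (6, 12, 30):
    depth = reads_per_sample(ASSUMED_LANE_READS, n)
    print(f"{n} samples per lane: ~{depth / 1e6:.1f}M reads each")
```

In practice samples rarely split a lane perfectly evenly, so treat the result as a planning estimate rather than a guarantee.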

How does RNA-seq work?

There are many methods for performing an RNA-seq experiment. In fact, the techniques are evolving so rapidly that it can be difficult to decide which one to use. A basic choice is between 1) random-primed cDNA synthesis, with the library made from double-stranded cDNA, or 2) RNA-ligation methods (reviewed and compared in Levin 2010). Most people will use the first method and need to make a further choice between a strand-specific protocol and one that is not. The method used most in my lab is Illumina’s TruSeq RNA-seq, which is a non-strand-specific protocol based on random-primed cDNA synthesis. Once you have a library, it is sequenced to a specified depth, which depends on what you want to do with the data. The reads are aligned to the genome or transcriptome and can be counted to determine differential gene expression, or analysed further to determine splicing and isoform expression. Most people are sequencing RNA using paired-end 50-100 bp reads. The exception is microRNA sequencing, which in most cases requires only single-end 36 bp sequencing.
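The "count, then compare" step can be sketched in a few lines of Python. This is a minimal illustration, not a substitute for a proper statistical package: it scales raw counts to counts per million (CPM) so that libraries of different depth are comparable, then takes a log2 ratio. The gene names, the counts and the pseudocount of 1 are all made up for the example.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts so that each library sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no reads")
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}


def log2_fold_change(cpm_a, cpm_b, pseudocount=1.0):
    """log2(b / a) per gene; the pseudocount avoids division by zero."""
    return {
        gene: math.log2((cpm_b[gene] + pseudocount) / (cpm_a[gene] + pseudocount))
        for gene in cpm_a
    }


# Invented counts: the treated library was sequenced twice as deeply.
control = {"geneA": 100, "geneB": 400, "geneC": 500}
treated = {"geneA": 400, "geneB": 800, "geneC": 800}

lfc = log2_fold_change(counts_per_million(control), counts_per_million(treated))
for gene, value in lfc.items():
    print(f"{gene}: log2 fold change = {value:+.2f}")
```

Note that geneB's raw count doubles, but so does the library size, so after scaling it is unchanged. Real analyses also need replicates and a statistical model before calling a gene differentially expressed.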

Our RNA-seq method

We use between 100 ng and 1 µg of total RNA as the input to an mRNA capture with oligo-dT coated magnetic beads. The mRNA is fragmented, and then a random-primed cDNA synthesis is performed. The resulting double-stranded cDNA is used as the input to a standard Illumina library prep, with end-repair, adapter ligation and PCR amplification giving a library that can be submitted to whoever is performing the sequencing for you.

Why bother with strand information?

There has been lots of discussion about antisense transcription and its biological relevance. If you are interested in simple differential gene expression, then strand information will not add much to your experiment, but will make your protocol more complex. Having said that, the most widely adopted method can be performed in most labs without too much extra effort. During 2nd strand cDNA synthesis, uracil is incorporated instead of thymine (dUTP is used in place of dTTP). Illumina library prep continues as normal, but after adapter ligation and before PCR amplification, uracil-DNA glycosylase is used to degrade the 2nd strand. Only the 1st strand is amplified, so all reads have the same orientation relative to the original RNA and you can determine which strand was being transcribed in your sample.
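As a rough sketch of how that orientation is used downstream, the Python function below infers the transcribed strand from the strand a read aligns to. It assumes the uracil-based protocol described above, in which sequenced fragments derive from 1st strand cDNA, so that read 1 aligns antisense to the original transcript and read 2 (in a paired-end run) aligns sense. The function name and inputs are hypothetical, and other strand-specific kits use the opposite convention, so check your library prep and aligner documentation before relying on it.

```python
def transcript_strand(alignment_strand, is_read1=True):
    """Infer the transcribed strand for a read from a 2nd-strand-degraded library.

    Assumption: read 1 is antisense to the transcript, read 2 is sense.
    """
    if alignment_strand not in ("+", "-"):
        raise ValueError("alignment_strand must be '+' or '-'")
    opposite = {"+": "-", "-": "+"}
    return opposite[alignment_strand] if is_read1 else alignment_strand


# Read 1 aligned to the forward strand implies a reverse-strand transcript.
print(transcript_strand("+", is_read1=True))
# Read 2 aligned to the forward strand implies a forward-strand transcript.
print(transcript_strand("+", is_read1=False))
```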

What can you actually do with RNA-seq?

RNA-seq is a powerful and versatile tool that has featured in many publications over the last few years. I have picked a few of my favourites (some from work performed in the core facility I manage) to illustrate what you can do with RNA-sequencing.

  • Jabbari, et al. used RNA-seq to investigate psoriasis and find new genes for functional analysis. They compared their RNA-seq data to published array studies and found 1700 new candidates. These were validated by qPCR, and comparison to functional databases for psoriasis supported their role in pathogenesis.
  • Kutter, et al. used RNA-seq in a study looking at the conservation of RNA Polymerase III binding in mammals to validate expression of genes occupied by Pol III as assayed by ChIP-seq.
  • Mercer, et al. combined RNA-seq and microarray-based capture to identify and characterise rare transcripts that are normally undetectable. The targeted transcripts increased in sequence read abundance from 0.21% pre-capture to 80% post-capture. They found more than 200 previously unannotated isoforms for almost 50 protein-coding loci, including a new alternative isoform of TP53, which is a very well characterised gene. This suggests that there is still much complexity in the genome and transcriptome to be resolved.
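The size of the capture effect in the Mercer example is easier to appreciate as a fold change. The short Python calculation below uses only the two percentages quoted above; the function name is just for illustration.

```python
def fold_enrichment(pre_pct, post_pct):
    """Ratio of post-capture to pre-capture read abundance."""
    if pre_pct <= 0:
        raise ValueError("pre-capture abundance must be positive")
    return post_pct / pre_pct


# Targeted transcripts: 0.21% of reads before capture, 80% after.
print(f"~{fold_enrichment(0.21, 80):.0f}-fold enrichment")
```

That works out at roughly 380-fold more reads on the transcripts of interest for the same amount of sequencing.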

In summary, RNA-seq is still an evolving tool, but it is preferable to microarrays in most instances. It is more sensitive, more robust and can be more cost-effective. What RNA-seq experiments are you now planning for your project?

References:

Jabbari et al: Transcriptional Profiling of Psoriasis Using RNA-seq Reveals Previously Unidentified Differentially Expressed Genes. Journal of Investigative Dermatology 2011.

Kutter et al: Pol III binding in six mammals shows conservation among amino acid isotypes despite divergence among tRNA genes. Nature Genetics 2011.

Levin et al: Comprehensive comparative analysis of strand-specific RNA sequencing methods. Nature Methods 2010.

Mercer et al: Targeted RNA sequencing reveals the deep complexity of the human transcriptome. Nature Biotechnology 2012.

Wang et al: RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics 2009.

About the author: James Hadfield
James received a Biology degree from the University of East Anglia in 1995. His career so far has leant towards technology development or implementation. In 1995 he developed a differential PCR test for ErbB2 copy number, and over the last 16 years he has worked at the Norfolk & Norwich Hospital, at the Royal London on diabetes genetics, at the Cambridge University Department of Pathology, and at the John Innes Centre (JIC) on wheat disease resistance gene cloning and arrays. In 2000 he set up an Affy and spotted microarray facility at JIC, and he co-founded the UK Affy user group, which is still going strong. Whilst at JIC he also won a Biotech competition, and he hopes one day to start a business, although none of his ideas have come to anything yet! In 2006 James moved to set up the genomics facility at CRI. The lab offers broad-spectrum genomic services for scientists at CRI and Illumina next-gen sequencing for CRI, Gurdon, LMB and Plant Sciences. His interests today are firmly in next-generation sequencing and development of the technology for personalised medicine. He also writes the Core Genomics blog, commenting on the exciting and fast-moving world of genomics with a focus on next-generation sequencing and microarray technologies, although it does go off on tangents from time to time.
