It’s the hot new technique. With a single procedure, you can get information about all RNA transcripts at once! It sounds like a dream.
RNA sequencing (RNA-seq) has opened the door to exciting new questions, but scientists interested in pursuing this technique should be aware of the roadblocks ahead of them. RNA-seq can be used in many different applications; this article focuses on using it to look at differential expression between samples.
RNA Degradation and Purity
The first issue may not thwart you, depending on your background. As the name RNA-seq implies, you are working with RNA. Unlike DNA, RNA will degrade rapidly if you don’t treat it carefully. It’s made to be a temporary signal within the cell.
Therefore, you must be ever vigilant against the dreaded RNases, enzymes that break down RNA. When RNA isn’t stored at super cold temperatures, these enzymes get to work, breaking down your precious RNA. Even in the cold they keep eating away at your RNA, just more slowly. Some people immediately douse themselves in RNase Away or RNaseZap like holy water if someone utters “RNase” in the lab, but there are several ways to keep your RNases in check.
RNA purity and quality are major concerns in RNA-seq, which is incredibly sensitive. If you’ve done RT-qPCR before, then you know the drill. You want really pure, high-quality RNA. Remember, crap in = crap out. If you have a bioanalyzer, you can use it to determine how awesome (or meh) your RNA is. For RNA-seq, you want a RIN (RNA integrity number, on a scale of 1 to 10) as close to 10 as possible; a RIN above 8 is generally considered high quality. You may need to optimize your RNA extraction protocol to get RNA that’s up to par.
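If you have a batch of samples, it helps to screen their RINs systematically before committing any of them to library prep. Here’s a minimal sketch, assuming you’ve copied the RIN values into a simple sample-name-to-RIN mapping; the sample names and values are made up, and the 8.0 cutoff is just the rule of thumb above, not a universal standard.

```python
MIN_RIN = 8.0  # rule-of-thumb cutoff for high-quality RNA; adjust for your samples

def flag_low_rin(rin_by_sample: dict[str, float], min_rin: float = MIN_RIN) -> list[str]:
    """Return the (sorted) names of samples whose RIN is below min_rin."""
    low = []
    for name, rin in rin_by_sample.items():
        if not 1.0 <= rin <= 10.0:
            # RIN runs from 1 (degraded) to 10 (intact); anything else is a data-entry error
            raise ValueError(f"{name}: RIN {rin} is outside the 1-10 scale")
        if rin < min_rin:
            low.append(name)
    return sorted(low)

if __name__ == "__main__":
    # Hypothetical values, e.g. copied by hand from a bioanalyzer report
    rins = {"ctrl_1": 9.4, "ctrl_2": 8.7, "treat_1": 7.2, "treat_2": 9.1}
    print("Below cutoff:", flag_low_rin(rins))
```

Any flagged sample is a candidate for re-extraction rather than something to push through and hope for the best.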
Choose Your Own Adventure
Okay, so you know you can get stellar RNA and you’re ready to do some RNA-seq. Right? Not quite. You have a lot more to figure out first, and this is perhaps the most challenging aspect of RNA-seq. Because RNA-seq is so brand spankin’ new, there is no established best-practices guide yet. Researchers are working hard to figure out the ins and outs, and the benefits and limitations, of all the different procedural choices available to you. Not to scare you, but here’s a list of some of the decisions you’re going to have to make before you even extract your RNA.
- Choice of technology/platform – Which platform will you use (Illumina, Ion Torrent)? What machine within that platform will you use (MiSeq, HiSeq, PGM)?
- Sample selection – Which samples will you analyze? Which ones best address your experimental question?
- Number of biological replicates – How many replicates do you need? What tools or strategies will you use to estimate your variance (essentially a power analysis)?
- RNA isolation and quality control – How will you minimize variation in isolation and storage between samples? How will you check the purity and quality of your samples?
- Enrichment of desired transcripts – Which strategy will you use to enrich for the transcripts you care about, for example by selecting polyadenylated RNA or by depleting highly abundant transcripts (like ribosomal and mitochondrial RNA) so they don’t swamp your reads?
- cDNA synthesis – What priming method will you use to make cDNA from your RNA?
- Library prep – Will you use stranded or unstranded library construction methods? Which adapters will you use? Will you need to add additional tags? How will you reduce batch variance?
- Sequence read lengths – How many bases will you sequence from one end, or from each end, of every cDNA fragment?
- Sequencing depth – How many reads will you need to give you an accurate picture of your transcriptome?
- Paired vs. single-end reads – Will you sequence from one end of each fragment or from both?
- Trimming – How will you perform adapter and quality trimming?
- Alignment vs. assembly – Will you map your reads onto a reference genome or assemble them de novo into contigs?
- Sequencing quality control – How will you check your data for common indicators of quality problems?
- Normalization – How will you normalize your data for sequencing depth, GC content, and library composition?
- Estimating abundance – Will you use a count-based method or an abundance measure such as FPKM (fragments per kilobase of transcript per million mapped reads)? Will you count only uniquely mapped reads or also include multi-mapped reads?
- Tests for differential expression – What computing tool will you use to statistically test for differential expression?
- Interpretation, summarization, visualization – After all of that, how will you visualize your data and draw conclusions on the biological relevance of your results?
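To make the normalization and abundance items a little more concrete, here’s a minimal sketch of two simple scaling measures: counts per million (CPM), which corrects only for sequencing depth, and FPKM, which also divides by transcript length. It assumes you already have fragment counts per gene and gene lengths in base pairs; the gene names, counts, and lengths are toy values. By default it uses the sum of the count table as the total number of mapped fragments, which is a simplification. Differential-expression tools such as DESeq2 and edgeR expect raw counts and apply their own normalization, so treat this as an illustration of the arithmetic, not a pipeline step.

```python
from typing import Optional

def cpm(counts: dict[str, int], total_mapped: Optional[int] = None) -> dict[str, float]:
    """Counts per million: scale each gene's count by the total mapped fragments."""
    total = total_mapped if total_mapped is not None else sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}

def fpkm(counts: dict[str, int], lengths_bp: dict[str, int],
         total_mapped: Optional[int] = None) -> dict[str, float]:
    """Fragments per kilobase of transcript per million mapped fragments.

    FPKM = count * 1e9 / (length_bp * total_mapped), i.e. count divided by
    (length in kb) and by (total mapped fragments in millions).
    """
    # Simplification: fall back to the table's own total if none is given
    total = total_mapped if total_mapped is not None else sum(counts.values())
    return {gene: c * 1e9 / (lengths_bp[gene] * total) for gene, c in counts.items()}

if __name__ == "__main__":
    counts = {"geneA": 500, "geneB": 1500, "geneC": 3000}      # toy fragment counts
    lengths = {"geneA": 1000, "geneB": 3000, "geneC": 2000}    # toy lengths in bp
    print("CPM: ", cpm(counts))
    print("FPKM:", fpkm(counts, lengths))
```

In the toy data, geneA and geneB end up with the same FPKM even though geneB has three times as many reads, because geneB is also three times longer.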
Okay. Breathe. I felt the same way looking at this list the first time. With a little thought and research, you can find the answers to each of those questions. Once you’ve weighed the pros and cons of each option and chosen the best one, you’re well on your way to having a solid RNA-seq strategy.
It may be daunting, but I cannot emphasize enough how important a sound experimental design is. Every expert I’ve talked to says that plowing ahead without a strategy is the number one mistake you can make with RNA-seq. Each RNA-seq strategy depends entirely on the particular question you want to answer. There are many choices, none of them perfect. You simply need to choose the best ones for your project and be aware of their limitations. Do your homework. Check out forums. Spend time considering your choices for each element of the design. There’s actually a ton of support out there for you. I highly recommend reading the sources I cited for this article. My own institution even provides RNA-seq experimental design consulting.
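One concrete piece of that homework is the replicate question from the list above. Here’s a minimal sketch of a back-of-the-envelope power calculation, assuming a normal approximation for negative-binomial counts on the log scale: n = 2(z_alpha + z_beta)^2 (1/mu + CV^2) / (ln FC)^2, where mu is a gene’s expected read count, CV is its biological coefficient of variation, and FC is the fold change you want to detect. The function name and all input values are placeholders you would replace with pilot or published estimates, and it looks at a single gene without any multiple-testing correction, so it will tend to be optimistic.

```python
import math
from statistics import NormalDist

def replicates_per_group(mean_count: float, bio_cv: float, fold_change: float,
                         alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate biological replicates per group needed to detect fold_change
    for ONE gene with a two-sided test (normal approximation on the log scale).

    Variance of log counts is approximated as 1/mean_count (counting noise)
    plus bio_cv**2 (biological variation between replicates).
    """
    if mean_count <= 0 or bio_cv < 0 or fold_change <= 0 or fold_change == 1:
        raise ValueError("need mean_count > 0, bio_cv >= 0, fold_change > 0 and != 1")
    z = NormalDist()                      # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)    # two-sided significance threshold
    z_beta = z.inv_cdf(power)             # quantile for the desired power
    variance = 1 / mean_count + bio_cv ** 2
    n = 2 * (z_alpha + z_beta) ** 2 * variance / math.log(fold_change) ** 2
    return math.ceil(n)                   # round up to whole replicates

if __name__ == "__main__":
    # Hypothetical gene: ~10 reads on average, biological CV of 0.4, two-fold change
    print(replicates_per_group(10, 0.4, 2.0))
```

With those made-up inputs, the function returns 9 replicates per group; lowering the biological CV or targeting a larger fold change brings that number down quickly, which is exactly the trade-off a real power analysis helps you weigh.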
Speaking in Code
Now that you know how to go about planning your RNA-seq experiment, there’s one more thing I have to hit you with: you’re likely going to have to learn how to code.
I don’t mean sending signals to your friend at a party to tell them you’re ready to get outta there. I mean hardcore computer programming. This was the biggest surprise to me as I started to look into RNA-seq. I went to a workshop on next-gen sequencing, and we spent three days learning how to code and use the computing systems at my institution.
The thing is, the raw data produced by RNA-seq is HUGE. You then need to manipulate that huge data set (trim, align, run statistical tests), which takes a LOT of computing power and time. In practice, that means using a computing cluster (a bunch of computers all hooked up to each other), and you can’t use a cluster without speaking its language: code. The good news is, if your institution has a sequencing core or you’re using someone else’s sequencing instrument, they likely already have a pipeline set up to do all of this and can help walk you through it. Don’t know whether your institution has a computing cluster you can use? More good news! Some researchers are busy setting up cloud-based RNA-seq computing. Check it out in the resources section at the bottom of this article.
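To give you a flavor of what that code looks like, here’s a minimal sketch of one “manipulate it” step: reading FASTQ records and trimming low-quality bases off the 3′ end of each read. It assumes Phred+33 quality encoding (used by Illumina since pipeline version 1.8) and an arbitrary cutoff of Q20; the function names are hypothetical. Real trimmers such as Trimmomatic or Cutadapt use more careful algorithms and also remove adapters, and on real data you’d run them on the cluster rather than a toy script like this.

```python
PHRED_OFFSET = 33  # assumes Phred+33 quality encoding

def parse_fastq(lines):
    """Yield (header, seq, qual) from an iterable of FASTQ lines (4 lines per record)."""
    it = iter(lines)
    # zip over the same iterator pulls four consecutive lines per record;
    # a truncated final record is silently dropped
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header.rstrip(), seq.rstrip(), qual.rstrip()

def trim_3prime(seq: str, qual: str, min_q: int = 20) -> tuple[str, str]:
    """Drop bases from the 3' end until one with quality >= min_q is reached."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must be the same length")
    end = len(qual)
    while end > 0 and ord(qual[end - 1]) - PHRED_OFFSET < min_q:
        end -= 1
    return seq[:end], qual[:end]

if __name__ == "__main__":
    # One made-up read: 'I' encodes Q40 and '#' encodes Q2 under Phred+33
    record = ["@read1", "ACGTACGTAC", "+", "IIIIIIII##"]
    for header, seq, qual in parse_fastq(record):
        print(header, *trim_3prime(seq, qual))
```

Trimming only from the 3′ end reflects the fact that Illumina base quality tends to drop toward the end of a read; multiply this by tens of millions of reads per sample and you can see why it lives on a cluster.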
To Seq or Not to Seq…
RNA-seq is quite a challenge. But isn’t that what science is about anyway: forging into the unknown and unexplored? Any challenge, including RNA-seq, can be overcome with the right support and resources. We can do this.
References and Resources
- Williams AG, et al. (2015) RNA-seq data: challenges in and recommendations for experimental design and analysis. Current Protocols in Human Genetics. 83:11.13.1-11.13.18.
- Griffith M, et al. (2015) Informatics for RNA sequencing: a web resource for analysis on the cloud. PLOS Computational Biology. 11(8): e1004393.
- Hirsch CD, et al. (2015) Genomic limitations to RNA sequencing expression profiling. The Plant Journal. 84: 491-503.
- Cloud Computing Tutorial for RNA-seq
- bioinformatics.ca: Informatics for RNA-sequence Analysis (QC) (2014) – Workshop Materials for Free!