One of the most powerful methods of modern cellular biology is creating and analyzing RNA libraries via RNA-sequencing (RNA-seq). This technique, also called whole transcriptome shotgun sequencing, gives you a snapshot of the transcriptome in question, and can be used to examine alternatively spliced transcripts, post-transcriptional modifications, and changes in gene expression, amongst other applications.
Unlike microarrays, RNA-seq does not depend on prior knowledge of the genome sequence. Researchers therefore do not need to decide in advance what to detect (via probes or primers), which decreases overall bias. RNA-seq is a type of next-generation sequencing (NGS) that provides an overview of the transcriptome. Beyond mRNA, it can be used to examine:
- alternatively spliced gene transcripts
- post-transcriptional modifications
- gene fusions
- small and other non-coding RNAs, such as snoRNA, miRNA, and rRNA
- ribosome profiling
- changes in gene expression over time in one culture
- comparison of gene expression in control and experimental conditions.
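As a rough illustration of the last item, comparing expression between control and experimental conditions usually comes down to normalizing read counts for sequencing depth and then computing a fold change per gene. The Python sketch below assumes hypothetical gene names and read counts, a simple counts-per-million (CPM) normalization, and a pseudocount of 1. None of these come from a specific pipeline, and real analyses rely on biological replicates and dedicated statistical tools.

```python
import math

def counts_per_million(counts):
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}

def log2_fold_change(control, treated, pseudocount=1.0):
    """Per-gene log2(treated / control) computed on CPM-normalized counts.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no reads in one condition.
    """
    cpm_control = counts_per_million(control)
    cpm_treated = counts_per_million(treated)
    return {
        gene: math.log2(
            (cpm_treated[gene] + pseudocount) / (cpm_control[gene] + pseudocount)
        )
        for gene in control
    }

# Hypothetical read counts for three genes (illustrative values only).
control = {"geneA": 500, "geneB": 300, "geneC": 200}
treated = {"geneA": 500, "geneB": 1200, "geneC": 300}

lfc = log2_fold_change(control, treated)
for gene in sorted(lfc):
    print(f"{gene}: log2 fold change = {lfc[gene]:+.2f}")
```

In this toy data, geneA has the same raw count in both samples but a negative fold change after normalization, because the treated library is twice as deep. That is the reason for normalizing before comparing conditions.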
Disadvantages of Using RNA-seq
RNA-seq typically begins with fragmentation followed by generation of cDNA. Unfortunately, this often comes with a slew of complications. For instance, as scientists we know that each extra step in an experiment means more signal degradation and a greater chance of sample contamination, and making cDNA requires several additional steps. Moreover, linker ligation for cDNA synthesis introduces bias because the linker does not ligate to different RNA end sequences with the same efficiency. cDNA synthesis also favors small and medium RNAs to the detriment of longer sequences.1
Nanopore Sequencing versus Adapter Ligation
There are two main methods of sequencing RNA libraries. The first is nanopore RNA sequencing, which measures the change in ionic current as each nucleotide passes through a pore. The advantages of this method are that there is no molecule length bias and no intermediate steps. However, the error rate of each read is high, about 15% in a single run. In addition, throughput is limited by pore saturation, which means that stochastic sampling error may become significant.2
The second method involves ligating adaptors to cDNA. The adaptors contain sequences that allow hybridization to a flow cell and subsequent sequencing. This method enables multiplexing, high throughput, and flexibility of library types (see below), as well as accurate and precise library quantification.
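To make the multiplexing idea concrete, here is a minimal Python sketch of demultiplexing: sorting pooled reads back into their samples by a short barcode carried in the adaptor. It assumes, for simplicity, that the barcode sits at the start of each read (on Illumina instruments the index is typically read separately), and the barcode sequences, sample names, and one-mismatch tolerance are all hypothetical choices for illustration.

```python
from collections import defaultdict

# Hypothetical 6-nt sample barcodes; real kits define their own index sequences.
BARCODES = {"ACGTAC": "control", "TGCATG": "treated"}

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def demultiplex(reads, barcodes, max_mismatches=1):
    """Assign each read to a sample by its leading barcode and trim the barcode.

    Reads whose barcode matches no sample (or more than one) within the
    mismatch tolerance are placed in an "undetermined" bin.
    """
    length = len(next(iter(barcodes)))
    bins = defaultdict(list)
    for read in reads:
        tag, insert = read[:length], read[length:]
        matches = [
            sample for bc, sample in barcodes.items()
            if hamming(tag, bc) <= max_mismatches
        ]
        sample = matches[0] if len(matches) == 1 else "undetermined"
        bins[sample].append(insert)
    return dict(bins)

# Illustrative pooled reads: barcode followed by the insert sequence.
pooled = ["ACGTACGGGTTT", "TGCATGAAACCC", "ACGTATCCCGGG", "GGGGGGTTTTTT"]
bins = demultiplex(pooled, BARCODES)
for sample, inserts in bins.items():
    print(sample, inserts)
```

The third read carries a single sequencing error in its barcode and is still assigned to the control sample, while the fourth matches neither barcode and is set aside.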
Types of RNA-seq Libraries
There are several types of commercially available kits for generating RNA-seq libraries for the Illumina platform.
- Whole transcriptome sequencing: Strand-specific sequencing allows detection of known and novel transcripts. Even if you are interested in only a subset of your RNA, having a whole set of data for future reference is always welcome.
- Targeted transcriptome sequencing: Focuses on specific transcripts of interest through either enrichment or amplicon-based techniques, as well as highly multiplexed amplification methods. The technique can detect up to 20,000 distinct RNA targets and requires only 10 ng of starting material (RNA).3
- Small RNA sequencing: Specifically designed for detection of microRNA, short interfering RNA (siRNA), and piwi-interacting RNA. This method starts with size selection for small RNAs, ligation of selected fragments to adapters, and then generation of cDNA.4
New RNA Library Preparation Method
A new RNA library prep method still creates cDNA, but only after adding the helper oligonucleotides that form adapters. The Invitrogen™ Collibri™ Stranded RNA Library Prep Kits for Illumina™ systems avoid both the trap of favoring small molecules and the high error rate of nanopore RNA sequencing.
Experimental Considerations for RNA Sequencing
When preparing to sequence your RNA, you need to consider certain parameters, because creating libraries is a time-consuming and high-stakes experiment.
Time dependency: Gene expression profiles differ with the growth phase of your culture or the developmental stage of your experimental organism. Make sure that you choose a time point at which the maximum amount of your RNA of interest should be present.
Tissue specificity: RNA-seq does not discriminate between cell types — it covers cells across your sample. Therefore, you need to make sure that your sample is as close as you can get to only one cell type. Otherwise, you will have a mixed sample and confusing results.
Coverage: As with other high-throughput methods, the more abundant transcripts constitute the bulk of the RNA, while in most cases you will be looking for rare events such as mutations. You need to sequence deeply enough that such rare transcripts are covered by multiple reads and yield enough data.6
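A back-of-the-envelope way to reason about coverage is to treat the number of reads from a rare transcript as a Poisson-distributed count whose mean is the sequencing depth multiplied by the transcript's share of the library. The Python sketch below makes that assumption explicit. The transcript fraction, the threshold of 10 reads, and the 95% confidence level are hypothetical values chosen only for illustration.

```python
import math

def expected_reads(total_reads, transcript_fraction):
    """Mean number of reads from a transcript making up a given library fraction."""
    return total_reads * transcript_fraction

def prob_at_least(k, mean):
    """P(X >= k) for X ~ Poisson(mean)."""
    below = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return 1.0 - below

def reads_needed(transcript_fraction, k, confidence=0.95, step=1_000_000):
    """Smallest depth (in multiples of `step`) giving at least k reads
    of the transcript with the requested probability."""
    depth = step
    while prob_at_least(k, expected_reads(depth, transcript_fraction)) < confidence:
        depth += step
    return depth

# Hypothetical rare transcript: one read in a million comes from it.
fraction = 1e-6
p_at_10m = prob_at_least(10, expected_reads(10_000_000, fraction))
depth = reads_needed(fraction, k=10)
print(f"P(at least 10 reads) at 10M reads: {p_at_10m:.2f}")
print(f"Depth for 95% chance of at least 10 reads: {depth:,} reads")
```

The point of the sketch is that sequencing just deep enough to expect 10 reads on average leaves a large chance of seeing fewer, so the depth has to be set with the desired confidence in mind rather than the average alone.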
Technical variance: To minimize the chance of artefacts caused by the specific kit or sequencer used, be sure to use the internal controls that are usually included in the software package. Also, be sure to analyze latent variables, for example by estimating the probability of differential expression or applying a dimension reduction scheme, to highlight groups of “signature” genes.7
While a reference genome is not necessary for the assembly of RNA-seq libraries, having one will make your life easier. You will first need to assemble short reads, and then find the optimal alignment of your transcriptome data with the genome. You can use a program like Kallisto, GMAP, Bowtie, or one of the many others available.
In the absence of an assembled genome, the de novo approach requires algorithms that assemble short sequences using overlapping nucleotides. That’s where the previously mentioned depth of coverage is critical because redundancy helps to avoid significant gaps and ambiguity in assembly.
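The idea of assembling short sequences from overlapping nucleotides can be sketched in a few lines of Python. The toy example below repeatedly merges the pair of reads with the longest suffix-prefix overlap. It is a simplified greedy approach for illustration only, not the algorithm of any particular assembler (production tools typically use graph-based methods), and the reads and the minimum overlap of 3 nucleotides are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` nucleotides exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Merge reads pairwise by longest overlap until no overlaps remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return contigs
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]

# Three overlapping reads from a made-up transcript.
full = greedy_assemble(["ATGGCGTA", "GCGTACGT", "TACGTTAGC"])
# Without the middle read, the remaining two no longer overlap enough.
gapped = greedy_assemble(["ATGGCGTA", "TACGTTAGC"])
print(full)
print(gapped)
```

With all three reads the sketch recovers a single contig, whereas dropping the middle read leaves two unconnected contigs. This is the kind of gap that deeper, more redundant coverage helps to avoid.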
Cutting Edge RNA-seq Techniques
While current techniques use tissue-level samples, there is an emerging application of RNA-seq to single cells. Taking this one step further, single-cell RNA-seq could even uncover different isoforms.9 This information could be especially important when studying pathology or normal physiology of highly heterogeneous cellular populations, like in cancer or during embryogenesis. Thus, RNA-seq has the potential to open a new world of information for you and can be easily tailored to your research questions.
References
1. Liu D, Graber JH. (2006) Quantitative comparison of EST libraries requires compensation for systematic biases in cDNA generation. BMC Bioinformatics. 7: 77.
2. Marinov GK. (2017) On the design and prospects of direct RNA sequencing. Briefings in Functional Genomics. 16(6): 326–335.
3. Li W, Turner A, Aggarwal P, Matter A, Storvick E, Arnett DK, Broeckel U. (2015) Comprehensive evaluation of AmpliSeq transcriptome, a novel targeted whole transcriptome RNA sequencing methodology for global gene expression analysis. BMC Genomics. 16(16): 1069.
4. Mehta JP. (2014) Sequencing small RNA: introduction and data analysis fundamentals. Methods Mol Biol. 1182: 93–103.
5. Martin DP, Miya J, Reeser JW, Roychowdhury S. (2016) Targeted RNA Sequencing Assay to Characterize Gene Expression and Genomic Alterations. J Vis Exp. 4(114).
6. Li H et al. (2008) Determination of tag density required for digital transcriptome analysis: application to an androgen-sensitive prostate cancer model. PNAS. 105(51): 20179–84.
7. Stegle O et al. (2012) Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nature Protocols. 7(3): 500–7.
8. Kingsford C, Patro R. (2015) Reference-based compression of short-read sequences using path encoding. Bioinformatics. 31(12): 1920–8.
9. Arzalluz-Luque Á, Conesa A. (2018) Single-cell RNAseq for the study of isoforms-how is that possible? Genome Biol. 19: 110.