DNA analysis by PCR, Sanger sequencing or next-generation sequencing (NGS) is now a familiar part of any molecular biology lab. But ‘RNA-seq’, the so-called “Cinderella of genetics”, is now becoming the belle of the ball, providing new insights into this most central molecule of the ‘central dogma’.
The many flavors of RNA
Whilst genomic DNA is the blueprint of an organism, it is the set of RNA molecules a cell expresses that defines its characteristics. Many kinds of RNA contribute to this. mRNA transcripts carry the coding information that ribosomes translate into protein, with tRNA delivering the amino acids; rRNA forms the structural and catalytic core of the ribosome; miRNA and siRNA, which are not translated into protein, regulate gene expression largely by silencing target mRNAs. Beyond these lies a host of other RNAs whose functions, coding or otherwise, are still being worked out.
Just a few years ago, the primary method for studying RNA expression was the microarray. Based on hybridization of labeled nucleic acids to probes fixed on a chip, microarray analysis is a high-throughput approach. However, it requires prior knowledge of the sequences being probed, cannot detect novel transcripts or events, and measures RNA abundance only indirectly, through the hybridization signal. More recently, NGS techniques have entered RNA research, uncovering new sequences and furthering our understanding of RNA regulation. This knowledge could help answer questions about how the genome works, how life has evolved, and how to better identify and treat disease.
Non-coding versus coding
High-throughput RNA sequencing (‘RNA-seq’) is more than just a faster, cheaper lab technique. It has created a new perspective on how RNAs work and is even prompting fundamental questions about what constitutes a gene. Dr. Timothy Triche, a leading researcher at the Center for Personalized Medicine at Children’s Hospital of Los Angeles and the USC Keck School of Medicine, has demonstrated the utility of RNA-seq in transcriptome work. His lab started out using microarrays on mRNA, expanded to arrays that measured total RNA expression across the genome, and then adopted Ion Torrent NGS technology. “RNA-seq is well established, but it’s not yet well-known for transcriptome research,” Triche said.

Although Triche’s research focuses on the genetics of Ewing’s sarcoma, his lab’s work has provided insights into the important differences between coding mRNA (poly(A)-selected) and total RNA in a sample, into how to align RNA-seq reads correctly, and into how RNA-seq compares with microarrays. According to Triche, his lab was the first to find that non-coding RNA expression is biologically more important than coding RNA expression, especially for identifying candidate biomarkers that might one day be linked to disease states. “90% of [the genome] is transcriptionally active,” said Triche. “The variety of non-coding transcripts is staggering.”
Obstacles to overcome
Whilst RNA-seq looks at transcription in an entirely new way, it is not without challenges. Analyzing RNA-seq data is much more difficult than mapping DNA reads: a run produces hundreds of millions of reads (of about 100 bp each), and because introns are spliced out of mature transcripts, many reads span exon-exon junctions and cannot be aligned to the genome as a single contiguous block. In addition, without depletion, ribosomal RNA (rRNA) can account for as much as 90% of the reads, and it is of little scientific value for most studies.
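To make the alignment problem concrete: a read from a spliced (‘interrupted’) transcript lands on the genome in pieces separated by an intron. Splice-aware aligners that write SAM/BAM output record such a gap with the CIGAR operation "N" (a region skipped on the reference). The minimal Python sketch below is illustrative only, not taken from Triche's pipeline; it parses a CIGAR string and reports the skipped intervals, and the function names and example read are hypothetical.

```python
import re

# CIGAR operations from the SAM specification. M, D, N, = and X advance
# along the reference; I, S, H and P do not.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
_REF_CONSUMING = frozenset("MDN=X")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string such as '50M1000N50M' into (length, op) pairs."""
    ops = [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]
    # Reject strings containing characters the pattern did not consume.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def reference_span(cigar: str) -> int:
    """Genomic bases covered from first to last aligned base, introns included."""
    return sum(n for n, op in parse_cigar(cigar) if op in _REF_CONSUMING)


def skipped_regions(pos: int, cigar: str) -> list[tuple[int, int]]:
    """1-based inclusive (start, end) intervals skipped by 'N' operations.

    `pos` is the 1-based leftmost mapping position (SAM column 4).
    """
    regions = []
    ref = pos  # reference coordinate of the next base to be consumed
    for n, op in parse_cigar(cigar):
        if op == "N":
            regions.append((ref, ref + n - 1))
        if op in _REF_CONSUMING:
            ref += n
    return regions


# Hypothetical 100 bp read: 50 bases in one exon, a 1,000 bp intron,
# then 50 bases in the next exon.
read_cigar = "50M1000N50M"
print(reference_span(read_cigar))          # 1100
print(skipped_regions(1000, read_cigar))   # [(1050, 2049)]
```

The 100 bases of this read cover 1,100 bases of genome, so an aligner that only looks for contiguous matches would fail to place it.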
Removing the challenges
Using commercial ribosomal RNA removal kits, however, Triche’s lab has been able to deplete rRNA and obtain results with less than 3% rRNA contamination, without biasing the remaining RNA. Such kits are available from ThermoFisher’s Ion Torrent, Illumina’s EpiBio, Qiagen and other manufacturers. Using these data, Triche has been able to characterize isoforms that differ between cancerous and healthy samples, and to discover previously unseen aberrations at the RNA level in cancer cells. Triche credits RNA-seq with the discovery of very long non-coding RNAs (400 kb or more), expressed in tumor and metastasis samples, that may each function as a single transcript.
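The two rRNA levels mentioned in this article (up to 90% without depletion, under 3% with it) translate directly into sequencing budget. The back-of-the-envelope Python sketch below assumes a hypothetical run of 100 million reads, a round number chosen for illustration rather than a figure from Triche's lab, and counts the reads left for non-ribosomal RNA in each case; the function names are illustrative.

```python
def rrna_fraction(rrna_reads: int, total_reads: int) -> float:
    """Fraction of sequenced reads that map to ribosomal RNA."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return rrna_reads / total_reads


def informative_reads(total_reads: int, rrna_frac: float) -> int:
    """Reads left for non-ribosomal transcripts after discarding rRNA reads."""
    if not 0.0 <= rrna_frac <= 1.0:
        raise ValueError("rrna_frac must be between 0 and 1")
    return round(total_reads * (1.0 - rrna_frac))


# Hypothetical 100-million-read run at the two rRNA levels.
run = 100_000_000
undepleted = informative_reads(run, 0.90)   # 10000000
depleted = informative_reads(run, 0.03)     # 97000000
print(f"{depleted / undepleted:.1f}x more usable reads")  # 9.7x more usable reads
```

At the same sequencing depth, depletion leaves nearly ten times as many reads for mRNA and non-coding RNA.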
What exactly is a gene?
RNA-seq work has also raised a fundamental biological question for Triche: what, after all, is a gene? If a gene is traditionally defined as producing one transcript, how do you define where an exon ends and where the intron that regulates it begins? And how does the ‘central dogma’ account for transcription that extends beyond the coding boundaries of a gene? Triche emphasizes that, with the right NGS tools, researchers can readily obtain interpretable data to start answering these questions.