Squinting at a long list of significant genes from your latest RNA-seq experiment? Having trouble making sense of the results? You’re not alone.
Pathway analysis is becoming increasingly popular because it helps researchers make sense of complex datasets, including those obtained using next-generation sequencing techniques. By systematically culling information about biological pathways and interactions from the literature and applying it to large datasets, it saves scientists countless hours and makes connections that would be hard for us mere mortals to see on our own.
Pathway Analysis Example
Perhaps an example is in order. Let’s say you are interested in what makes breast cancer tissue different from healthy tissue. You use RNA-seq to measure gene expression in both types of tissue and then run simple statistics to identify the genes that are differentially expressed. Although you are elated to end up with a list of 384 significant genes, none of the names seem familiar, and even after doing a few literature searches you are having a tough time seeing a pattern in your data.
You decide to plug your data into a pathway analysis program and discover that the differentially expressed genes are enriched in cell cycle and DNA damage repair pathways. The software also generates some great network diagrams that illustrate how your differentially expressed genes are biologically connected. A light bulb goes off in your head – you’ve discovered something new about breast cancer!
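For readers curious about what "enriched" means here: many tools use some form of over-representation test, which asks whether a pathway's genes show up in your significant list more often than chance would predict. The Python sketch below illustrates that idea with a one-sided hypergeometric test. It is not the method of any particular program mentioned in this article, and apart from the 384 significant genes from the example, every number (background size, pathway size, overlap) is a made-up assumption.

```python
from math import comb


def enrichment_p_value(n_background, n_in_pathway, n_significant, n_overlap):
    """One-sided hypergeometric p-value.

    Probability of seeing at least n_overlap pathway genes when
    n_significant genes are drawn at random from n_background genes,
    of which n_in_pathway belong to the pathway.
    """
    max_overlap = min(n_in_pathway, n_significant)
    # Count the ways to draw a list containing exactly k pathway genes,
    # summed over every k at least as large as the observed overlap.
    tail = sum(
        comb(n_in_pathway, k) * comb(n_background - n_in_pathway, n_significant - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / comb(n_background, n_significant)


# Hypothetical inputs: 20,000 measured genes, a 200-gene pathway,
# and 15 of the 384 significant genes falling in that pathway.
p = enrichment_p_value(
    n_background=20000, n_in_pathway=200, n_significant=384, n_overlap=15
)
print(f"Expected overlap by chance: {384 * 200 / 20000:.2f} genes")
print(f"Enrichment p-value: {p:.2e}")
```

Because a real analysis tests hundreds of pathways at once, p-values like this one are normally corrected for multiple testing before any pathway is called enriched.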
Almost like magic, right? Here’s a quick guide to getting started.
Commercially Available Software
Commercially available software may be the best option for you if you don’t feel comfortable using Bioconductor (a collection of open source software for bioinformatics that works in R) or you value a user-friendly interface. In addition, professional support is available for commercial programs, should you hit a roadblock.
Ingenuity Pathway Analysis (IPA®)
Scientists familiar with Ingenuity’s user-friendly pathway analysis software for microarrays, IPA, will be happy to hear that it works with RNA-Seq data as well. IPA provides gorgeous pathway/network figures, and the support offered by the company is top-notch. It, like most pathway analysis programs, is geared toward mammalian genomes and human disease research.
MetaCore™ and Pathway Studio™
Two other commercial options are also worth looking into: MetaCore, from GeneGo/Thomson Reuters, and Elsevier’s Pathway Studio. Of note: The latter software supports data on plants as well as humans, and add-ons related to pharmaceutical and biomarker discovery are available.
Be sure to ask around to see if your institution offers free or low-cost licenses for these programs, as buying an individual license is quite expensive. Also, be aware that free trials are available for each of these programs. So go ahead and feel free to give them a spin before you commit! You may find that you prefer the user interface or graphics of one over another.
Open Source Software
What if your institution doesn’t have licenses available for the programs above, and you’re working on a shoestring budget? Or you prefer using open source software that you can customize? Or you are working on a project that has nothing to do with human disease? Don’t despair – there are some great, free alternative options.
GoSeq
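One of the most popular open source programs is an R package called GoSeq. Of special interest, this program takes into account the bias that transcript length may introduce into pathway analysis: longer transcripts collect more reads, so they are more likely to be called differentially expressed, which can skew enrichment results toward pathways full of long genes. This is a known problem.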
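To get a feel for the length bias, you can check whether the fraction of differentially expressed (DE) genes climbs with gene length in your own data. The Python sketch below does this by sorting genes into equal-sized length bins. It is only a diagnostic illustration, not GoSeq's own implementation (which goes further and adjusts the enrichment test itself), and the gene lengths and DE calls are invented toy values.

```python
def de_fraction_by_length(genes, n_bins=3):
    """Split genes into equal-sized length bins and report the DE fraction in each.

    genes: list of (length_bp, is_de) tuples.
    Returns a list of (shortest_bp, longest_bp, fraction_de), shortest bin first.
    """
    ordered = sorted(genes, key=lambda gene: gene[0])
    bins = []
    for i in range(n_bins):
        start = i * len(ordered) // n_bins
        stop = (i + 1) * len(ordered) // n_bins
        chunk = ordered[start:stop]
        if not chunk:
            continue  # more bins than genes: skip the empty ones
        n_de = sum(1 for _, is_de in chunk if is_de)
        bins.append((chunk[0][0], chunk[-1][0], n_de / len(chunk)))
    return bins


# Toy data (hypothetical): (transcript length in bp, called DE?)
toy_genes = [
    (500, False), (800, False), (900, False),
    (1500, False), (2000, True), (2200, False),
    (4000, True), (5500, True), (7000, False),
]

for shortest, longest, fraction in de_fraction_by_length(toy_genes):
    print(f"{shortest}-{longest} bp: {fraction:.0%} of genes called DE")
```

If the DE fraction rises steadily from the shortest bin to the longest, as it does in this toy data, a length-aware method is worth considering.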
Reactome
If you’re working with a non-human or non-mammalian model system, you may want to give Reactome a try. It includes data for 22 species, including Drosophila, chicken, and rice. And unlike GoSeq, you don’t have to be familiar with Bioconductor to use Reactome, which can be run via web browser.
What Are You Waiting For?
We’ve highlighted some of the most popular pathway analysis programs, but there are plenty of other options available as well. So ask colleagues and collaborators which programs they use and why, and check out the great discussions on the pros and cons of different methods on Biostars (here’s one example).
It’s also a good idea to read up on the strengths and limitations of the available techniques, so that you can choose the best option for your particular dataset and avoid potential problems (like using pathway analysis software built for expression data on methylation data). The links below can help with this.
You may even want to try running your data on a few different programs to see whether your results are consistent. Because different programs draw on different databases, they sometimes yield different findings.
Pathway analysis is a powerful technique that you can use to make the most of your data – and it is easy to get started. So happy pathway hunting!
Have some pathway analysis experience under your belt? Please feel free to leave comments and suggestions about pathway analysis below, to help out folks just getting started in this area.
Helpful Resources
- Khatri et al. 2012. Ten years of pathway analysis: current approaches and outstanding challenges. PLoS Computational Biology. 8(2): e1002375
- Smith, C. 2013. What can pathway analysis software do for you?
- Turner, S. 2012. Pathway analysis for high-throughput genomics studies.
Kristen has a PhD in Population Biology, Ecology, and Evolution, and a Master of Public Health in Global Epidemiology from Emory University.