Variations on the ChIP-seq Theme and Challenges of Befriending Large Datasets

Posted in: Genomics and Epigenetics

ChIP-seq has proved amazing. With ChIP-seq and its related techniques, we can obtain large datasets in a matter of days, making our lives in the lab easier and more efficient.

ChIP-seq combines chromatin immunoprecipitation (ChIP) assays with high-throughput sequencing. This makes it possible to map where proteins bind to DNA and where epigenetic modifications occur across the genome. Humans are not only their genome but also their epigenome, after all. Unlike arrays and other probe-based approaches used to investigate the epigenome, which are inherently biased because they require probes derived from known sequences, ChIP-seq does not require prior knowledge of the binding sites!

The ChIP-Seq Variations

Several innovative sequencing-based techniques are now available to analyze the epigenome, each giving us different insights and information.

  • Classic ChIP–seq reveals binding sites of specific transcription factors (TFs). In ChIP–seq, you use specific antibodies to extract DNA fragments bound to the target protein, either directly or through other proteins in a complex containing the target factor.
  • DNase-seq, Assay for Transposase-Accessible Chromatin-seq (ATAC-seq), and Formaldehyde-Assisted Isolation of Regulatory Elements-seq (FAIRE-seq) reveal regions of open chromatin without targeting any specific protein.
    • In DNase-seq, the DNase I endonuclease preferentially cuts accessible chromatin. The resulting fragments are then selected by size and enriched.
    • ATAC-seq is an alternative method to DNase-seq that uses an engineered Tn5 transposase to cleave DNA and tag it with sequencing adapters, a process called tagmentation.
  • And MNase-seq identifies specifically positioned nucleosomes (1-4). Micrococcal nuclease (MNase) is an endo-exonuclease that progressively digests DNA until an obstruction, such as a nucleosome, is reached.

All of these techniques are useful in their own way, and each can produce many millions of reads. Analyzing such a large dataset can be a big headache!

The Interpretations of Large Datasets

To obtain good results in these assays there are important factors to keep in mind:

  • the quality of your reagents (be it the antibody, the enzyme, or whatever else you are using)
  • the amount of sample you input, as too much or too little may bias your experiment (and can indeed render it useless)
  • the depth of sequencing
  • sample number and number of replicates.

And, of course, a control should always be present in your experiment. Believe me, you will thank me for this when you are analyzing your data.

A ChIP-seq experiment may produce many millions of short reads (depending on the organism and the experiment itself). So, after the wet-lab work, you will need to analyze this massive amount of information. After data quality checks with a tool like FastQC, the first step is to align your reads to the reference genome using a standard tool like Bowtie or BWA. You will obtain a “profile” of your reads, and you can then load your alignments into a genome browser such as the UCSC Genome Browser or IGV. Your data should have “peaks.” These peaks are your signal: the regions where reads from the enriched fragments pile up. The challenge is to call peaks correctly. Could a peak be an artifact? A product of a repetitive sequence? Derived from a GC-rich region? Or is it some other bias specific to the experiment itself?
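
To make the idea of a read "profile" concrete, here is a minimal Python sketch. It assumes the reads have already been aligned and reduced to (start, end) coordinates on a single chromosome. The function names, the bin size, and the read threshold are illustrative assumptions, not part of Bowtie, BWA, or any peak caller, and a raw count threshold like this is far cruder than what real peak callers do.

```python
from collections import Counter


def binned_coverage(reads, bin_size=100):
    """Count reads per fixed-width bin, assigning each read by its midpoint."""
    counts = Counter()
    for start, end in reads:
        midpoint = (start + end) // 2
        counts[midpoint // bin_size] += 1
    return counts


def candidate_bins(counts, min_reads):
    """Return the sorted bin indices whose read count reaches the threshold."""
    return sorted(b for b, n in counts.items() if n >= min_reads)


# Toy alignments on one chromosome (hypothetical coordinates).
reads = [(100, 150), (120, 170), (130, 180), (900, 950), (140, 190)]
profile = binned_coverage(reads, bin_size=100)
print(candidate_bins(profile, min_reads=3))  # bins where reads pile up
```

Four of the five toy reads fall into the same 100 bp bin, so that bin stands out as a candidate peak. Whether such a pile-up is real signal or an artifact is exactly the question the rest of the analysis has to answer.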

Normalize Your Results

We need to normalize our results to get the real peaks. You can use different algorithms to detect peaks. Nevertheless, these methods often require careful tuning of several parameters to obtain good results. The choice of which algorithm to use must be well thought out. You should have in mind what question you are trying to answer. Remember, different algorithms can provide different results (even when applied to the same data). The real challenge comes from the lack of benchmark datasets, which makes it even harder to judge your results. So, it is sometimes essential to apply several methods to your data, as peaks that are called regardless of the method applied are more likely to be real signals.
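
The idea of keeping only peaks that several methods agree on can be sketched in a few lines of Python. This is a minimal illustration, assuming each peak caller's output has been reduced to (start, end) intervals on the same chromosome. The names `overlaps`, `consensus_peaks`, and the three toy peak lists are hypothetical and do not correspond to any particular tool.

```python
def overlaps(a, b):
    """True if half-open intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def consensus_peaks(peak_sets):
    """Keep peaks from the first set that overlap a peak in every other set."""
    first, *others = peak_sets
    return [
        peak
        for peak in first
        if all(any(overlaps(peak, other) for other in peaks) for peaks in others)
    ]


# Hypothetical peak calls from three different algorithms on the same data.
caller_a = [(100, 200), (500, 600), (900, 1000)]
caller_b = [(150, 250), (920, 980)]
caller_c = [(180, 220), (510, 590), (950, 1050)]

print(consensus_peaks([caller_a, caller_b, caller_c]))
```

In this toy example the peak at 500-600 is dropped because one of the three callers did not report it. Requiring any overlap at all is a deliberately loose criterion; a stricter version might demand a minimum overlap fraction.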

Tips for ChIP-Seq

Here are some takeaway tips to make your ChIP-seq life easier:

  • Use control groups in each experiment. These groups undergo the same procedure as your samples. You can therefore compare their profiles and use the comparison to account for the fact that reads are not uniformly distributed across the genome.
  • Deeper sequencing will also improve ChIP-seq performance, as long as you have a control to compare it with.
  • And visualization is very important – no matter how technological everything gets, when you can see your alignments it gets much easier!
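
The first tip, comparing the sample profile with a control, can be illustrated with a small Python sketch. It assumes you already have read counts per genomic bin for the ChIP sample and for the control. The function name `fold_enrichment`, the pseudocount, and the toy counts are assumptions made for illustration; real peak callers use more elaborate statistical models.

```python
def fold_enrichment(chip_counts, control_counts, pseudocount=1.0):
    """Per-bin ChIP/control ratio after scaling the control to the ChIP library size."""
    control_total = sum(control_counts)
    if control_total == 0:
        raise ValueError("control has no reads; cannot scale")
    # Scale the control so both libraries have the same total read count.
    scale = sum(chip_counts) / control_total
    return [
        (chip + pseudocount) / (control * scale + pseudocount)
        for chip, control in zip(chip_counts, control_counts)
    ]


# Hypothetical read counts in three bins.
chip = [4, 40, 36]
control = [2, 2, 36]

for bin_index, ratio in enumerate(fold_enrichment(chip, control)):
    print(bin_index, round(ratio, 2))
```

The third bin has almost as many ChIP reads as the second, but the control is just as high there, so its ratio falls below 1. Without a control, that bin would look like a peak.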


  1. Jason D Buenrostro, Paul G Giresi, Lisa C Zaba, Howard Y Chang, and William J Greenleaf (2013). Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nature Methods 10: 1213-1218. doi:10.1038/nmeth.2688
  2. Jeremy M Simon, Paul G Giresi, Ian J Davis, and Jason D Lieb (2012). Using formaldehyde-assisted isolation of regulatory elements (FAIRE) to isolate active regulatory DNA. Nature Protocols 7: 256-267. doi:10.1038/nprot.2011.444
  3. Kairong Cui and Keji Zhao (2012). Genome-wide approaches to determining nucleosome occupancy in metazoans using MNase-Seq. Methods Mol. Biol. 833: 413-419. doi:10.1007/978-1-61779-477-3_24
  4. Lingyun Song and Gregory E. Crawford (2010). DNase-seq: A High-Resolution Technique for Mapping Active Gene Regulatory Elements across the Genome from Mammalian Cells. Cold Spring Harb. Protoc. doi:10.1101/pdb.prot5384