It may not be obvious that a sample preparation step such as DNA size selection could significantly affect downstream data analysis, but NGS users have shown that it does. Indeed, the precision of your size selection (or lack thereof) can make or break a genome assembly.

Consider the alignment challenge for paired-end reads: bioinformaticians need to know how far apart the two reads of each pair should be placed to align them for an accurate assembly. With imprecise size selection, the library contains too broad a range of fragment sizes, leaving bioinformaticians to puzzle out whether a given pair's reads should be 300 bases or 500 bases apart. Precise sizing, on the other hand, provides a reliable target not only for fragment size but also for downstream alignment.
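To make the spacing problem concrete, here is a minimal Python sketch, assuming a simplified model in which a read pair is "concordant" if its implied insert size falls within the mean plus or minus three standard deviations of the library's size distribution. The function names, the three-SD cutoff, and the sample sizes are illustrative assumptions, not measured data or any particular aligner's method; real aligners estimate the insert-size distribution from the data in more sophisticated ways.

```python
from statistics import mean, stdev

def insert_size(left_start: int, right_end: int) -> int:
    """Hypothetical helper: fragment (insert) length implied by a properly
    oriented pair, from the leftmost base of the forward read to the
    rightmost base of the reverse read (1-based, inclusive coordinates)."""
    return right_end - left_start + 1

def expected_window(sizes, k: float = 3.0):
    """Hypothetical helper: range of plausible insert sizes, taken here as
    mean +/- k sample standard deviations (k=3 is an assumed cutoff)."""
    m = mean(sizes)
    s = stdev(sizes)
    return m - k * s, m + k * s

def classify(pair_size: int, window) -> str:
    """Label a pair concordant if its insert size lies inside the window."""
    lo, hi = window
    return "concordant" if lo <= pair_size <= hi else "discordant"

if __name__ == "__main__":
    # Illustrative fragment sizes only: a tightly and a broadly sized library.
    tight = [295, 300, 305, 298, 302]
    broad = [200, 300, 400, 250, 350]

    pair = insert_size(1000, 1499)  # a pair implying a 500-base fragment
    print(pair)                                   # 500
    print(classify(pair, expected_window(tight))) # discordant
    print(classify(pair, expected_window(broad))) # concordant
```

With the tight library, the window is roughly 289 to 311 bases, so a pair implying a 500-base fragment stands out as anomalous. With the broad library, the window stretches from about 63 to 537 bases, so the same pair looks acceptable and the spacing information it carries is effectively lost.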

In many studies, scientists have reported that automated, gel-based DNA size selection produces more precise and reproducible results than alternatives such as manual gels or beads. Some scientists create libraries with multiple insert sizes (say, 200-base, 300-base, and 400-base fragments) prior to paired-end sequencing. During alignment and assembly, the carefully sized data from each library makes for a significantly higher-quality assembly.

Accurate size selection also affects downstream results for other types of sequencing. With mate-pair sequencing, for example, a team at RIKEN published an optimized protocol in 2015 that incorporated automated DNA sizing and other modifications, resulting in longer scaffolds and improved gene coverage in the assembly (1). The protocol also significantly reduced the cost of mate-pair sequencing. In other mate-pair work, scientists have shown that improved sizing can decrease the incidence of chimeras and, therefore, the number of assembly errors they typically cause.

Further Reading:

  1. Tatsumi K, Nishimura O, Itomi K, Tanegashima C, Kuraku S. (2015) Optimization and cost-saving in tagmentation-based mate-pair library preparation and sequencing. Biotechniques. 58(5):253-7.
  2. Heavens D, Garcia Accinelli G, Clavijo B, Clark MD. (2015) A method to simultaneously construct up to 12 differently sized Illumina Nextera long mate pair libraries with reduced DNA input, time, and cost. Biotechniques. 59:42-45.
  3. Mate-pair sequencing app notes. Sage Science.
  4. Protocols. Sage Science.