It took scientists a little while to warm up to long-read sequencing, but now you couldn’t pry most of them away from their sequencers with a crowbar. Long reads — we’re talking 10,000 bases and more — provide a level of contiguity and completeness in genome assemblies that simply isn’t possible with short-read sequencers. They can reveal full structural variants and accurately represent long, repetitive regions that flummox their short-read counterparts.
For example, scientists sequencing microbial genomes have discovered that they can often generate fully closed assemblies with long reads, representing the whole genome in a single contig. With more complex organisms, it’s not uncommon to hear about assemblies that have one contig to represent each chromosome. With short reads, assemblies are far more fragmented, split into hundreds or even thousands of small pieces that are difficult to place in the correct order and orientation.
There are two vendors in long-read sequencing today: PacBio and Oxford Nanopore Technologies. Others are waiting in the wings. Scientists using either of these platforms don’t want just long reads; they want the longest reads. And that’s where automated DNA size selection comes in.
Long-read sequencers are limited most by the length of the fragments fed into them. You can have a machine capable of producing 100,000-base reads, but if you load only 500-base DNA fragments, you can’t get the benefit of long-read data. In some cases, these sequencers preferentially sequence smaller fragments, so even if you had a mix of long and short fragments in your library, you’d wind up with much shorter average read lengths than the instrument is capable of producing.
Users of sequencers from both PacBio and ONT have shown that size selection can be used to remove the smaller fragments from a library prior to sequencing. This step may seem trivial, but studies show that it can double the average read length generated simply by focusing the sequencer on the longest DNA fragments available.
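To see why removing short fragments has such an outsized effect, consider a toy simulation. The numbers below are invented for illustration (a mix of roughly 2 kb and 20 kb fragments, a 10 kb cutoff), and the `size_select` helper is hypothetical, not any vendor’s API; it simply mimics what a size-selection step does to the library’s length distribution.

```python
import random

def mean_read_length(fragments):
    """Average length (bp) of the fragments that would be sequenced."""
    return sum(fragments) / len(fragments)

def size_select(fragments, cutoff_bp):
    """Keep only fragments at or above the cutoff, mimicking an
    automated size-selection step (illustrative, not a real protocol)."""
    return [f for f in fragments if f >= cutoff_bp]

random.seed(0)
# Toy library: half short (~2 kb) and half long (~20 kb) fragments.
library = ([random.gauss(2_000, 500) for _ in range(5_000)]
           + [random.gauss(20_000, 3_000) for _ in range(5_000)])

before = mean_read_length(library)
after = mean_read_length(size_select(library, cutoff_bp=10_000))
print(f"mean before: {before:.0f} bp, mean after: {after:.0f} bp")
```

With these made-up numbers, discarding everything under 10 kb pulls the mean read length from roughly 11 kb up to roughly 20 kb, close to the doubling reported in practice, without changing a single long fragment in the library.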
Here’s a great example from blogger Lex Nederbragt with nice data and charts. In a more recent study of the human genome, scientists from the Icahn School of Medicine at Mount Sinai and several other institutions reported the first diploid human genome sequence and noted that size selection was essential for maximizing read length. “Without selection, smaller 2000 – 7000 bp molecules dominate the zero-mode waveguide loading distribution, decreasing the sub-readlength,” the researchers noted in the supplementary materials.
At a recent ONT user group meeting, scientist and blogger Keith Robison reported that the company had begun using the BluePippin™ automated size selection platform to increase average read lengths; some users demonstrated the ability to enrich for reads at least 20 Kb long. At a PacBio user group event last fall, CSO Jonas Korlach introduced a protocol for generating libraries of at least 30 Kb by using the BluePippin with Diagenode shearing.
To learn more, check out the long-read sequencing resources listed here.
It may not be intuitive that a sample preparation step like DNA size selection would have a significant impact on downstream data analysis, but NGS users have proven that it does. Indeed, the precision of your size selection (or lack thereof) can make or break a genome assembly. Consider the alignment challenge for paired-end reads: […]