Library Preparation

Improving Library Preparation for Next Generation Sequencing

Library quantity and quality are paramount to the success of a Next Generation Sequencing (NGS) experiment. But many factors affect library yield and quality: conversion rate, GC coverage, low-quality or low-quantity input material, and PCR-introduced bias. On top of all that, wouldn't a shorter protocol be nice too? Whether you are purchasing a library prep kit or putting together a protocol yourself, take the time to consider and analyze the factors that dictate the success of your NGS library preparation. Here, we examine the performance of New England Biolabs' NEBNext® Ultra™ II DNA Library Prep Kit.

Improving Library Preparation Quality

High-diversity libraries become increasingly difficult to generate at lower input levels, and producing them requires highly efficient library construction. The efficiency of end repair, dA-tailing and adaptor ligation can be assessed separately from the PCR library amplification step by using qPCR to quantitate adaptor-ligated fragments in unamplified libraries. Comparing this value to the quantity of input fragments gives the rate of conversion of input DNA into adaptor-ligated fragments (sequenceable molecules). An efficient conversion rate means higher yields with fewer PCR cycles, a shorter workflow, a lower risk of introducing bias and, overall, a library that better represents the sample input. Conversion rates for the NEBNext Ultra II DNA Library Prep Kit for Illumina® (Ultra II), the Kapa™ Hyper Prep Kit for Illumina (Kapa Hyper), and the TruSeq® Nano DNA Library Kit (TruSeq) were determined for varying quantities of input DNA (Figure 1).
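
The conversion-rate calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not any manufacturer's method: it assumes the qPCR result has already been converted into an absolute count of adaptor-ligated molecules, that input fragments share a single representative length, and that double-stranded DNA averages 650 g/mol per base pair. The function names `molecules_from_mass` and `conversion_rate` are hypothetical.

```python
# Minimal sketch: conversion rate = adaptor-ligated molecules / input molecules.
# Assumptions (not from the source): 650 g/mol per dsDNA base pair, one
# representative fragment length, and a qPCR result already expressed as a
# molecule count.

AVOGADRO = 6.02214076e23      # molecules per mole
GRAMS_PER_MOL_PER_BP = 650.0  # assumed average mass of one dsDNA base pair


def molecules_from_mass(mass_ng: float, fragment_bp: int) -> float:
    """Convert a mass of DNA fragments (ng) into an approximate molecule count."""
    if mass_ng < 0 or fragment_bp <= 0:
        raise ValueError("mass must be >= 0 and fragment length > 0")
    moles = (mass_ng * 1e-9) / (fragment_bp * GRAMS_PER_MOL_PER_BP)
    return moles * AVOGADRO


def conversion_rate(ligated_molecules: float, input_mass_ng: float, fragment_bp: int) -> float:
    """Fraction of input fragments that were converted to adaptor-ligated molecules."""
    input_molecules = molecules_from_mass(input_mass_ng, fragment_bp)
    if input_molecules == 0:
        raise ValueError("input mass must be greater than zero")
    return ligated_molecules / input_molecules


# Hypothetical example: 10 ng of 300 bp input fragments, 1.5e10 ligated molecules by qPCR.
rate = conversion_rate(1.5e10, input_mass_ng=10.0, fragment_bp=300)
print(f"conversion rate: {rate:.2f}")  # conversion rate: 0.49
```

Under these assumptions, 10 ng of 300 bp fragments corresponds to roughly 3.1 × 10^10 molecules, so the made-up qPCR count above gives a conversion rate of about 0.49.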

Figure 1. Conversion Rate Measured by qPCR. Libraries were prepared without an amplification step from the indicated input amounts of human NA19240 genomic DNA, using the NEBNext Ultra II DNA Library Prep Kit for Illumina (Ultra II), Kapa Hyper Prep Kit for Illumina (Kapa Hyper), or TruSeq Nano DNA Library Preparation Kit (TruSeq Nano) according to each manufacturer's recommendations. qPCR was used to quantitate adaptor-ligated molecules, and quantitation values were then normalized to the conversion rate of the Ultra II kit.
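
The normalization step in the Figure 1 caption can be illustrated as follows. This sketch assumes the normalization is done separately at each input amount (the caption does not spell this out), and it uses generic kit labels with made-up values rather than the Figure 1 data. It requires Python 3.9 or later for the built-in generic type hints.

```python
# Minimal sketch: express each kit's qPCR quantitation relative to a reference
# kit measured at the same input amount, so the reference kit equals 1.0.
# Kit names and values are hypothetical placeholders, not measured data.

def normalize_to_reference(
    values_by_input: dict[float, dict[str, float]], reference: str
) -> dict[float, dict[str, float]]:
    """Divide every kit's value by the reference kit's value at each input amount."""
    normalized: dict[float, dict[str, float]] = {}
    for input_ng, per_kit in values_by_input.items():
        ref = per_kit[reference]
        if ref <= 0:
            raise ValueError(f"reference value at {input_ng} ng must be positive")
        normalized[input_ng] = {kit: value / ref for kit, value in per_kit.items()}
    return normalized


measured = {
    1.0: {"kit_A": 2.0e8, "kit_B": 1.0e8},
    100.0: {"kit_A": 2.0e10, "kit_B": 5.0e9},
}
print(normalize_to_reference(measured, reference="kit_A"))
# {1.0: {'kit_A': 1.0, 'kit_B': 0.5}, 100.0: {'kit_A': 1.0, 'kit_B': 0.25}}
```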

Regardless of the GC content or amount of input DNA, a high-quality library must represent the original sample uniformly, with even coverage across the GC spectrum. Uniformity of GC coverage can be assessed using samples that span a range of GC content. Three microbial genomic DNAs provide a good test set for low, medium and high GC content, respectively: H. influenzae (38% GC), E. coli (51% GC), and R. palustris (65% GC) (Figure 2).
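
Checking where a candidate genome falls on the GC spectrum is a simple count. The sketch below is illustrative only: it counts G and C among unambiguous A/C/G/T bases, ignores N and other ambiguity codes, and runs on a made-up toy record. The function names are hypothetical.

```python
# Minimal sketch: GC fraction of a genome from FASTA-formatted lines.
# Only unambiguous A/C/G/T bases are counted; N and other IUPAC codes are ignored.
from typing import Iterable


def gc_fraction(seq: str) -> float:
    """Return (G + C) / (A + C + G + T) for a nucleotide sequence."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return gc / (gc + at)


def gc_fraction_fasta(lines: Iterable[str]) -> float:
    """Concatenate all sequence lines (skipping '>' headers) and return their GC fraction."""
    seq = "".join(line.strip() for line in lines if not line.startswith(">"))
    return gc_fraction(seq)


# Hypothetical toy record; a real check would read the genome FASTA file.
record = [">toy_contig", "ATGCGCNNAT", "GGCCAATT"]
print(f"{gc_fraction_fasta(record):.1%}")  # 50.0%
```

Because a file object returned by `open()` yields lines, it can be passed directly to `gc_fraction_fasta`.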

Figure 2. Ultra II GC Coverage. Libraries were made using input genomic DNA as indicated and the NEBNext Ultra II DNA Library Prep Kit and sequenced on an Illumina MiSeq®. Reads were mapped using Bowtie 2.2.4 and GC coverage information was calculated using Picard’s CollectGCBiasMetrics (v1.117). Expected normalized coverage of 1.0 is indicated by the horizontal grey line, the number of 100 bp regions at each GC% is indicated by the vertical grey bars, and the colored lines represent the normalized coverage for each library.
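
The normalized coverage plotted in Figure 2 compares read density in each GC bin with the genome-wide average. The sketch below is a simplified approximation of that idea, not Picard's actual implementation: it tiles the reference into non-overlapping 100 bp windows, assigns each read to the window containing its alignment start, and treats N bases as non-GC (Picard handles window placement and ambiguous bases in its own way). A value of 1.0 in a bin corresponds to the expected coverage shown by the grey line.

```python
# Simplified sketch of GC-binned normalized coverage (not Picard's exact algorithm).
# normalized coverage(bin) = (reads in bin / windows in bin) / (all reads / all windows)
from collections import Counter


def normalized_gc_coverage(reference: str, read_starts: list[int], window: int = 100) -> dict[int, float]:
    n_windows = len(reference) // window  # trailing partial window is ignored
    if n_windows == 0:
        raise ValueError("reference is shorter than one window")

    # GC% (0-100, rounded) of each non-overlapping window.
    window_gc = []
    for i in range(n_windows):
        chunk = reference[i * window:(i + 1) * window].upper()
        window_gc.append(round(100 * (chunk.count("G") + chunk.count("C")) / window))

    windows_per_bin = Counter(window_gc)
    reads_per_bin: Counter = Counter()
    for pos in read_starts:
        idx = pos // window
        if 0 <= idx < n_windows:  # ignore reads outside the tiled region
            reads_per_bin[window_gc[idx]] += 1

    total_reads = sum(reads_per_bin.values())
    if total_reads == 0:
        raise ValueError("no reads start inside the tiled windows")
    mean_reads_per_window = total_reads / n_windows
    return {
        gc: (reads_per_bin[gc] / windows_per_bin[gc]) / mean_reads_per_window
        for gc in sorted(windows_per_bin)
    }


# Toy example: one 0% GC window and one 100% GC window, with reads skewed to the AT-rich window.
ref = "A" * 100 + "G" * 100
print(normalized_gc_coverage(ref, [10, 50, 99, 150]))  # {0: 1.5, 100: 0.5}
```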

An ideal library completely and proportionally represents the input DNA. If library preparation is inefficient, or if the input amount of DNA is very low, there is an increased risk that the resulting library will lack diversity and that some regions of the DNA will be over- or under-represented. The effect of input amount on library diversity can be measured by comparing sequence coverage, in 10 kb intervals, across libraries produced from different input amounts (data not included here).
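
One way to compare coverage across 10 kb intervals between libraries is to count reads per bin and summarize how evenly those counts are spread. The sketch below rests on stated assumptions: it uses read start positions as a proxy for coverage, and its summary statistics (coefficient of variation and fraction of empty bins) are an illustrative choice of evenness measure, not one specified in the text. Because the coefficient of variation also reflects sampling noise, libraries should be sequenced or subsampled to similar depth before comparison.

```python
# Minimal sketch: per-10 kb read counts and simple evenness summaries.
import statistics


def binned_counts(read_starts: list[int], genome_length: int, bin_size: int = 10_000) -> list[int]:
    """Count reads whose start position falls in each bin_size interval."""
    n_bins = -(-genome_length // bin_size)  # ceiling division keeps the final partial bin
    counts = [0] * n_bins
    for pos in read_starts:
        if 0 <= pos < genome_length:
            counts[pos // bin_size] += 1
    return counts


def coverage_summary(counts: list[int]) -> dict[str, float]:
    """Mean reads per bin, coefficient of variation, and fraction of bins with no reads."""
    mean = statistics.fmean(counts)
    cv = statistics.pstdev(counts) / mean if mean > 0 else float("inf")
    empty = sum(1 for c in counts if c == 0) / len(counts)
    return {"mean": mean, "cv": cv, "fraction_empty_bins": empty}


# Hypothetical reads on a 30 kb genome (three 10 kb bins).
counts = binned_counts([0, 5_000, 15_000, 25_000, 29_999], genome_length=30_000)
print(counts)  # [2, 1, 2]
print(coverage_summary(counts))
```

A lower coefficient of variation and fewer empty bins would indicate more even, more complete representation of the input.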

Improving Yield

Obtaining sufficient yields for high-quality cluster generation and sequencing from very low input amounts can be challenging, and is complicated by the preference to amplify the library with as few PCR cycles as possible. Minimizing PCR cycles is desirable primarily because it reduces the risk of introducing bias during PCR: GC-rich and AT-rich regions can be amplified more or less efficiently than other regions of DNA. In addition, especially when input amounts are low, PCR can favor amplification of a small percentage of molecules, leading to high library yields but low diversity of content. Using fewer PCR cycles also shortens the workflow. The goal, then, is a kit or set of reagents that maximizes final library yield with minimal PCR cycles. Final library yields were measured for the Ultra II, Kapa Hyper, and TruSeq kits across varying quantities of input DNA and numbers of PCR cycles (Figure 3).
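
Why fewer cycles matter can be shown with a simple constant-efficiency PCR model, in which each cycle multiplies a template by (1 + E). This is an idealized assumption (real amplification efficiency varies between templates and plateaus late in the reaction), and the efficiencies used below are hypothetical, but the sketch shows how a small per-cycle difference between two fragment classes compounds with cycle number.

```python
# Idealized PCR model: copies after n cycles = initial * (1 + E) ** n,
# where E is the per-cycle efficiency (0 = no amplification, 1 = perfect doubling).

def amplification_fold(efficiency: float, cycles: int) -> float:
    """Fold increase after `cycles` cycles at a constant per-cycle efficiency."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return (1.0 + efficiency) ** cycles


def relative_representation(eff_a: float, eff_b: float, cycles: int) -> float:
    """Ratio of class A to class B after PCR, assuming equal starting amounts."""
    return amplification_fold(eff_a, cycles) / amplification_fold(eff_b, cycles)


# Hypothetical efficiencies: 0.95 for typical fragments, 0.85 for a harder-to-amplify class.
for n in (4, 12):
    print(n, f"{relative_representation(0.95, 0.85, n):.2f}")
# 4 1.23
# 12 1.88
```

Under these assumed efficiencies, the skew between the two classes grows from about 1.23-fold at 4 cycles to about 1.88-fold at 12 cycles.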

Figure 3. Library Yield Determined by Input Amount and PCR Cycles. Libraries were prepared from Human NA19240 genomic DNA using the input amounts and number of PCR cycles indicated. Manufacturers’ recommendations were followed, with the exception that size selection was omitted.

When amplification is necessary to obtain sufficient library yields, it is especially important to monitor evenness of GC coverage to ensure that representation of GC-rich and AT-rich regions is not skewed in the final library. Comparing libraries produced with amplification to those produced without amplification (PCR-free) is a useful way to measure the bias introduced by PCR. Ideally, the GC coverage of an amplified library will closely overlay that of a PCR-free library.
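
The overlay comparison can be reduced to a single number, for example the mean absolute difference in normalized coverage across shared GC bins, optionally weighted by how many genomic windows fall in each bin. This metric is a hypothetical choice for illustration, not one defined in the text; the inputs are assumed to be per-bin normalized coverage values like those plotted in Figure 2.

```python
# Minimal sketch: how far an amplified library's GC-coverage curve departs from a
# PCR-free reference curve. 0.0 means the curves overlay perfectly on shared bins.
from typing import Dict, Optional


def gc_curve_deviation(
    amplified: Dict[int, float],
    pcr_free: Dict[int, float],
    windows: Optional[Dict[int, int]] = None,
) -> float:
    """Weighted mean |amplified - pcr_free| over GC bins present in both curves."""
    shared = sorted(set(amplified) & set(pcr_free))
    if not shared:
        raise ValueError("the two curves share no GC bins")
    # Equal weights unless per-bin window counts are supplied.
    weights = {gc: (windows.get(gc, 0) if windows is not None else 1) for gc in shared}
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("all shared bins have zero weight")
    return sum(weights[gc] * abs(amplified[gc] - pcr_free[gc]) for gc in shared) / total_weight


# Hypothetical curves: the amplified library is slightly depleted at 30% GC and enriched at 70% GC.
amp = {30: 0.9, 50: 1.0, 70: 1.2}
free = {30: 1.0, 50: 1.0, 70: 1.0}
print(f"{gc_curve_deviation(amp, free):.3f}")                         # 0.100
print(f"{gc_curve_deviation(amp, free, {30: 1, 50: 2, 70: 1}):.3f}")  # 0.075
```

Weighting by window count keeps sparsely populated extreme-GC bins from dominating the score, at the cost of hiding bias that affects only those bins.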

Hybridization-based target enrichment workflows can be particularly demanding in terms of library yield, because large amounts of library, often in the microgram range, are required as input into enrichment. Obtaining high-quality libraries at these quantities can be challenging, especially when the original sample is available only in very small amounts. To avoid a large number of PCR cycles in enrichment workflows, choose a kit or protocol that offers highly efficient end repair, dA-tailing and adaptor ligation, so that minimal PCR amplification is required.
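
The link between conversion efficiency and PCR cycle number in enrichment workflows can be made concrete with a rough estimate of the cycles needed to reach a microgram-scale target. This sketch uses an idealized constant-efficiency PCR model and an assumed 650 g/mol per base pair; the input amount, insert and final library lengths, target mass, and 90% per-cycle efficiency are all hypothetical, and real reactions plateau, so the result is an optimistic estimate rather than a protocol recommendation.

```python
# Rough estimate of PCR cycles needed to turn ligated molecules into a target library mass.
# Assumptions: constant per-cycle efficiency, no plateau, 650 g/mol per dsDNA base pair.
import math

AVOGADRO = 6.02214076e23
GRAMS_PER_MOL_PER_BP = 650.0  # assumed average mass of one dsDNA base pair


def ng_to_molecules(mass_ng: float, length_bp: int) -> float:
    """Approximate number of dsDNA molecules of a given length in mass_ng nanograms."""
    return mass_ng * 1e-9 / (length_bp * GRAMS_PER_MOL_PER_BP) * AVOGADRO


def cycles_needed(input_ng: float, insert_bp: int, conversion_rate: float,
                  target_ng: float, library_bp: int, efficiency: float = 0.9) -> int:
    """Smallest whole number of cycles that reaches target_ng under the idealized model."""
    if input_ng <= 0 or target_ng <= 0:
        raise ValueError("input and target masses must be positive")
    if not 0.0 < conversion_rate <= 1.0:
        raise ValueError("conversion_rate must be in (0, 1]")
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    ligated = ng_to_molecules(input_ng, insert_bp) * conversion_rate
    required = ng_to_molecules(target_ng, library_bp)
    if ligated >= required:
        return 0
    return math.ceil(math.log(required / ligated) / math.log(1.0 + efficiency))


# Hypothetical: 10 ng of 300 bp inserts, 420 bp adaptor-ligated library, 1 µg target.
for rate in (0.5, 0.1):
    print(rate, cycles_needed(10, 300, rate, target_ng=1000, library_bp=420))
# 0.5 8
# 0.1 11
```

In this toy model, a fivefold drop in conversion rate costs three extra cycles to reach the same target mass.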

Whether you are purchasing a library preparation kit for your NGS experiment or gathering reagents and designing your own protocol, it is important to consider the factors that determine the quality and quantity of your final library. The challenges of creating a high-quality library can be overcome by ensuring an efficient conversion rate and uniform coverage across the GC spectrum. Together, these factors make it possible to limit PCR amplification, and therefore the risk of introducing bias, while still achieving high library yields from minimal input DNA.