Thermo Fisher Scientific Inc. (NYSE: TMO) is the world leader in serving science, with revenues of $17 billion and approximately 50,000 employees in 50 countries.
Our mission is to enable our customers to make the world healthier, cleaner and safer. We help our customers accelerate life sciences research, solve complex analytical challenges, improve patient diagnostics and increase laboratory productivity.
Through our premier brands – Thermo Scientific, Applied Biosystems, Invitrogen, Fisher Scientific and Unity Lab Services – we offer an unmatched combination of innovative technologies, purchasing convenience and comprehensive support.
In whole genome sequencing (WGS) initiatives, it is not enough to sequence the full length of the genomic DNA sample just once, because genomes are usually very large. The human genome, for example, contains approximately 3 billion base pairs. Although sequencing accuracy for individual bases is very high, at this scale even an error rate of 1 in 1,000 bases results in roughly 3 million erroneous base calls in the genomic data. Moreover, the goal of most WGS efforts is to detect rare single nucleotide polymorphisms (SNPs) and point mutations in the genomic DNA. For example, various types of cancer [1] and neurodegenerative diseases [2] are driven by single nucleotide variants. To distinguish such biological variation from artefactual sequencing errors, each genome is sequenced multiple times, which further increases the accuracy of the final base calls.
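The arithmetic in this paragraph, and the reason repeated sequencing helps, can be illustrated with a short Python sketch. The genome size and the 1-in-1,000 error rate come from the text. The majority-vote model is an assumption made only for illustration: it treats errors as independent between reads and counts a position as miscalled when more than half of the reads covering it are wrong. Real variant callers use base quality scores and more elaborate statistics.

```python
from math import comb

GENOME_SIZE = 3_000_000_000   # approximate human genome length in base pairs
ERROR_RATE = 1 / 1000         # per-base error rate used in the text


def expected_errors(genome_size: int, error_rate: float) -> float:
    """Expected number of erroneous base calls when each base is read once."""
    return genome_size * error_rate


def majority_error_probability(depth: int, error_rate: float) -> float:
    """Probability that more than half of `depth` independent reads are
    wrong at a single position (simplified majority-vote model)."""
    threshold = depth // 2 + 1  # smallest number of wrong reads that forms a majority
    return sum(
        comb(depth, k) * error_rate**k * (1 - error_rate) ** (depth - k)
        for k in range(threshold, depth + 1)
    )


if __name__ == "__main__":
    print(f"Expected errors in one pass: {expected_errors(GENOME_SIZE, ERROR_RATE):,.0f}")
    for depth in (1, 5, 30):
        p = majority_error_probability(depth, ERROR_RATE)
        print(f"depth {depth:>2}: probability of a wrong majority = {p:.3e}")
```

Under these assumptions, the chance that errors outvote the true base falls steeply as depth increases, which is why deeper coverage makes it easier to separate true variants from sequencing errors.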
The number of times the entire genome, or reference nucleotide landmarks, are sequenced in a WGS initiative is called the coverage, read coverage, fragment count, or depth of sequencing. Shallow (low-coverage) WGS refers to 0.1x to 0.2x sequencing coverage and is useful for detecting structural and copy number variations. Deep sequencing, which reads a whole genome sample approximately 30 times or more, is crucial for detecting single nucleotide variations (SNVs), including rare polymorphisms and point mutations, with high confidence. High-throughput systems available from Illumina™ include the HiSeq™ series (the HiSeq 2500, HiSeq 3000, HiSeq 4000, and HiSeq X systems) and the more recent NovaSeq™ 6000 system. These systems can sequence a wide variety of genomes at coverages suited to the desired application.
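As a rough illustration of how coverage relates to sequencing output, mean coverage is commonly estimated as the total number of sequenced bases divided by the genome size. The sketch below applies that estimate. The 150 bp read length is an assumed example value, not a figure from this article, and the function names are hypothetical.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Estimated mean coverage: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads whose total bases reach the target mean coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


if __name__ == "__main__":
    genome_size = 3_000_000_000  # approximate human genome, from the text
    read_length = 150            # assumed read length in bases (illustrative)

    deep = reads_needed(30, read_length, genome_size)
    print(f"Reads for ~30x coverage: {deep:,}")
    print(f"Coverage from those reads: {mean_coverage(deep, read_length, genome_size):.1f}x")
```

This estimate ignores unmapped, duplicate, and low-quality reads, so the usable coverage of a real run is typically lower than the value computed here.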
Several studies show that a high GC percentage in a genomic region results in low sequencing depth [3]. This dependence of sequencing depth on the density of GC bases in a segment of DNA is described as GC coverage bias. A high GC bias affects sequencing data quality scores and skews data interpretation, particularly when the analysis focuses on detecting rare SNVs, copy number variations (CNVs), or insertions and deletions (indels).
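One simple way to look for GC coverage bias is to split the reference into windows, compute the GC fraction of each window, and compare the mean depth in each GC range with the overall mean depth. The Python sketch below shows the idea. The window size, the number of bins, the function names, and the example depths are all illustrative assumptions; it is not the method used by any particular analysis tool.

```python
from collections import defaultdict


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq: str, window: int) -> list[float]:
    """GC fraction of consecutive, non-overlapping windows (a trailing partial window is dropped)."""
    return [gc_fraction(seq[i:i + window]) for i in range(0, len(seq) - window + 1, window)]


def normalized_depth_by_gc(gc_values, depths, n_bins=5):
    """Mean depth per GC bin divided by the overall mean depth.

    A value near 1.0 suggests even coverage for that GC range; values well
    below 1.0 indicate under-represented windows.
    """
    overall = sum(depths) / len(depths)
    bins = defaultdict(list)
    for gc, depth in zip(gc_values, depths):
        index = min(int(gc * n_bins), n_bins - 1)  # put GC = 1.0 in the last bin
        bins[index].append(depth)
    return {i: (sum(v) / len(v)) / overall for i, v in sorted(bins.items())}


if __name__ == "__main__":
    reference = "ATATATATGGCCGGCCATGCATGC"  # toy sequence, not real data
    depths = [30, 10, 20]                   # made-up mean depth per 8-base window
    gc = windowed_gc(reference, 8)
    print(normalized_depth_by_gc(gc, depths))
```

In practice the per-window depths would come from aligned reads rather than a hand-written list, and much larger windows would be used.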
A variety of factors affect GC bias, including the DNA shearing mechanism, ligation efficiency, and PCR amplification. For example, non-uniform physical or enzymatic shearing of DNA during library preparation can result in fragment length bias. The method used to attach adaptors to the ends of DNA fragments can affect both the quality and quantity of the mapped reads.
Additional bias can be introduced when DNA fragments are amplified by PCR, because some fragments in the library can be preferentially enriched over others during amplification. The conditions of DNA library construction are therefore crucial in determining how much GC bias is introduced. Although PCR amplification is a major source of GC bias, improved PCR protocols using optimized conditions have reduced amplification bias. Optimizing the thermocycler and temperature ramp rate, and lengthening the denaturation phase to allow complete denaturation of the strongly paired GC-rich regions, were found to significantly improve evenness of coverage. PCR-based and PCR-free Invitrogen™ Collibri™ PS DNA Library Prep Kits for high-throughput Illumina systems provide the most even coverage for DNA input amounts ranging from 1 to 1000 ng. Moreover, in these kits, PCR can be considered optional because adaptor ligation does not require PCR amplification.
An essential aspect of optimizing a WGS workflow is to consider the biases that may occur and determine how to avoid or minimize their influence on the final data readout. With the development of the Invitrogen™ Collibri™ PCR-Free PS DNA Library Prep Kit for Illumina Systems and the Invitrogen™ Collibri™ PS DNA Library Prep Kit for Illumina Systems, it is possible to reduce these biases.
1. Macintyre G, Ylstra B, Brenton JD. Sequencing structural variants in cancer for precision therapeutics. Trends Genet. 2016; 32(9): 530-542.
2. Guo X, Qiu W, Garcia-Milian R, et al. Genome-wide significant, replicated and functional risk variants for Alzheimer’s disease. J Neural Transm (Vienna). 2017; 124(11): 1455-1471.
3. Lan JH, Yin Y, Reed EF, et al. Impact of three Illumina library construction methods on GC bias and HLA genotype calling. Hum Immunol. 2015; 76(2-3): 166-175.