The efficiency of whole genome sequencing (WGS) workflows has skyrocketed since the technology’s inception. Major leaps and minor tweaks in the WGS workflow have compounded over time, resulting in radical reductions in processing time and in the cost of sequencing whole genomes over the past decades. The first complete sequencing of a human genome, the Human Genome Project, was announced amid much fanfare in 2003; it cost about $1 billion and took 13 years to complete. This global effort was spearheaded by the National Human Genome Research Institute (NHGRI) and involved scientists from 20 institutions in 6 countries. Aided by astounding improvements in sequencing technologies and accessibility, the cost of sequencing a whole genome had fallen to approximately $10 million by 2007. By 2015, WGS costs had dropped further to about $3,000 per genome, and sequencing an entire genome took just a couple of days.
In a recent interview with The San Diego Union-Tribune, Illumina™ chief executive Francis DeSouza said that the company’s NovaSeq™ line is projected to reduce the cost of sequencing to $100 per human genome within the next few years. This is an impressive improvement over the company’s 2014 figure of $1,000 per genome. Modern library prep kits, such as the Invitrogen™ Collibri™ PCR-Free PS DNA library prep kit for Illumina systems, make it possible for library generation to keep pace with instrument speeds. The Collibri PS DNA Library Prep Kit requires only 0.4 hours of hands-on time and 1.5 hours in total.
Key Steps in the WGS Workflow
Assess the Integrity of Your Input
It’s important to start with high-quality genomic DNA to optimize library construction for WGS and to increase the chances of sequencing success. Most laboratories assess the purity of the input DNA by measuring the ratio of the absorbances at 260 and 280 nm, which should ideally fall between 1.8 and 2.0 for clean, high-quality DNA. Running the input DNA through gel electrophoresis helps reveal impurities (such as detergents used during isolation) as well as DNA damage and shearing that occurred during the isolation or storage phases. A single band on the gel represents intact, high-quality DNA, whereas degraded DNA presents as a smear of variously sized fragments. RNA contamination appears as a blurry band at the bottom of the gel. Better still than ultraviolet absorbance measurements, however, is fluorescence-based detection of the purity of DNA, because RNA contamination can raise the absorbance ratio and lead to over-estimation of the quality of the DNA sample.
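The absorbance check described above comes down to a single division and a range test. The following is a minimal sketch in Python, assuming the 1.8 to 2.0 window quoted in the text; the function names and the example readings are hypothetical and do not come from any instrument software.

```python
def a260_a280_ratio(a260: float, a280: float) -> float:
    """Return the ratio of absorbance at 260 nm to absorbance at 280 nm."""
    if a280 <= 0:
        raise ValueError("A280 must be a positive absorbance reading")
    return a260 / a280


def looks_pure(a260: float, a280: float,
               low: float = 1.8, high: float = 2.0) -> bool:
    """True when the ratio falls inside the window quoted in the text.

    The default bounds (1.8 and 2.0) are taken from the paragraph above;
    they are a rule of thumb, not an instrument specification.
    """
    return low <= a260_a280_ratio(a260, a280) <= high


if __name__ == "__main__":
    # Hypothetical readings for illustration only.
    print(looks_pure(a260=1.9, a280=1.0))
    print(looks_pure(a260=1.5, a280=1.0))
```

As the paragraph notes, a ratio inside the window is not proof of quality, since RNA contamination can inflate it, so a check like this is best treated as a first filter alongside gel or fluorescence-based results.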
Construct a Library
Preparing a DNA library from the genomic sample at hand is a crucial step in the WGS workflow. Essentially, the library converts the collected nucleic acid sample into a form that is compatible with the requirements of the particular sequencing system used downstream.
Preparing a DNA library usually involves the following steps:
- fragmenting the long strands of genomic DNA into optimal sizes for sequencing;
- blunting and 5’-phosphorylation of DNA ends;
- attaching a dAMP at the 3’ end (dA-tailing) to prevent fragments from joining end to end (concatemer formation);
- attaching or ligating detectable adapter oligonucleotides at the 3’ and 5’ ends of the genomic DNA fragments;
- converting any single-stranded genomic DNA into double-stranded DNA;
- optional: using PCR to amplify the library obtained from a limited amount of input;
- quantitating the library for the specific sequencing application.
The downstream platform determines the length of the nucleic acid fragments. For example, Illumina’s HiSeq 2500™ requires an average DNA fragment length of 600 bp or smaller, whereas the HiSeq 3000™ requires an average DNA fragment length not exceeding 350 bp. Physical or enzymatic methods can be used to achieve DNA fragments of the desired size. Physical methods commonly used include acoustic shearing, sonication, and hydrodynamic shearing. Enzymatic methods consist of digestion with non-specific endonucleases or their cocktails. Another enzymatic alternative is to use a transposase enzyme that fragments double-stranded DNA (dsDNA) and simultaneously adds an adapter segment to the dsDNA, considerably reducing sample handling and processing time.
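The platform limits quoted above can be turned into a simple pre-sequencing check. The sketch below is in Python and assumes that the only criterion is the average fragment length and that the two limits are exactly the figures given in the text (600 bp and 350 bp); the dictionary, the function names, and the sample fragment lengths are hypothetical.

```python
# Maximum average fragment length in bp, as quoted in the text above.
MAX_AVG_FRAGMENT_BP = {
    "HiSeq 2500": 600,
    "HiSeq 3000": 350,
}


def mean_fragment_length(lengths_bp):
    """Return the arithmetic mean of a list of fragment lengths (bp)."""
    if not lengths_bp:
        raise ValueError("at least one fragment length is required")
    return sum(lengths_bp) / len(lengths_bp)


def fits_platform(lengths_bp, platform):
    """True when the mean fragment length does not exceed the platform limit."""
    return mean_fragment_length(lengths_bp) <= MAX_AVG_FRAGMENT_BP[platform]


if __name__ == "__main__":
    # Hypothetical fragment sizes, e.g. read off a fragment-analysis trace.
    fragments = [300, 400, 500]
    for platform in MAX_AVG_FRAGMENT_BP:
        print(platform, fits_platform(fragments, platform))
```

With these example sizes the mean is 400 bp, which is within the 600 bp limit but over the 350 bp limit, so the same library would need further fragmentation for the second platform.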
The Efficiency of the Improved WGS Workflow
The Invitrogen Collibri PCR-Free PS DNA library prep kit generates DNA libraries up to 50% faster than other comparable kits through the use of master mixes, effective enzymes, and streamlined protocols. Because the kit considerably cuts the time required for adapter ligation and index tagging, the total workflow requires only 6.5 hours: 1 hour for DNA extraction, 0.5 hours for DNA quantification, 1 hour for DNA fragmentation, 1.5 hours for library preparation, 0.5 hours for library quantitation, and 2 hours for manual normalization and pooling.
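The 6.5-hour figure is the sum of the per-step times listed above. A small Python sketch makes the arithmetic explicit and shows how much of the total each step accounts for; the step durations are the ones quoted in the text, while the dictionary and function names are hypothetical.

```python
# Per-step durations in hours, as listed in the text above.
STEP_HOURS = {
    "DNA extraction": 1.0,
    "DNA quantification": 0.5,
    "DNA fragmentation": 1.0,
    "library preparation": 1.5,
    "library quantitation": 0.5,
    "manual normalization and pooling": 2.0,
}


def total_hours(steps):
    """Return the total workflow time in hours."""
    return sum(steps.values())


def share_of_total(steps, name):
    """Return the fraction of the total workflow time spent on one step."""
    return steps[name] / total_hours(steps)


if __name__ == "__main__":
    print(f"Total: {total_hours(STEP_HOURS)} hours")
    for step in STEP_HOURS:
        print(f"{step}: {share_of_total(STEP_HOURS, step):.0%}")
```

Laid out this way, library preparation itself is under a quarter of the total, and manual normalization and pooling is the single longest step.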
In the WGS workflow, it is important to assess success not just at the end but at every possible intermediate step. Each step in the Invitrogen Collibri PCR-Free PS DNA library prep kit includes an in-process visual quality control check, ensuring high success rates. For example, the use of tracking dyes can help identify individual samples in 96-well plates that have missed a particular reagent or have been inadequately mixed.
Significance and Application of WGS
WGS is useful for analyzing rare disease clusters, testing causative links between potential pathogens and diseases, and drug target discovery. Recently, WGS has been used for root cause analysis and novel gene discovery [1, 2]. With the increased efficiency, ease of use, and affordability of the latest sequencing systems, the use of WGS in biological research will increase.
1. Peace CP, et al. Apple whole genome sequences: recent advances and new prospects. Hortic Res. 2019 Apr 5;6:59.
2. US Food & Drug Administration. Whole Genome Sequencing (WGS) Program. February 14, 2018.