The microorganisms we know and understand best today are those that either cause human disease or benefit human society in some way. From wine and cheese in the food industry to the pharmaceutical industry, they are an indispensable part of our lives.
Despite good progress in understanding the microbial world, not every microbe could be identified and analyzed, because studying one extensively depended on the ability to grow a “pure culture”, or single clone. Pure cultures enable DNA extraction and the study of the pathology and underlying molecular mechanisms of that particular organism.
Next-generation sequencing transcends this boundary. Metagenomics is the sequencing of DNA from a whole environment rather than DNA extracted from a single organism.
We can sequence anything!
Soil, marine life, human skin, gut microflora: almost any environment can be sequenced. The list is endless and gives us the flexibility to study new microbial communities and apply this knowledge to fields such as medicine, agriculture and biofuels.
How does Metagenomics work in identifying new microbes?
The sequencing data derived from metagenomics is essentially the same as genomic data, but piecing it back together is different. There could be innumerable organisms in the sample, with no existing data to depend on. So the first step is identifying new organisms.
A slow rebuild
Sequencing the gene that codes for ribosomal RNA (rDNA sequencing) is the way to go for identifying organisms. The 16S rRNA gene helps to identify the various taxa present in the sample. Individual members of the microbial community are slowly rebuilt by relying on the conservation of 16S rRNA.
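To make the idea of assigning reads to taxa concrete, here is a minimal toy sketch. It assumes a dictionary of reference sequences keyed by taxon and assigns a read to the reference that shares the most k-mers with it. The sequences, the taxon names and the function names are all made up for illustration, and real pipelines rely on curated 16S reference databases and alignment or clustering tools rather than this simple k-mer count.

```python
def kmers(seq, k=4):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify_read(read, references, k=4):
    """Assign a read to the reference sharing the most k-mers with it.

    Returns (taxon, shared_kmer_count), or (None, 0) if the read shares
    no k-mer with any reference.
    """
    read_kmers = kmers(read, k)
    best_taxon, best_score = None, 0
    for taxon, ref_seq in references.items():
        score = len(read_kmers & kmers(ref_seq, k))
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon, best_score


# Hypothetical toy references: made-up sequences, NOT real 16S genes.
TOY_REFERENCES = {
    "taxon_A": "ACGTACGTTAGCCGATAGGCTTAACG",
    "taxon_B": "TTGACCGGATATCCGGAATTGGCCAA",
}

if __name__ == "__main__":
    print(classify_read("TAGCCGATAGG", TOY_REFERENCES))
```

A read that matches nothing comes back as `(None, 0)`, which is the toy equivalent of a sequence with no close relative in the reference set.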
The second part is assembling the individual microbial genomes. This can be very challenging since metagenomic data tends to have much lower coverage than regular genomic data. However, additional whole genome sequencing (WGS) helps to identify genes that code for exotic proteins, to understand community composition and evolution, and to compare metagenomes.
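The coverage problem can be illustrated with the standard back-of-the-envelope estimate: coverage is roughly the number of reads times the read length, divided by the genome size. In a mixed sample, only a fraction of the reads comes from any one organism. The sketch below is an approximation under that assumption; the function name, the `dna_fraction` parameter and the example numbers are illustrative choices, not measured values.

```python
def expected_coverage(total_reads, read_length, genome_size, dna_fraction=1.0):
    """Approximate average coverage of one genome.

    total_reads  -- number of reads in the run
    read_length  -- read length in bp
    genome_size  -- genome size of the organism in bp
    dna_fraction -- assumed share of the sample's DNA that comes from this
                    organism (1.0 for a pure culture)
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    if not 0.0 <= dna_fraction <= 1.0:
        raise ValueError("dna_fraction must be between 0 and 1")
    return dna_fraction * total_reads * read_length / genome_size


if __name__ == "__main__":
    # Illustrative numbers: 1 million reads of 250 bp, 5 Mb genome.
    print(expected_coverage(1_000_000, 250, 5_000_000))        # pure culture
    print(expected_coverage(1_000_000, 250, 5_000_000, 0.01))  # 1% of sample
```

With these illustrative numbers, an organism sequenced alone would reach about 50x coverage, while the same organism making up 1% of a community would reach only about 0.5x from the same run.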
Typically, it takes different experimental designs and several rounds of sequencing to obtain reliable metagenomic information since the data is extremely redundant. Whole genome sequencing can provide some valuable information in addition to rRNA sequencing, but it is highly dependent on pre-existing databases and is only as good as the assembler used.
What are the steps and challenges involved in sample preparation?
The three basic steps are: (i) extracting DNA from the sample; (ii) making libraries; and (iii) sequencing the sample, much as in other sequencing projects. However, sample preparation can be particularly challenging. The DNA present is usually highly fragmented, which makes it difficult to prepare libraries that support longer read lengths. Longer reads are preferable since they are more reliable for identifying a new organism. Furthermore, because of the nature of the environment sampled, several inhibitory compounds may be present, and the sample may have a high salt concentration or an unfavorable pH. GC content is another big drawback: it can strongly influence the results and lead to a biased interpretation.
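One simple way to keep an eye on GC bias is to look at the GC content of the reads themselves. The following is a minimal sketch, assuming reads are available as plain strings; the function names are illustrative, and ambiguous bases such as N are simply ignored.

```python
def gc_content(seq):
    """Fraction of G and C among the called bases (A, C, G, T) of a sequence.

    Returns None if the sequence contains no called bases.
    """
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return None
    return (seq.count("G") + seq.count("C")) / called


def mean_gc(reads):
    """Mean GC fraction over all reads that have at least one called base."""
    values = [gc for gc in (gc_content(r) for r in reads) if gc is not None]
    if not values:
        return None
    return sum(values) / len(values)


if __name__ == "__main__":
    print(gc_content("ATGC"))
    print(mean_gc(["GGCC", "ATAT", "NNNN"]))
```

Comparing the GC distribution of the reads between libraries or runs can hint at whether part of the community is under-represented, although it cannot by itself tell you what the true composition is.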
What platforms can I use for a metagenomics project?
The platform used for sequencing is an important consideration. Longer read lengths are favorable, and the Roche 454 instrument was the top choice for metagenomics. Nowadays, the 454 FLX is capable of read lengths of up to 1000 bp and is preferred over earlier 454 models. Until recently Illumina offered only short read lengths, but this is changing with the MiSeq offering read lengths of up to 2×250 bp.
Read length, reads or replicates?
Deep sequencing of 16S rRNA helps in identifying rare genes and taxa. In addition, it is important to increase the number of samples sequenced so that the patterns that emerge can be tested for statistical significance. It is therefore worth analyzing the available platforms and deciding on a tradeoff between read length, number of reads and number of samples, depending on both the budget of the project and the stage you are at with your experiments!
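The tradeoff between depth and sample number is simple arithmetic: a run yields a roughly fixed number of reads, and every extra sample pooled onto it reduces the reads available to each one. The sketch below assumes reads are split evenly across samples, which real runs only approximate; the function names and the run size used in the example are hypothetical.

```python
def reads_per_sample(reads_per_run, n_samples):
    """Reads available to each sample if a run is split evenly."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return reads_per_run // n_samples


def max_samples(reads_per_run, min_reads_per_sample):
    """Largest number of samples that still meets a minimum depth each."""
    if min_reads_per_sample <= 0:
        raise ValueError("min_reads_per_sample must be positive")
    return reads_per_run // min_reads_per_sample


if __name__ == "__main__":
    run_size = 1_000_000  # hypothetical number of reads from one run
    print(reads_per_sample(run_size, 48))
    print(max_samples(run_size, 50_000))
```

Early exploratory experiments might favor many samples at shallow depth, while a later search for rare taxa might favor fewer samples sequenced more deeply.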
What makes a successful metagenomics project?
Unlike certain genomic or transcriptome projects, the first experiment in metagenomics may not provide all of the answers. Problems occur frequently, including short reads, error-prone data arising from different taxa, the detection of a taxon that may be dead and ingested by another, GC bias and so on. With improving sequencing technologies, a falling cost per base and new sample preparation methods emerging, it will not be long before we can strike a good balance between available material, sample prep strategies and sequencing platforms, and so obtain meaningful and interpretable results.
Exploring the frontiers
It may take multiple experimental designs and several rounds of sequencing to produce reliable data. Nevertheless, metagenomics allows us to explore the frontiers of previously unknown microbial communities!