Until recently, vast areas of the genome were dismissed as “junk” DNA because they do not encode proteins. However, it has become clear that these regions have a wide range of other functions, from transcriptional and translational regulation to the protection of genes and genome integrity. The ENCODE project reported in 2012 that roughly 80% of the human genome sequence could be assigned some biochemical function. Most of these functions are still unknown, and there is strong interest in developing algorithms that help uncover the logical patterns within non-coding sequences. In this article, we’ll discuss a few software options that you can use to identify conserved non-coding elements.
Non-coding sequence alignment using MULAN
When aligning protein or mRNA sequences, the software usually matches sequences by conservation, since these sequences are assumed to share a common origin. However, functional elements in non-coding DNA can rearrange (change position, break up, invert) without losing their function, which makes them difficult or impossible to align with the same software. The free online tool MULAN (MUltiple sequence Local AligNment and visualization tool) uses genes and their surrounding regions to look for conservation in non-coding DNA. You provide sequence data for the same gene from several species (how many depends on the depth of conservation you are looking for); for example, a gene plus 5 kb of upstream sequence. All exons in the region also have to be annotated, because they will naturally show up as highly conserved areas. The alignment is performed pairwise, comparing each species with the species selected as the reference (see Figure 1: Screenshot of MULAN output). The software can find patches of conservation that are in a different order or in reverse orientation, as is often the case with enhancer elements. Their positions on the reference sequence are highlighted, and the sequence alignment can be viewed and analyzed.
Figure 1: Screenshot of MULAN output. Here, the 10 kb upstream region of a gene was compared between teleosts (a large and extremely diverse group of ray-finned fish) and humans, using zebrafish as the reference. The red areas show high conservation; the more closely related the species, the more non-coding elements can be expected to be conserved. In this example, one upstream and one intronic element are highly conserved from fish to human, which makes a regulatory function for these elements likely. (Tetraodon and fugu are pufferfish of the family Tetraodontidae; medaka (Oryzias latipes) is a ricefish.)
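To make the idea of conserved patches that have moved or flipped more concrete, here is a minimal Python sketch. It is not MULAN's algorithm: it only looks for exact short matches (k-mers) between a reference and a second sequence on both strands, which is a toy stand-in for the seeding step of a local alignment. The function names, the `k` parameter and the example sequences are all made up for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def shared_kmers(reference, query, k=8):
    """Find exact k-mers shared between reference and query on either strand.

    Returns a sorted list of (reference_position, query_position, strand),
    where strand is "+" for same orientation and "-" for inverted matches.
    Toy illustration only: real tools tolerate mismatches and gaps.
    """
    reference = reference.upper()
    query = query.upper()

    # Index every k-mer of the reference by its start position.
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)

    hits = []
    for j in range(len(query) - k + 1):
        kmer = query[j:j + k]
        # Same orientation as the reference.
        for i in index.get(kmer, []):
            hits.append((i, j, "+"))
        # Inverted relative to the reference.
        for i in index.get(reverse_complement(kmer), []):
            hits.append((i, j, "-"))
    return sorted(hits)


# Hypothetical sequences: the element ACGGTCAT sits at position 4 of the
# reference and appears inverted (as ATGACCGT) in the second sequence.
reference = "CCCCACGGTCATCCCC"
inverted = "TTTTATGACCGTTTTT"
print(shared_kmers(reference, inverted, k=8))
```

A position-by-position alignment of these two sequences would miss the shared element entirely, whereas a strand-aware local search reports it as a "-" hit.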
Uncovering synteny using Genomicus
Another way to approach conservation is to take synteny into account. Genes are said to be in synteny if the same genes occur in close proximity to one another across several species. A common feature of syntenic loci is that they also share regulatory elements (the most famous example is the Hox gene clusters). Genomicus is an online genome browser that searches for syntenic genes. You select a gene and a species, and the software displays a phylogenetic tree based on this gene and shows the surrounding genes if they are in synteny. If you tick “CNE” (conserved non-coding elements) in the view menu, the software will also show areas of non-coding conservation between the syntenic genes.
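The definition of synteny above can be sketched in a few lines of Python. This is a simplified illustration, not how Genomicus works internally: it assumes you already have the order of genes along a chromosome in each species, and it reports the neighbours of a focal gene that are shared across all species. The gene names, the species labels and the `window` parameter are hypothetical.

```python
def neighbours(gene_order, gene, window=2):
    """Return the set of genes within `window` positions of `gene`."""
    i = gene_order.index(gene)
    start = max(0, i - window)
    return set(gene_order[start:i] + gene_order[i + 1:i + 1 + window])


def syntenic_neighbours(orders, gene, window=2):
    """Return the neighbours of `gene` that are shared by every species.

    `orders` maps a species name to the list of gene names in chromosomal
    order. Orthologous genes are assumed to carry the same name.
    """
    neighbour_sets = [neighbours(order, gene, window) for order in orders.values()]
    return set.intersection(*neighbour_sets)


# Hypothetical gene orders around a focal gene called "geneX".
orders = {
    "zebrafish": ["geneA", "geneB", "geneX", "geneC", "geneD"],
    "human": ["geneQ", "geneB", "geneX", "geneC", "geneR"],
}
print(sorted(syntenic_neighbours(orders, "geneX", window=2)))
```

Here geneB and geneC flank geneX in both species, so they are the candidates for sharing regulatory elements with it.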
Exploring synteny and conservation with UCNEbase
Finally, conservation and synteny information is also conveniently presented in a browsable database at UCNEbase, “a database of ultraconserved non-coding elements and genomic regulatory blocks”. It covers many model species and holds a lot of data, so it pays to know exactly what you are looking for before you start.
Any questions? Let us know in the comments section!
Check out our related article on identifying conserved elements in genes.
References:
Dimitrieva, S., & Bucher, P. (2012). UCNEbase: a database of ultraconserved non-coding elements and genomic regulatory blocks. Nucleic Acids Research, 1–9. doi:10.1093/nar/gks1092
Kikuta, H., Laplante, M., Navratilova, P., Komisarczuk, A. Z., Engström, P. G., Fredman, D., Akalin, A., et al. (2007). Genomic regulatory blocks encompass multiple neighboring genes and maintain conserved synteny in vertebrates. Genome Research, 17(5), 545–55. doi:10.1101/gr.6086307