Your DNA sequence can be put to good use fairly easily with BLAST and MEGA software. These programs can help with phylogenetic tree construction. You can ask questions like: what is the evolutionary relationship between a set of sequences from different species? Or how have certain microbial strains arisen?
As any bioscientist probably knows, your first step with a new sequence would be to use BLAST, the Basic Local Alignment Search Tool. This nifty yet powerful resource matches your sequence to the millions of sequences stored in genomic and nucleotide databases. The tool comes up with the sequences most similar to yours. It also gives insights into the possible identity of those sequences. The results include homologues across species and in similar tissues. BLAST is important as it helps to confirm that sequences are homologues and not just lucky alignments.
The basics of using BLAST for nucleotide sequence searches have already been covered in this wonderful article. Below is a brief introduction to the relevant flavors of BLAST found on the NCBI site:
- BLASTN: Compares your nucleotide sequence to the nucleotide sequences in GenBank, NCBI’s repository for nucleotide sequences.
- BLASTX: Compares the six-frame translations of your nucleotide sequence (three reading frames on each strand) to the amino acid sequences in NCBI’s Protein Database. This is a great way to find out the possible products and functions of your sequence!
- MegaBLAST: Compares your sequence against other nucleotide sequences, optimal for finding very similar sequences of putatively related species. It casts a tighter net.
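To make the "six frames" in the BLASTX entry concrete, here is a minimal sketch in plain Python. It is not how BLAST itself is implemented. It assumes the standard genetic code, and the function names are made up for illustration. It only shows what the six translations of a nucleotide sequence are: three reading frames on the forward strand and three on the reverse complement.

```python
from itertools import product

# Standard genetic code, listed in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become X."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translations("ATGGCCTAA").items():
        print(label, protein)
```

Each of the six protein strings is a candidate to compare against a protein database. A stop codon shows up as `*`.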
Multiple Sequence Alignment (MSA)
Multiple homologues detected via BLAST can be aligned using algorithms such as ClustalW or MUSCLE. I like to use MEGA (Molecular Evolutionary Genetics Analysis) because it contains these and other functions. As such, it is a one-stop source for phylogenetic tree construction.
To start aligning your sequences, launch the Alignment Explorer by selecting Align | Edit/Build Alignment. This is located on the launch bar of the main MEGA window. From the Alignment Explorer main menu, go to Web | Query GenBank. This lets you add the sequences for your alignment into the visual explorer one by one. After adding all the sequences, you have the option to align them using one of two commonly used programs: ClustalW or MUSCLE.
Which One to Choose, ClustalW or MUSCLE?
The two alignment programs differ in their operation. ClustalW uses a progressive algorithm for alignment. It first aligns the two most similar sequences, then aligns that alignment with the next sequence or group, and so on, following a guide tree. MUSCLE stands for MUltiple Sequence Comparison by Log-Expectation. It is generally reported to achieve better results than ClustalW on key measures, including alignment accuracy and time and space complexity. It does this by following its progressive alignment with iterative refinement, rather than stopping after a single progressive pass.
Go to Alignment, and choose Align by Muscle. If you are a beginning user, the presets are fine to use, as they serve the purposes of most people. Your output should look something like that shown below.
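Both programs build on pairwise alignment: each progressive step aligns two things at a time. As a rough illustration of that building block, here is a sketch of a global pairwise alignment (Needleman-Wunsch) in plain Python. It is neither ClustalW's nor MUSCLE's actual code. The scoring values (match, mismatch, and a linear gap penalty) are arbitrary assumptions chosen to keep the example small.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). Scoring values are illustrative.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(global_alignment("ACGT", "AGT"))
```

A progressive aligner repeats this kind of step, treating already-aligned groups as single units. The real programs use more elaborate scoring, such as substitution matrices and separate gap-opening and gap-extension penalties.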
Save your alignment as a .meg file. This way, you can use it later without having to spend time adding and aligning sequences again.
Phylogenetic Tree Construction with MEGA Version 6
Now comes the fun part! MEGA has a variety of options for phylogenetic tree construction, including UPGMA, Maximum Parsimony, Neighbor-Joining, and Maximum Likelihood. These are different approaches to tree construction, each with its own pros and cons, and its own suitability for your particular purpose. For a given method, MEGA will help you find the best model for your DNA or protein sequence substitution rates.
To construct a phylogenetic tree, close the Alignment Explorer and go back to the main MEGA window. We’re going to construct a Neighbor-Joining tree for a quick look at our sequences and their relation to each other. You can always go back and redraw the tree using other methods!
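For a feel of what a Neighbor-Joining computation works with, here is a small sketch in plain Python. It is not MEGA's implementation. It assumes the simplest distance measure, the p-distance (the proportion of differing sites). It then applies the standard Neighbor-Joining criterion, Q(i, j) = (n - 2) d(i, j) - sum of row i - sum of row j, to pick the first pair of sequences to join. All function names are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are ignored.
    """
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned_seqs):
    """Symmetric matrix of pairwise p-distances for a list of aligned sequences."""
    n = len(aligned_seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = p_distance(aligned_seqs[i], aligned_seqs[j])
            dist[i][j] = dist[j][i] = d
    return dist


def nj_first_pair(dist):
    """Return (q, i, j) for the pair minimizing the Neighbor-Joining Q criterion."""
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    return best


if __name__ == "__main__":
    # Toy aligned sequences, invented for illustration only.
    toy = ["ACGTACGT", "ACGTACGA", "ACGAACTA", "TCGAACTA"]
    print(nj_first_pair(distance_matrix(toy)))
```

The full algorithm joins that pair into a new node, recomputes distances to the new node, and repeats until the tree is complete. In MEGA you would normally choose a substitution model rather than rely on raw p-distances.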
Choose Phylogeny | Construct/Test Neighbor-Joining Tree, and in the dialog box that opens, select the .meg file you saved from the Alignment Explorer. After clicking Compute, you get a tree that looks something like this:
To make that a little easier to read (shown below), click on the button above (Display Only Topology).
This tree gives us a lot of information about the sequence! You say, “like what?” It’s now evident that the Zaire Ebolavirus sequence from Gueckedou in Guinea is most similar to the Mayinga strain (sequence AF272001.1). Both of these are in turn most similar to the strain from Gabon, and more distant from the ones from Tai Forest or Sudan. This is a surprising fact, considering their geographical locations in Africa. Guinea is in West Africa and Gabon lies across the Gulf of Guinea. This suggests that bats may be an important transmitter of the ebolavirus between these locations.
BLAST and MEGA will help you get a start on analyzing your genome and making sense of the sequence data. This has been a very brief introduction to the power of MEGA. Note that the reliability of the tree can be estimated using the bootstrap method. Happy exploring!
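How much weight groupings like these can bear is usually gauged with the bootstrap method. Columns of the alignment are resampled with replacement, a tree is rebuilt from each resampled alignment, and each branch is scored by how often it reappears. The sketch below covers only the resampling step, in plain Python. The sequence names and function names are invented for illustration, and it is not MEGA's code.

```python
import random


def bootstrap_replicate(aligned, rng):
    """Resample alignment columns with replacement.

    `aligned` maps sequence names to equal-length aligned sequences.
    The same sampled columns are used for every sequence, so each
    resampled column still represents one site of the alignment.
    """
    length = len(next(iter(aligned.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[c] for c in columns)
        for name, seq in aligned.items()
    }


def bootstrap_replicates(aligned, n_replicates=100, seed=0):
    """Return a list of pseudo-replicate alignments (seeded for repeatability)."""
    rng = random.Random(seed)
    return [bootstrap_replicate(aligned, rng) for _ in range(n_replicates)]


if __name__ == "__main__":
    # Hypothetical toy alignment, not real Ebolavirus data.
    toy = {"strain_A": "ACGTAC", "strain_B": "ACGAAC", "strain_C": "TCGAAT"}
    for replicate in bootstrap_replicates(toy, n_replicates=3):
        print(replicate)
```

Building a tree from each replicate and counting how often a given cluster recurs gives the bootstrap support values that MEGA can display on the branches.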
1. Stecher, G., Liu, L., Sanderford, M., Peterson, D., Tamura, K., & Kumar, S. MEGA-MD: molecular evolutionary genetics analysis software with mutational diagnosis of amino acid variation. Bioinformatics 30, no. 9 (2014): doi:10.1093/bioinformatics/btu018.