Alignment algorithms
Alignment algorithms, what are they? In the vast majority of cases, three or more sequences are being aligned (a multiple alignment, as opposed to “pairwise”), so we’ll examine algorithms in this context. Generally, algorithms are simply sets of computational instructions that allow the software to recognize areas of similarity, which may be associated with specific features that have been more highly conserved than other regions. They take into account gaps, similarities and differences between the sequences. The most commonly used algorithms are:
a) The Clustal family: ClustalW, ClustalW2, ClustalV, ClustalX and ClustalOmega
b) MUSCLE (MUltiple Sequence Comparison by Log-Expectation)
MEGA, and most other packages, offer both algorithms. ClustalW has been the workhorse for many applications and will most likely be the best fit for your work. ClustalOmega is the “latest and greatest”; however, if you are comparing large sets of sequences, MUSCLE will do the job nicely. Many have reported that MUSCLE tends to produce more reliable alignments in less time, regardless of the number of sequences, so you may want to try both with your data sets. Click the links to learn more about Clustal and MUSCLE.
Sequence data output formats
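To make this concrete, here is a minimal Python sketch of what an alignment looks like once it has been written out as plain ASCII text. It assumes FASTA-style output with "-" marking gaps, which is one common choice rather than the only one. The three sequences and the function names `parse_fasta` and `conservation_line` are made up for illustration and are not taken from MEGA, Clustal or MUSCLE. The sketch does not perform an alignment itself; it only reads already-aligned sequences and marks the columns where all of them agree, which is the kind of conserved region the algorithms are trying to expose.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: keep the first word after '>' as the name.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            # Sequence line: append to the current record.
            records[name] += line.upper()
    return records


def conservation_line(aligned):
    """Return a string with '*' under every gap-free, fully identical column."""
    seqs = list(aligned.values())
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*seqs):
        if "-" not in column and len(set(column)) == 1:
            marks.append("*")
        else:
            marks.append(" ")
    return "".join(marks)


# Hypothetical, already-aligned sequences (not real data).
EXAMPLE = """\
>seq1
ATGC-TTA
>seq2
ATGCATTA
>seq3
ATGG-TTA
"""

records = parse_fasta(EXAMPLE)
marks = conservation_line(records)

for name, seq in records.items():
    print(f"{name:<6}{seq}")
print(f"{'':<6}{marks}")
```

Running it prints the three sequences with a row of asterisks under the fully conserved columns. The fourth column contains a difference and the fifth contains gaps, so both are left blank.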
Don’t worry about differing formats: most software suites will convert to the required ASCII text format. Click here to learn about different sequence outputs.
User interface/web-based platforms
I’m a big fan of the free “MEGA” software (Molecular Evolutionary Genetics Analysis) and have used it for years. It offers more tools, support, a great GUI (graphical user interface) and numerous “how to” tutorials to get rookies and advanced users up and running quickly. Not all software suites offer integrated web-based sequence capture, e.g. from NCBI’s GenBank. For many, this capability is a huge advantage, saving you from constantly having to download and import FASTA files from online sequence databases. When using web-based servers like GenBank, remember you’re competing for resources with many others, so be prepared to wait. These servers are ok, but have serious limitations when it comes to manipulating your data in any way. If all you want to do is align 10-20 sequences, sure, give it a lash. But if you’re going to try to align 100 Drosophila genomes, you will most likely crash the server… and make a lot of new friends. You’ve been warned.
Different software options
Here are a few of the programs that I’ve used over the years, and where to find them:
| Software | Cost | Website |
| --- | --- | --- |
| MEGA | Free download | https://www.megasoftware.net/ |
| Lasergene | Paid license | https://www.dnastar.com/t-products-lasergene.aspx |
| BioNumerics | Paid license | https://www.applied-maths.com/ |
| BioEdit | Free download | https://www.mbio.ncsu.edu/bioedit/bioedit.html |
| NCBI/GenBank | Free web-based | https://blast.ncbi.nlm.nih.gov/ |
| EMBL-EBI | Free web-based | https://www.ebi.ac.uk/Tools/msa/clustalw2/ |
| TCoffee | Free web-based | https://www.ebi.ac.uk/Tools/msa/tcoffee/help/ |
| PRALINE | Free web-based | https://www.ibi.vu.nl/programs/pralinewww/ |
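If the program you pick does not offer integrated sequence capture, one option is to pull FASTA records from GenBank yourself with a short script. The Python sketch below uses NCBI's public E-utilities `efetch` endpoint with the `nuccore` database. The helper names `build_efetch_url` and `fetch_fasta` and the accession placeholders are hypothetical, so substitute your own accessions. It is a rough starting point under those assumptions, not a replacement for the capture tools built into packages like MEGA.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities endpoint for fetching records.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accessions):
    """Build an efetch URL asking for the given accessions as FASTA text."""
    params = {
        "db": "nuccore",              # nucleotide database (assumed here)
        "id": ",".join(accessions),   # comma-separated accession list
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


def fetch_fasta(accessions, timeout=30):
    """Download the records as one FASTA-formatted string (needs network)."""
    with urlopen(build_efetch_url(accessions), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Placeholder accessions for illustration only; replace with real ones.
url = build_efetch_url(["ACCESSION_1", "ACCESSION_2"])
print(url)
```

Only the URL is built here; nothing is downloaded until `fetch_fasta` is called. Since this is a shared server, keep requests small and infrequent, in line with the warning above.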