The Advanced User’s Guide to Sequencing Alignment Software (Members Only Article)

Whether you’re employing sequencing gels, Sanger-based methods, or the latest in pyrosequencing or Ion Torrent technologies, obtaining, manipulating and analyzing your sequences has never been easier. Depending on your goals, you need to understand the pros and cons of the software. There is a lot of software out there, so do your due diligence. Many packages are offered for free, but some aren’t worth the time. Caveat emptor! Here, we’ll talk about the differences between sequence alignment packages, and how to choose the one that’s right for you.

Alignment algorithms

Alignment algorithms, what are they? In the vast majority of cases, three or more sequences are being aligned (a multiple sequence alignment, as opposed to a “pairwise” alignment of two), so we’ll be examining algorithms in this context. Generally, an alignment algorithm is a computational procedure that lets the software recognize areas of similarity, which may be associated with specific features that have been more highly conserved than other regions. It takes into account the gaps, similarities and differences between the sequences. The most commonly used algorithms are:

a)  The Clustal family: ClustalW, ClustalW2, Clustal V, ClustalX and ClustalOmega

b) MUSCLE (MUltiple Sequence Comparison by Log-Expectation)

MEGA, and most other packages, offer both algorithms. ClustalW has been the workhorse for many applications and will most likely be the best fit for your work. ClustalOmega is the “latest and greatest”; however, if you are comparing large sets of sequences, MUSCLE will do the job nicely. Many have reported that MUSCLE tends to produce more reliable alignments in less time, regardless of the number of sequences, so you may want to try both with your data sets. Click the links to learn more about Clustal and MUSCLE.
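If you do try both algorithms on the same data, it helps to have a number to compare the results with. Below is a minimal Python sketch of a sum-of-pairs score, one common way to reward similarities and penalize differences and gaps across every pair of rows in an alignment. The function name, the toy alignments and the match/mismatch/gap values are all assumptions chosen for illustration; they are not the scoring scheme used internally by Clustal or MUSCLE, and real work would normally use a substitution matrix and a proper benchmark.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Score an alignment column by column over every pair of rows.

    The default match/mismatch/gap values are illustrative assumptions.
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue  # two gaps facing each other carry no information
            if a == "-" or b == "-":
                total += gap       # a residue aligned against a gap
            elif a == b:
                total += match     # identical residues
            else:
                total += mismatch  # different residues
    return total


# Two hypothetical alignments of the same three toy sequences
# (ACGTA, ACGTTA and ACTTA), as two tools might produce them.
aln_a = ["ACGT-A", "ACGTTA", "AC-TTA"]
aln_b = ["ACGTA-", "ACGTTA", "ACTTA-"]

print("alignment A:", sum_of_pairs(aln_a))
print("alignment B:", sum_of_pairs(aln_b))
```

A higher score only means the alignment fits this particular scoring scheme better, so treat it as a quick sanity check rather than proof that one tool is more accurate.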

Sequence data output formats

Don’t worry about differing formats: most software suites will convert your sequences to the required ASCII text format. Click here to learn about different sequence outputs.
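To show how simple these text formats are, here is a minimal Python sketch that reads and writes FASTA, probably the most common of them. It assumes the usual layout (a header line starting with “>” followed by one or more lines of sequence); the function names are my own, and a real project would be better served by an established library than by this hand-rolled parser.

```python
def parse_fasta(text):
    """Return a dict mapping each header (without '>') to its sequence.

    Assumes headers are unique; a repeated header would overwrite the
    earlier record.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[header].append(line)
    # Join the wrapped sequence lines of each record into one string.
    return {name: "".join(chunks) for name, chunks in records.items()}


def format_fasta(records, width=60):
    """Write records back out as FASTA text, wrapping sequences at `width`."""
    lines = []
    for name, seq in records.items():
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


example = """>seq1 toy record
ACGTAC
GTTA
>seq2
ACTTA
"""

records = parse_fasta(example)
print(records)
print(format_fasta(records, width=4))
```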

User interface/web-based platforms

I’m a big fan of the free “MEGA” software (Molecular Evolutionary Genetics Analysis) and have used it for years. It offers more tools, support, a great GUI (graphical user interface) and numerous “how to” tutorials to get rookies and advanced users up and running quickly. Not all software suites offer integrated web-based sequence capture, e.g. from NCBI’s GenBank. For many, having this capability is a huge advantage, since it saves you from constantly having to download and import FASTA files from online sequence databases.
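If your software of choice lacks built-in sequence capture, the download step can be scripted. The Python sketch below builds a request for NCBI’s E-utilities “efetch” service, which returns records as FASTA text. The helper names and the accession placeholders are assumptions for illustration; swap in real accession numbers, and check NCBI’s usage guidelines (request rate limits, API keys) before running anything in bulk.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accessions, db="nuccore"):
    """Build an efetch URL asking for the given accessions as FASTA text."""
    params = {
        "db": db,                    # nucleotide database
        "id": ",".join(accessions),  # comma-separated list of accessions
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


def fetch_fasta(accessions, db="nuccore"):
    """Download the records; this needs network access, so it is not called here."""
    with urlopen(efetch_url(accessions, db), timeout=30) as response:
        return response.read().decode("utf-8")


# Placeholder accessions (hypothetical): replace them with real ones.
url = efetch_url(["ACCESSION_1", "ACCESSION_2"])
print(url)
```

Calling fetch_fasta with real accessions should return one block of FASTA text that can be saved to a file and opened in the alignment program.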

When using web-based servers like GenBank, remember you’re competing for resources with many others, so be prepared to wait. These servers are OK, but they have serious limitations when it comes to manipulating your data in any way. If all you want to do is align 10-20 sequences, sure, give it a lash. But if you’re going to try to align 100 Drosophila genomes, you will most likely crash the server…and make a lot of new friends. You’ve been warned.

Different software options

Here are a few of the programs that I’ve used over the years, and where to find them:

Software        Cost              Website
MEGA            Free download     http://www.megasoftware.net/
Lasergene       Paid license      http://www.dnastar.com/t-products-lasergene.aspx
BioNumerics     Paid license      http://www.applied-maths.com/
Bioedit         Free download     http://www.mbio.ncsu.edu/bioedit/bioedit.html
NCBI/GenBank    Free web-based    http://blast.ncbi.nlm.nih.gov/
EMBL-EBI        Free web-based    http://www.ebi.ac.uk/Tools/msa/clustalw2/
TCoffee         Free web-based    http://www.ebi.ac.uk/Tools/msa/tcoffee/help/
PRALINE         Free web-based    http://www.ibi.vu.nl/programs/pralinewww/

What’s your favorite sequencing alignment software?

3 Comments

  1. Woellhaf on February 4, 2013 at 12:40 pm

    Hi,
    nice article. I just want to add that you can also align with MAFFT.
    It does its job very quickly and accurately.
    http://mafft.cbrc.jp/alignment/server/

  2. Alex Kanno on October 17, 2012 at 6:31 pm

    Usually I work with clustalw and can’t say the difference to the others. What should be said if someone asks “why did you use clustalw and not e.g. Tcoffee”? I wonder if someone can pinpoint the advantages and disadvantages of the software above. Cheers!

    • NULabMonkey on October 19, 2012 at 1:41 pm

      Other than citing similar publications who have also used your algorithm of choice, it can actually be quite difficult to justify the use of one particular algorithm over another. The ClustalW paper (Thompson et al. 1994) is among the most cited BIOLOGY papers due to the fact that it was one of the first really solid MSA algorithms and was consequently included in just about every software package ever. However, the preference of the scientific community for oft-cited tools and techniques can contribute towards a feed-forward cycle which might stymie further progress by discouraging the use of newer, possibly better alignment algorithms.

      If you’re interested in comparisons of different MSA algorithms, then I’d suggest finding some papers which cite BAliBASE or PREFAB, both of which are benchmark datasets. Since ClustalW came out in 1994, a lot of alternative methodologies have been brought to light, and many are definitely worth considering. If you’re looking for a quick comparison, off the top of my head I’d recommend the original M-Coffee paper (http://nar.oxfordjournals.org/content/34/6/1692.long), because it’s completely open access and includes some pretty interesting data comparing different MSAs across multiple benchmarks.

      Generally speaking, if you “can’t say the difference to others,” it might mean that you need to familiarize yourself a little more with at least the surface landscape of bioinformatics as it relates to sequence alignments. Many investigators use these tools without understanding the theories upon which they’re grounded, and although it’s easy to get confused and bogged down in comparing different algorithms, it’s at least worth exploring a little bit.

      To summarize, feel free to justify your use of ClustalW by saying “because it’s worked really well for many other people,” but without reading more into the subject, it’s difficult to offer further explanation.
