How Does BLAST Work?

More than a pun on the explosive growth of sequencing data, BLAST makes annotating and comparing similar sequences much easier. Created by a group at the U.S. National Center for Biotechnology Information (NCBI) and first published in 1990, the Basic Local Alignment Search Tool is arguably the most heavily used tool for sequence analysis (that’s available for free, anyway).

BLAST is a powerful and popular tool because it can find similarities between experimental and reference sequences (or a whole series of sequences) very quickly and accurately. There are several different BLAST programs, which search databases to help identify nucleotide sequences (DNA and RNA), proteins, and specific genomic features such as SNPs or other targeted regions.

The BLAST databases have grown steadily over the years as scientists deposit new sequences in GenBank and other public repositories, creating an ever-growing reference. This growth has only added to the usefulness of the databases. At the same time, NCBI has added computing power, and is now experimenting with Amazon Web Services to operate BLAST “in the cloud.”

What do you need to do to use BLAST?

Naturally, you’ll first need a computer and a sequence of something.

On the NCBI BLAST website, you’ll see a number of options. You can choose a species to search, or compare your sample against all the species in the database.

You’ll need to decide on a BLAST program:

To search nucleotides against nucleotides, select “blastn” or “megablast” (the latter is tuned for highly similar sequences and is considered the fastest).
To search proteins against proteins, select “blastp”.
“blastx” searches a protein database using your translated nucleotide query.
“tblastn” does the opposite of blastx, searching a translated nucleotide database with your protein query.
“tblastx” searches a translated nucleotide database with your translated nucleotide query.
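The choices above boil down to a small lookup on what you have and what you want to search. Here is a minimal Python sketch of that decision; the function name `choose_program` and its arguments are hypothetical helpers for illustration, not part of any NCBI tool.

```python
# Hypothetical lookup table: (query type, database type, compare at protein level?)
# mapped to the BLAST program described in the list above.
PROGRAMS = {
    ("nucleotide", "nucleotide", False): "blastn",
    ("protein", "protein", False): "blastp",
    ("nucleotide", "protein", False): "blastx",
    ("protein", "nucleotide", False): "tblastn",
    ("nucleotide", "nucleotide", True): "tblastx",
}


def choose_program(query, database, translated=False):
    """Return the BLAST program name for a query/database pairing.

    Set translated=True to compare two nucleotide sequences at the
    protein level (tblastx) instead of letter by letter (blastn).
    """
    key = (query, database, translated)
    if key not in PROGRAMS:
        raise ValueError(f"no BLAST program for {key}")
    return PROGRAMS[key]


print(choose_program("nucleotide", "protein"))
print(choose_program("nucleotide", "nucleotide", translated=True))
```

The sketch leaves out megablast, which searches the same nucleotide-versus-nucleotide pairing as blastn.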

There are a lot of specialized searches you can perform, too, including designing primers, finding conserved domains, analyzing immunoglobulin sequences and structures, and screening for possible vector contamination.

Once you’ve decided which BLAST program to use, the rest is easy and entirely web-based: just copy and paste your sequence into the query box, and fill out a few other fields per the instructions (each program is a little different, but easy to follow).

A wealth of BLAST resources

The NCBI provides so much material to get you started, it’s almost overwhelming.

Tutorials, web-based instructions, videos, and step-by-step guides can be found nearly anywhere on the BLAST site. One slightly annoying aspect of the NCBI BLAST pages, however, is the number of online courses that have been discontinued but remain on the website. The same pages also list new courses, but shouldn’t an organization with a reputation for computing prowess know how to take down a retired page?

Behind the scenes of BLAST

The NCBI estimates that about 200,000 “queries” (that’s your submission of a sequence) are made every week. Even so, depending on how many sequences you enter and how long those sequences are, you can get results back in a few minutes, or possibly a handful of seconds.

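BLAST works by finding the best-scoring local alignments between sequences. The BLAST computers start by breaking your query into short, overlapping sets of letters, each of which they call a “query word.” In the classic protein search, a word is three amino acids long; nucleotide searches use longer words (11 letters is a common default for blastn), but the principle is the same, so picture a three-letter word in a specific order (for example, the nucleotides ATC, in that order). The BLAST search then looks for the number of times (and the places along each database sequence) in which this “word” appears. For proteins, it will also look for closely related “neighborhood” words, such as those in which one letter is different, as long as they still score well against the query word. Each match is then extended in both directions and scored to determine which database sequences are really “in the neighborhood” of your sample.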
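To make the word-matching step concrete, here is a toy Python sketch of it. It is not BLAST's actual implementation: real BLAST scores neighborhood words with a substitution matrix and a threshold, and then extends each hit into a longer alignment. This sketch assumes three-letter words and treats any word with exactly one different letter as a neighbor. The function names (`words`, `one_off`, `find_seeds`) and the example sequences are made up for illustration.

```python
def words(seq, w=3):
    """Yield (position, word) for every overlapping w-letter word in seq."""
    for i in range(len(seq) - w + 1):
        yield i, seq[i:i + w]


def one_off(a, b):
    """True if two equal-length words differ at exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def find_seeds(query, subject, w=3):
    """Return sorted (query_pos, subject_pos, kind) word matches.

    kind is "exact" for identical words and "neighbor" for words
    that differ by a single letter.
    """
    # Index every word in the subject sequence by where it occurs.
    index = {}
    for j, word in words(subject, w):
        index.setdefault(word, []).append(j)

    seeds = []
    for i, qword in words(query, w):
        for sword, positions in index.items():
            if sword == qword:
                kind = "exact"
            elif one_off(qword, sword):
                kind = "neighbor"
            else:
                continue
            seeds.extend((i, j, kind) for j in positions)
    return sorted(seeds)


# Made-up example sequences.
for seed in find_seeds("ATCG", "GGATCTTAGC"):
    print(seed)
```

Each tuple gives the position of a word in the query, the position of its match in the subject, and whether the match was exact or one letter off. In a real search, these seed positions are where the extension and scoring would begin.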

What results do you get?

When your BLAST search is finished, you’ll get a graphical summary of your results. Your “query” sequence will appear first. Below your query sequence, you’ll see a number of shorter lines, representing the reference sequences that were the most comparable to your query sequence. For each of these you’ll also get a percent identity, along with an E-value that estimates how many matches that good you would expect by chance. Moving your mouse over the lines will show the identity of each “hit.” You’ll then be able to identify (one hopes) the species, gene, or type of protein you’ve submitted for comparison.
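The percentage reported for each hit is easy to reproduce by hand. The short Python sketch below assumes you already have two aligned sequences of equal length, with “-” marking gaps, and counts the share of alignment columns where the letters match. The function name `percent_identity` and the example sequences are illustrative only.

```python
def percent_identity(aligned_query, aligned_subject):
    """Percent of alignment columns where both sequences have the same letter.

    Both strings must be the same length; '-' marks a gap, and a gap
    column never counts as a match.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(
        q == s and q != "-"
        for q, s in zip(aligned_query, aligned_subject)
    )
    return 100.0 * matches / len(aligned_query)


# Made-up alignment: 6 matching columns out of 8.
print(percent_identity("ATCGATCG", "ATCGTT-G"))
```

Note that this figure only describes the aligned region, so a short, perfect local match can report 100% even when the rest of the two sequences has nothing in common.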

What’s not to like?

BLAST does have a few shortcomings. Because the algorithms take shortcuts to estimate the best possible alignments, rather than checking every one, you may have errors pop up due to rare SNPs or an INDEL. There is a SNP BLAST search, however. In addition, if your query word “neighborhood” search includes too many three-letter word combinations, you’ll end up with sequences that really aren’t as similar as you hoped.
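To get a feel for how quickly a word “neighborhood” can balloon, the Python sketch below counts how many distinct words lie within a given number of letter changes of a single query word. This is a simplification: real BLAST decides neighborhood membership with a score threshold rather than a mismatch count, and the function name `neighborhood_size` is hypothetical.

```python
from math import comb


def neighborhood_size(word_length, alphabet_size, max_mismatches):
    """Count words within max_mismatches letter changes of one word.

    For each number of changed positions k, choose which k positions
    change, then pick one of the other (alphabet_size - 1) letters
    for each of them.
    """
    total = 0
    for k in range(max_mismatches + 1):
        total += comb(word_length, k) * (alphabet_size - 1) ** k
    return total


# Three-letter protein words (20 amino acids).
for allowed in range(4):
    print(allowed, neighborhood_size(3, 20, allowed))
```

For a three-letter protein word, allowing one change already gives 58 candidate words, two changes give 1,141, and three changes cover all 8,000 possible words, at which point the “neighborhood” no longer filters anything.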

However, NCBI is working on BLAST constantly, and it gets stronger as scientists deposit more sequences in the databases it searches.

Andrew Porterfield

Andrew has been a freelance life science writer for more than 20 years. He has worked for academic institutions, startup biotechs, and major biopharmaceutical companies, and as agriculture editor at the Genetic Literacy Project. He has an MS in Biotechnology from the University of Maryland, and a BA in Physical Anthropology from the University of Pennsylvania.
