
How to Identify Protein Motifs from Protein Sequences

Wouldn’t it be great to put your nucleotide sequence into a program and get back a 3D-structure of your protein and a full description of its functions?

In theory, because a protein’s 3D structure is determined by its amino acid sequence, this should be simple given the right algorithm and a powerful enough computer. In practice, because evolution has pushed different starting sequences into convergent folds, this task remains the Holy Grail of computational biology, just as room-temperature superconductivity remains an unattained goal of materials physics. In the meantime, a “wet biologist” has access to a number of halfway solutions: prediction of protein sequence motifs and structural motifs, which is based on analysing features shared by diverse proteins with similar functions.

Primary sequence motifs

A protein sequence motif is an amino acid sequence pattern found in similar proteins; altering a motif usually alters the corresponding biological function. Among the first sequence motifs reported were the so-called Walker motifs, which were later shown to correspond to ATP or GTP binding and are therefore characteristic of a very broad range of proteins. For example, the Walker A motif has the pattern GXXXXGK(T/S), where G, K, T and S are glycine, lysine, threonine and serine residues and X is any amino acid.
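A pattern like this maps directly onto a regular expression, so you can check a sequence for it yourself. Here is a minimal sketch in Python; the function name `find_walker_a` and the toy sequence are made up for illustration and do not come from any particular tool or real protein.

```python
import re

# Walker A pattern from the text: G, any four residues, G, K, then T or S.
# The lookahead (?=...) lets overlapping hits be reported as well.
WALKER_A = re.compile(r"(?=(G.{4}GK[TS]))")

def find_walker_a(sequence):
    """Return (1-based start position, matched residues) for each Walker A hit."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in WALKER_A.finditer(seq)]

# Hypothetical toy sequence, not a real protein.
toy = "MSTAGPSGSGKTTLLAR"
print(find_walker_a(toy))  # [(5, 'GPSGSGKT')]
```

A hit only tells you that the pattern is present. Short patterns turn up by chance, so a match is a hint worth following up, not proof of nucleotide binding.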

There are a number of websites that allow you to analyse your protein sequence for motifs, for example:

ExPASy Proteomics Tools  – a collection of various proteomics tools, including

Prosite – contains links to several programs that find primary sequence motifs. I recommend leaving the “exclude patterns with a high probability of occurrence” option unticked; this will show you some potential post-translational modification sites, such as glycosylation and phosphorylation sites, in your protein.
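The frequently occurring patterns are mostly very short, which is why they match so often. To get a feel for this, here is a rough sketch that translates the PROSITE pattern notation (elements separated by “-”, x for any residue, [..] for allowed residues, {..} for forbidden residues, (n) or (n,m) for repeats, < and > for sequence ends) into a Python regular expression. The function names are hypothetical, the converter covers only these basic elements, and the N-glycosylation pattern string is quoted from memory, so check it against the database entry before relying on it.

```python
import re

def prosite_to_regex(pattern):
    """Translate a basic PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as x(4) or x(2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = m.group(1), m.group(2)
        if core == "x":                      # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # forbidden residues
        # Single letters and [..] groups are already valid regex.
        if repeat:
            core += "{" + repeat + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)

def scan_pattern(sequence, pattern):
    """Return (1-based start, match) for every, possibly overlapping, hit."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence.upper())]

# Assumed N-glycosylation pattern; the sequence is a made-up toy example.
print(scan_pattern("MNGSANPTQNKTL", "N-{P}-[ST]-{P}"))
```

A four-residue pattern with two loosely defined positions will match many proteins by chance, which is why such hits are flagged as high-probability and should be treated as candidates only.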

Protein domain prediction

Protein domains are arrangements of secondary structure elements that confer a biological function. Complex proteins have evolved by mix-and-match assembly of individual domains or by concatenating several units of the same domain. Domains have similar functions in different organisms, so a protein’s domain organisation gives hints about its function. One widespread structural motif is the “helix-turn-helix”, which hints that your protein is able to bind DNA in some capacity.

Examples of programs predicting specific domains:

PSIPRED – protein sequence analysis workbench including secondary structure and disordered protein prediction;

Phobius  – transmembrane helical segments and signal sequences;

COILS – prediction of coiled-coil regions, characteristic of structural proteins or proteins involved in transcription regulation.
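To give an idea of what the simplest version of such a prediction looks like, here is a sketch of a classical sliding-window hydropathy scan using the Kyte-Doolittle scale. This is not how Phobius works (Phobius is based on a hidden Markov model); it is a much cruder, older approach swapped in purely for illustration. The function names are hypothetical, and the window of 19 residues and cut-off of 1.6 are the commonly quoted rule of thumb for membrane-spanning segments, taken here as assumptions.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[aa] for aa in sequence.upper()]  # KeyError on non-standard residues
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]

def candidate_tm_windows(sequence, window=19, threshold=1.6):
    """1-based start positions of windows whose mean hydropathy reaches the cut-off."""
    profile = hydropathy_profile(sequence, window)
    return [i + 1 for i, score in enumerate(profile) if score >= threshold]

# Made-up toy sequence: a stretch of leucines flanked by charged residues.
toy = "DDDD" + "L" * 19 + "KKKK"
print(candidate_tm_windows(toy))
```

Runs of consecutive window positions above the cut-off suggest a hydrophobic stretch long enough to span a membrane. Dedicated tools additionally model signal peptides and topology, which a plain window average cannot do.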

Case study

The yeast S. cerevisiae translation termination factor eRF3 is a cytosolic protein that uses GTP to promote release of the polypeptide chain from the ribosome. The crystal structure of full-length eRF3 has not yet been determined.

Prosite predicts a GTP-binding motif and that the protein is related to the elongation factors, which is close enough to the eRF3 function. It also predicts several phosphorylation and N-glycosylation sites, which I don’t remember anybody writing about.

What are your favourite motif prediction tools and why?


Bioinformatics: Methods Express. Edited by Paul H. Dear,  Scion,  2007


  1. Boek on October 10, 2013 at 10:55 am

    For any motif/conservation work I actually find ELM one of the better resources

  2. Kurt Lager on October 8, 2013 at 4:40 am

    Interproscan have the advantage, that it runs so many different algorithms at the same time, you will find both all domains and the most basic motifs.
