How to Identify Protein Motifs from Protein Sequences

Wouldn’t it be great to put your nucleotide sequence into a program and get back a 3D-structure of your protein and a full description of its functions?

In theory, because the 3D structure of a protein is determined by its amino acid sequence, this should be simple given the right algorithm and a powerful enough computer. In practice, because evolution has pushed different starting sequences into convergent folds, this task remains the Holy Grail of computational biology and proteomics, just as room-temperature superconductivity remains an unattained goal for materials physicists. In the meantime, a “wet biologist” has access to a number of halfway solutions: prediction of protein sequence motifs and structural motifs, based on analysing common features of diverse proteins with similar function.

Primary sequence motifs

A protein sequence motif is an amino acid sequence pattern found in similar proteins; changing a motif changes the corresponding biological function. Among the first sequence motifs reported were the so-called Walker motifs, which were later shown to correspond to ATP or GTP binding and are therefore characteristic of a very broad range of proteins. For example, Walker motif A has the pattern GXXXXGK(T/S), where G, K, T and S are glycine, lysine, threonine and serine residues and X is any amino acid.
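If you want to check a sequence for the Walker A pattern yourself, it translates directly into a regular expression. The snippet below is a minimal sketch, not a replacement for a curated motif database: the function name `find_walker_a` and the test sequence are made up for illustration, and a match only tells you the pattern is present, not that the site is functional.

```python
import re

# Walker A pattern from the text: G-X-X-X-X-G-K-(T or S), X = any residue.
# Wrapped in a lookahead so that overlapping matches are reported too.
WALKER_A = re.compile(r"(?=(G.{4}GK[TS]))")


def find_walker_a(sequence):
    """Return a list of (1-based start position, matched residues)."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in WALKER_A.finditer(seq)]


# Hypothetical peptide, invented for this example (not a real protein).
example = "MAAGPSGSGKSTLLN"
print(find_walker_a(example))  # [(4, 'GPSGSGKS')]
```

Positions are reported 1-based, as is usual for protein residue numbering.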

There are a number of websites that allow you to analyse your protein sequence for motifs, for example:

ExPASy Proteomics Tools – a collection of various proteomics tools, including

Prosite – contains links to several programs that find primary sequence motifs. I recommend not ticking the “exclude patterns with a high probability of occurrence” option; leaving it unticked will show you some potential post-translational modification sites in your protein, such as glycosylation and phosphorylation sites.
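Prosite writes its patterns in its own compact syntax, for example N-{P}-[ST]-{P} for the N-glycosylation site. To see why such short patterns have a “high probability of occurrence”, it can help to scan a sequence locally. The following is a rough sketch that converts the common parts of the Prosite syntax into a Python regular expression; the helper names `prosite_to_regex` and `scan` and the test peptide are hypothetical, and rarer syntax variants (such as an anchor inside square brackets) are not handled.

```python
import re


def prosite_to_regex(pattern):
    """Convert a simple Prosite pattern into a Python regex string.

    Handles: '-' separators, 'x' (any residue), [..] allowed residues,
    {..} forbidden residues, (n) or (n,m) repeats, '<' and '>' anchors.
    """
    regex = pattern.rstrip(".").replace("-", "")
    regex = regex.replace("x", ".")
    # Forbidden residues first: {P} -> [^P]. After this no braces remain,
    # so repeat counts can safely be turned into braces: (4) -> {4}.
    regex = regex.replace("{", "[^").replace("}", "]")
    regex = regex.replace("(", "{").replace(")", "}")
    # N-terminal and C-terminal anchors.
    regex = regex.replace("<", "^").replace(">", "$")
    return regex


def scan(sequence, prosite_pattern):
    """Return (1-based start, matched residues), overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(prosite_pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence.upper())]


print(prosite_to_regex("N-{P}-[ST]-{P}"))   # N[^P][ST][^P]
print(prosite_to_regex("G-x(4)-G-K-[ST]"))  # G.{4}GK[ST]

# Hypothetical peptide, invented for this example.
print(scan("MKNGSAANPTG", "N-{P}-[ST]-{P}"))  # [(3, 'NGSA')]
```

A four-residue pattern like this one will match by chance in most proteins, so treat such hits as candidates to check rather than as confirmed modification sites.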

Protein domain prediction

Protein domains are arrangements of secondary structure elements that confer a biological function. Complex proteins have evolved by mix-and-match assembly of individual domains or by concatenating several copies of the same domain. Because a domain tends to have a similar function in different organisms, the domain organisation of a protein gives hints about its function. One widespread structural motif is the “helix-turn-helix”, which suggests that your protein is able to bind DNA in some capacity.

Examples of programs predicting specific domains:

PSIPRED – protein sequence analysis workbench including secondary structure and disordered protein prediction;

Phobius – transmembrane helical segments and signal sequences;

Case study

The yeast S. cerevisiae translation termination factor eRF3 is a cytosolic protein that uses GTP to promote release of the polypeptide chain from the ribosome. The crystal structure of full-length eRF3 has not yet been determined.

Prosite predicts a GTP-binding motif and that the protein is related to the elongation factors, which is close enough to the function of eRF3. It also predicts several phosphorylation and N-glycosylation sites, which I don’t remember anybody writing about.

What are your favourite motif prediction tools and why?