Predicting how proteins will fold in vivo is a Holy Grail of proteomics and theoretical chemistry. Current hopes are that this can be achieved by designing an in silico platform that can predict protein folding, either de novo (that is, from scratch) or using known proteins as a guide. What would we need to do, why would we want to, why is it so hard, and where are we with this now? Let’s delve into the world of predicting protein folding. Whether you’re a novice or hoping to learn more, this article is for you!

1. What do we mean by protein folding?

Proteins are composed of building blocks called amino acids. Some describe a protein as a necklace of amino acid “pearls” on a string. However, this image only helps when you think of the protein as a long, unfolded strand. In reality, proteins exist as intricately folded and twisted structures, often interacting with other proteins in a specific environment. Each amino acid has its own unique chemical properties, which give it a preference for, or a disdain for, certain other amino acids and aqueous environments.
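To make an amino acid’s “preference” for water a little more concrete, here is a minimal Python sketch using the Kyte-Doolittle hydropathy scale. That scale is our choice for illustration (it is one published scale among several), and the function name and default window size are assumptions. It averages hydropathy over a sliding window, so hydrophobic stretches show up as positive values and hydrophilic stretches as negative ones.

```python
# Kyte-Doolittle hydropathy scale: positive values are hydrophobic
# (water-avoiding), negative values are hydrophilic (water-loving).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 9) -> list[float]:
    """Return the mean hydropathy of every stretch of `window` residues."""
    seq = sequence.upper()
    if not 1 <= window <= len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[aa] for aa in seq]  # KeyError on non-standard letters
    return [
        sum(values[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


# A hydrophobic stretch (leucines) followed by a hydrophilic one (lysines).
profile = hydropathy_profile("LLLLKKKK", window=4)
print([round(v, 2) for v in profile])
```

The profile falls steadily from the leucine end to the lysine end. Real tools typically use windows of roughly the length of a helix or membrane-spanning segment rather than the tiny window used here.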

2. Reasons we’d want to predict protein structure:

The Holy Grail is automated protein structure prediction. To predict how a string of amino acids will fold, a prediction program needs certain information including, but not limited to:

  • Amino acid sequence
  • Properties of each amino acid in the sequence
  • Properties of the environment in which the folding will occur
  • Whether the protein will assemble with other protein chains (forming what is called a quaternary structure)
  • Whether the sequence is similar to other sequences whose folding pattern is well understood
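The last item on that list is the basis of homology (template-based) modeling, and the usual first step is to align the new sequence against sequences of known structure. Below is a minimal sketch of Needleman-Wunsch global alignment, assuming toy scores of +1 for a match, -1 for a mismatch, and -1 per gap. Real tools use substitution matrices such as BLOSUM62 and affine gap penalties, and the function names here are our own.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment by dynamic programming; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right corner to rebuild one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap columns as a percentage of all alignment columns."""
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


score, top, bottom = needleman_wunsch("ACGT", "ACT")
print(score, top, bottom, percent_identity(top, bottom))  # 2 ACGT AC-T 75.0
```

Note that “percent identity” has several conventions (dividing by alignment length, by the shorter sequence, and so on). We picked one for simplicity.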

3. How we try to predict protein folding:

The key for the program is to identify likely patterns in the folding. It looks at the protein’s primary structure, that is, its extended chain of amino acids, and picks out features that suggest the chain is likely to fold in a particular manner. For example, one ubiquitous folding pattern is the alpha (α) helix: regions richer in alanine, glutamic acid, leucine, and methionine, and poorer in proline, glycine, tyrosine, and serine, tend to form alpha helices. A helix’s properties also depend on where it sits in the folded protein. Helices exposed on the surface of a protein folding in a water-rich solution have a higher proportion of hydrophilic amino acids than helices buried in the protein’s core or helices on the surface of a protein that sits in a lipid-rich environment. By picking out all these features, the program begins to work out the most energetically favorable way for the protein to fold.
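The helix example can be turned into a crude scanner. This is a toy sketch, not the Chou-Fasman method or any other published predictor. It scores each residue +1 if it is one of the helix-favoring residues named above (A, E, L, M), -1 if it is one of the helix-disfavoring ones (P, G, Y, S), and 0 otherwise. It then flags stretches where a sliding window’s total reaches a threshold. The window size, threshold, and function names are all assumptions.

```python
# Toy residue classes taken from the text; NOT a published propensity scale.
HELIX_FAVORING = set("AELM")     # alanine, glutamic acid, leucine, methionine
HELIX_DISFAVORING = set("PGYS")  # proline, glycine, tyrosine, serine


def helix_window_scores(sequence: str, window: int = 6) -> list[int]:
    """Sum of +1/-1/0 residue scores over each window of the sequence."""
    per_residue = [
        1 if aa in HELIX_FAVORING else -1 if aa in HELIX_DISFAVORING else 0
        for aa in sequence.upper()
    ]
    return [
        sum(per_residue[i:i + window])
        for i in range(len(per_residue) - window + 1)
    ]


def candidate_helices(sequence: str, window: int = 6, threshold: int = 3):
    """Merge high-scoring windows into (start, end) spans, end exclusive."""
    spans = []
    for start, score in enumerate(helix_window_scores(sequence, window)):
        if score < threshold:
            continue
        end = start + window
        if spans and start <= spans[-1][1]:  # overlaps or touches previous span
            spans[-1] = (spans[-1][0], max(spans[-1][1], end))
        else:
            spans.append((start, end))
    return spans


print(candidate_helices("GPSGPSAELMAELMGPSGPS"))  # [(5, 15)]
```

The helix-favoring run actually occupies positions 6 to 13. The reported span spills one residue over each edge because windows straddle the boundaries, which is one reason real predictors use more careful rules for starting and ending a helix.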

4. Why it’s so hard to predict:

Firstly, two completely different amino acid chains from totally different sources and evolutionary backgrounds that share little sequence similarity may fold into very similar structures. So, sequence similarity may not tell the whole story for predicting protein structure.
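How do we tell that two structures are “very similar” when their sequences are not? We compare their 3D coordinates. Below is a minimal sketch of one such measure, the distance RMSD (dRMSD). It compares every pairwise distance inside one structure with the matching distance inside the other, so no rotation or translation is needed to superimpose them. The sketch assumes you already know which residue in one structure corresponds to which in the other (for example, matched Cα atoms). For unrelated sequences, finding that correspondence is itself the hard part, and structure-alignment tools such as TM-align exist to do it.

```python
import math
from itertools import combinations

Point = tuple[float, float, float]


def distance_rmsd(a: list[Point], b: list[Point]) -> float:
    """Root-mean-square difference between the internal distances of a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length coordinate lists with at least 2 points")
    squared = [
        (math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2
        for i, j in combinations(range(len(a)), 2)
    ]
    return math.sqrt(sum(squared) / len(squared))


# The same bent 4-residue shape, rotated 90 degrees about z and shifted.
bent = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
moved = [(5, 5, 5), (5, 6, 5), (4, 6, 5), (4, 6, 6)]
straight = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
print(distance_rmsd(bent, moved))     # ~0: same shape
print(distance_rmsd(bent, straight))  # clearly > 0: different shape
```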

Secondly, two proteins that share a statistically significant degree of sequence similarity likely evolved from a common ancestor. However, gene duplication and genetic rearrangements during evolution can give rise to new gene copies, which can then evolve into proteins with new functions and structures. This means that two proteins with similar sequences may nevertheless fold very differently!
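What makes sequence similarity “statistically significant”? One classic idea is to ask how often a score at least that high turns up by chance. The toy sketch below scores two equal-length sequences by counting identical positions (no gaps allowed). It then compares that score with the scores of many random shuffles of one sequence and reports a z-score. Real search tools such as BLAST report E-values derived from alignment-score statistics instead. The scoring scheme, trial count, seed, and function names here are illustrative assumptions.

```python
import random
import statistics


def identity_score(a, b) -> int:
    """Count positions where two equal-length sequences have the same residue."""
    return sum(x == y for x, y in zip(a, b))


def shuffle_z_score(a: str, b: str, trials: int = 500, seed: int = 0) -> float:
    """How many standard deviations the real score sits above shuffled scores."""
    if len(a) != len(b):
        raise ValueError("this toy score needs equal-length sequences")
    if trials < 2:
        raise ValueError("need at least two shuffles to estimate a spread")
    rng = random.Random(seed)  # fixed seed so results are reproducible
    observed = identity_score(a, b)
    residues = list(b)
    null_scores = []
    for _ in range(trials):
        rng.shuffle(residues)  # keeps b's composition, destroys its order
        null_scores.append(identity_score(a, residues))
    mean = statistics.mean(null_scores)
    sd = statistics.stdev(null_scores)
    if sd == 0:
        raise ValueError("shuffled scores show no spread; try more trials")
    return (observed - mean) / sd


seq = "ACDEFGHIKLMNPQRSTVWY"
print(round(shuffle_z_score(seq, seq), 1))        # far above chance
print(round(shuffle_z_score(seq, seq[::-1]), 1))  # no better than chance
```

Shuffling keeps the amino acid composition fixed, so the test asks whether the *order* of residues matches better than chance, not merely whether the two sequences use similar building blocks.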

Thirdly, because there are so many variables, even attempting protein structure prediction takes extraordinarily powerful computers and highly experienced experts. The high cost can therefore be a hindrance.
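Why do the variables pile up so quickly? A classic illustration is the HP lattice model, a deliberately simplified toy from the protein-folding literature. Each residue is either hydrophobic (H) or polar (P), the chain lives on a 2D square grid, and every pair of H residues that touch without being neighbors along the chain lowers the energy by 1. The sketch below finds the lowest-energy fold by brute force, trying every possible conformation. Even in this toy, the number of conformations grows roughly 2.6-fold with each added residue, which is why real proteins with hundreds of residues cannot be folded by simply trying everything. The function names are our own.

```python
STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # unit moves on a square grid


def conformations(n: int):
    """Yield every self-avoiding chain of n >= 2 residues on the grid.

    The first bond is fixed to point along +x to remove rotated copies;
    mirror images are still counted separately.
    """
    path = [(0, 0), (1, 0)]
    occupied = set(path)

    def extend():
        if len(path) == n:
            yield list(path)
            return
        x, y = path[-1]
        for dx, dy in STEPS:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:  # chains may not pass through themselves
                path.append(nxt)
                occupied.add(nxt)
                yield from extend()
                path.pop()
                occupied.remove(nxt)

    yield from extend()


def hp_energy(sequence: str, path) -> int:
    """-1 for each pair of H residues that touch on the grid but are not chain neighbors."""
    index_at = {point: i for i, point in enumerate(path)}
    energy = 0
    for i, (x, y) in enumerate(path):
        if sequence[i] != "H":
            continue
        for dx, dy in STEPS:
            j = index_at.get((x + dx, y + dy))
            # j > i + 1 skips chain neighbors and counts each pair only once
            if j is not None and j > i + 1 and sequence[j] == "H":
                energy -= 1
    return energy


def fold(sequence: str):
    """Brute-force search: return (best energy, one best path, conformations tried)."""
    if len(sequence) < 2:
        raise ValueError("need at least two residues")
    best_energy, best_path, tried = None, None, 0
    for path in conformations(len(sequence)):
        tried += 1
        energy = hp_energy(sequence, path)
        if best_energy is None or energy < best_energy:
            best_energy, best_path = energy, path
    return best_energy, best_path, tried


energy, path, tried = fold("HPPH")
print(energy, tried)  # -1 9: the two H ends meet when the chain bends into a square

# Watch the search space grow as residues are added.
for n in range(4, 13):
    print(n, sum(1 for _ in conformations(n)))
```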

Fourthly, there are so many unknowns. It can be hard to know enough about a protein, its particular microenvironment, and the in vivo folding process to predict its structure. The sheer number of variables, and the assumptions on which prediction software is built, are also an issue.

Right now, the most advanced software can predict protein folding with about 80% accuracy, and weekly benchmark tables are available from services such as LiveBench and EVA. Some labs have made their software open source to allow for a crowdsourcing approach, even letting the “common man” get involved; examples include volunteer “@home” computing projects, the Human Proteome Folding Project, and Nutritious Rice for the World. One well-known project is Foldit, a clever and unique online game that teaches you about protein folding while also providing scientists with new solutions. Players puzzle away at real protein problems, such as targeting and eradicating diseases and creating biological innovations. A 2010 paper in the journal Nature credited Foldit’s 57,000 players with providing useful results that matched or outperformed algorithmically computed solutions.
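The text doesn’t say exactly what its 80% figure measures, and prediction accuracy can be scored in several ways. For secondary-structure prediction, a widely used measure is Q3: the percentage of residues whose three-state label (H for helix, E for strand, C for coil) is predicted correctly. Three-dimensional models are judged with other scores, such as GDT_TS in the CASP experiments. Here is a minimal sketch of Q3, using a made-up example.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percent of residues whose H/E/C secondary-structure label matches."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty label strings of equal length")
    if not set(predicted + observed) <= {"H", "E", "C"}:
        raise ValueError("labels must be H (helix), E (strand) or C (coil)")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Hypothetical prediction vs. observed labels for a 10-residue stretch.
print(q3_accuracy("HHHHCCEEEE", "HHHCCCEEEE"))  # 90.0
```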

5. What’s next?

Personally, I feel that everything goes in cycles. Science started out with the common man, and s/he became more and more educated and specialized until we arrived at today’s scientists, each an expert in their own little niche. However, to tackle something as complex as protein folding, you need so many experts. Experts get into the habit of thinking about things a certain way, and as you reach the top of the scientific food chain, people are so specialized that they struggle to communicate with each other (generally speaking!). The common man can add a new perspective, a new way of thinking, and there are far more common men than expert scientists. I feel that huge progress can be made by using the “hive mind” of public involvement!

What do you think about using crowd-based methods to solve complex problems? Tell us more in the comments.


Feature image courtesy of katsuuu 44.