Classification of protein foldings

The idea behind this project is to classify protein foldings by using ideas and methods from knot theory.

It has already been demonstrated that most proteins are unknotted, and that those that are knotted form only simple knots. These results do not, however, place any restriction on the untying of the knot other than that the protein chain may not be broken or pass through itself.

The general idea is to place further restrictions on what deformations may be applied to the knot, hoping that this richer structure will provide us with a finer classification on purely topological grounds.

Present approaches

Present approaches to protein fold classification can broadly be divided into two categories: a geometrical approach and a topological approach.

Geometrical approach

In the geometrical approach, a distance between different proteins is calculated. Basically, this amounts to taking two different proteins, trying to superimpose them to see whether they look the same, and then using, for example, the root mean square distance (RMSD) between corresponding atoms to measure how well they fit.
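As a rough illustration of the superposition step, here is a minimal Python sketch (using NumPy) of the Kabsch method, one common way to find the rotation that minimises the RMSD. It assumes both structures have the same number of atoms and that the atom-to-atom correspondence is already known (e.g. pre-aligned C-alpha atoms); real structural-alignment tools must also find that correspondence. The function name `superposed_rmsd` is made up for this example.

```python
import numpy as np


def superposed_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition.

    Assumption: row i of P corresponds to row i of Q (e.g. aligned C-alpha atoms).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two arrays of shape (N, 3)")
    # Remove translation by centring both sets on their centroids.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # Kabsch: the SVD of the 3x3 covariance matrix yields the optimal rotation.
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1),
    # not a reflection: a protein may be turned, but not mirrored.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

Rotating and translating a structure leaves the value near zero, while its mirror image generally gives a clearly positive value, since only proper rotations are allowed.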

The idea is that if the distance between two proteins is sufficiently small, they have the same folding. Furthermore, if one can find a sequence of proteins where consecutive proteins are sufficiently close to each other, one may conclude that they all have similar foldings, even those that are not geometrically close to each other.
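The chaining argument amounts to taking connected components of a graph in which two proteins are linked whenever their distance falls below a cut-off, i.e. single-linkage clustering. A minimal sketch using union-find, assuming a precomputed symmetric distance matrix (for instance pairwise RMSD values); the function `chain_clusters` and its `threshold` parameter are hypothetical:

```python
def chain_clusters(dist, threshold):
    """Label items so that two items share a label exactly when they are
    connected by a chain of pairs whose distance is <= threshold."""
    n = len(dist)
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)  # merge the two classes

    # Relabel roots as consecutive integers 0, 1, 2, ...
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


dist = [[0, 1, 2, 10], [1, 0, 1, 10], [2, 1, 0, 10], [10, 10, 10, 0]]
print(chain_clusters(dist, threshold=1.5))  # [0, 0, 0, 1]
```

Proteins 0 and 2 end up in the same class even though their own distance exceeds the cut-off, which is exactly the transitive effect described above.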

Topological approach

The typical topological approach is one that extends the secondary structure classification by identifying how secondary-structure elements are bonded together, e.g. into beta sheets, barrels and various motifs.

Basically, these structures describe the topology of the protein itself, but not how the protein is embedded in space, i.e. the positions of non-bonded substructures relative to each other.

Idea to new approach: homotopy

In mathematics, a homotopy is a continuous deformation of one map into another: two embeddings are homotopic if one may be continuously deformed into the other. (Strictly speaking, when the deformation must remain an embedding throughout, so that the chain never passes through itself as in knot theory, it is called an isotopy.) This differs from the geometrical approach in that one does not restrict one's attention to known protein structures; it differs from the topological approach in that it is the embedding of the protein that is studied, rather than just the internal bonding between secondary structures.

Some work has already been done on this. It has been demonstrated that if one holds onto the ends of a protein, most proteins can be continuously deformed into an unknotted string, i.e. most proteins are unknotted. In the remaining cases the result is a knot, but usually a very simple one. This result may not be of great interest for protein classification, but it does tell us something about what restrictions exist on protein structures.
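The rule "hold the ends, never break or cross the chain" can be made concrete on a polygonal C-alpha trace. The Python sketch below is a greedy, hypothetical simplification in the spirit of the Koniaris–Muthukumar (KMT) triangle-elimination scheme: an interior vertex is deleted whenever the triangle it spans with its two neighbours is not pierced by any other segment, so the chain never passes through itself and the end points never move. It assumes general position (no coplanar degeneracies) and is only a heuristic: a chain that reduces to a single straight segment is unknotted, but a chain that stalls is merely a candidate knot, since greedy deletion can stall on unknotted chains too. It is not necessarily the method behind the results mentioned above, and the function names are made up.

```python
def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def segment_hits_triangle(p, q, a, b, c, eps=1e-12):
    """Möller–Trumbore test, restricted to the segment p->q (t in [0, 1])."""
    d = _sub(q, p)
    e1, e2 = _sub(b, a), _sub(c, a)
    h = _cross(d, e2)
    det = _dot(e1, h)
    if abs(det) < eps:  # segment parallel to the triangle's plane (ignored)
        return False
    inv = 1.0 / det
    s = _sub(p, a)
    u = inv * _dot(s, h)
    if u < 0.0 or u > 1.0:
        return False
    qv = _cross(s, e1)
    v = inv * _dot(d, qv)
    if v < 0.0 or u + v > 1.0:
        return False
    t = inv * _dot(e2, qv)
    return 0.0 <= t <= 1.0


def reduce_chain(coords):
    """Greedily delete interior vertices whose triangle (prev, v, next) is not
    pierced by another segment; the two end points are never moved."""
    pts = [tuple(map(float, p)) for p in coords]
    changed = True
    while changed and len(pts) > 2:
        changed = False
        i = 1
        while i < len(pts) - 1:
            a, b, c = pts[i - 1], pts[i], pts[i + 1]
            # Segment j joins pts[j] and pts[j + 1]. Skip the two segments being
            # replaced (j = i-1, i) and their neighbours (j = i-2, i+1), which
            # touch the triangle only at a shared vertex in general position.
            blocked = any(
                segment_hits_triangle(pts[j], pts[j + 1], a, b, c)
                for j in range(len(pts) - 1)
                if not (i - 2 <= j <= i + 1)
            )
            if blocked:
                i += 1
            else:
                del pts[i]  # the chain moves across an empty triangle
                changed = True
    return pts


chain = [(0, 0, 0), (1, 1, 0.3), (2, 0, 0.1), (3, 1, 0.5), (4, 0, 0.2)]
print(len(reduce_chain(chain)))  # 2
```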

The hope is that, by adding restrictions on the continuous deformations, preferably restrictions with a natural biological motivation, a similar approach can be used to distinguish between different protein foldings while still identifying similar foldings even when the geometrical distance between them is large.

Present status

A first look at the problem indicated that, unless strong restrictions are imposed on the unfolding, folds with the same topology will tend to be homotopic. This is particularly true for the simpler proteins.

I have put the project aside for now to give priority to other projects, as it would most likely require quite a bit of work and it is quite unclear what results it might yield; I may, however, pick it up again later.

Last modified June 21, 2007.