1. An orthodox secondary structure

Our research group has put
a special focus on *ncRNA*: functional non-coding RNA. These are
single-stranded RNA molecules that typically fold, binding to
themselves to form double-stranded helices. As with proteins, the
structure is described at the primary level as a sequence of
nucleotides, at the secondary level by the topology formed when
regions of the RNA bond internally into stacked helices, and at the
tertiary level as a 3-dimensional molecule.

Many ncRNA form secondary structures of a particularly simple kind: orthodox secondary structures, where the RNA folds into a tree-like structure. Even for those ncRNA that are not orthodox, the major part of the structure is of this orthodox kind, with only a few exceptional bonds.

2. An RNA H-pseudoknot

The secondary structures that are not orthodox contain pseudoknots. However, as already pointed out, there are typically only a few bonds that are knotted together: most of the bonds are of the orthodox kind. This calls for a decomposition of the secondary structure into knot-components. An orthodox secondary structure contains only one kind of knot-component, the orthodox hairpin-style bond, whereas knotted bonds give rise to further kinds of knot-components.

Note that we refer to the knotted structures, such as the one in figure 2, as pseudoknots, not knots. The reason is that the structure does not truly form a knot, as this would require threading the end of the RNA strand through one of the loops at some point.

3. RNA with pseudoknot

4. Ladders as semi-circles

5. Tree of knot-components

Decomposition of pseudoknotted secondary structures into components has generally not been explained explicitly, although a number of authors have already used it implicitly. In particular, the word pseudoknot is used both for the entire non-orthodox secondary structure, and for the subset of bonds that actually form the knot, though it is generally quite clear from the context which is meant.

One common and natural definition is that the pseudoknot contains
the minimal sets of knotted ladders. In the semi-circle representation
of ladders, this means that semi-circles that cross are grouped
together. This causes the ladders to be partitioned in a unique way
into what I call the *knot-components*. In an
orthodox structure, all knot-components are just a single ladder;
otherwise, there is at least one knot-component with knotted ladders,
and these are the ones I call *pseudoknots*.
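
The grouping of crossing semi-circles can be sketched in a few lines of Python. This is only a minimal illustration, assuming each ladder is represented by a single semi-circle given as a `(start, end)` pair of positions along the strand; the function names `crosses` and `knot_components` are my own choices for this example, not part of any existing tool.

```python
from itertools import combinations


def crosses(a, b):
    """True if semi-circles a and b cross, i.e. their endpoints interleave."""
    (i, j), (k, l) = sorted([a, b])
    return i < k < j < l


def knot_components(ladders):
    """Partition ladders into groups connected by crossing semi-circles.

    Assumption: each ladder is a (start, end) pair with start < end and
    no two ladders share an endpoint.
    """
    parent = list(range(len(ladders)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Join every pair of ladders whose semi-circles cross.
    for x, y in combinations(range(len(ladders)), 2):
        if crosses(ladders[x], ladders[y]):
            parent[find(x)] = find(y)

    groups = {}
    for idx, ladder in enumerate(ladders):
        groups.setdefault(find(idx), []).append(ladder)
    return sorted(sorted(group) for group in groups.values())


components = knot_components([(1, 20), (3, 10), (7, 15), (16, 18)])
# Components with more than one ladder are the pseudoknots.
pseudoknots = [c for c in components if len(c) > 1]
```

In this made-up example the ladders `(3, 10)` and `(7, 15)` cross and end up in one component, while `(1, 20)` and `(16, 18)` each form a component of their own. Because the grouping is transitive, a chain of ladders in which only neighbours cross still ends up in a single knot-component.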

The orthodox secondary structure has a natural tree representation, since any two ladders are either nested (one semi-circle inside the other) or ordered (one before the other). This representation is useful for counting the number of secondary structures, and it is implicitly used in efficient algorithms for predicting the secondary structure of ncRNA. A similar hierarchical tree representation exists for general pseudoknotted structures, where each node represents a knot-component. This representation can be used to count the number of secondary structures with only specific kinds or numbers of pseudoknots.
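
For the orthodox case, the nested-or-ordered property is enough to build the tree with a simple stack. The following is a minimal sketch, assuming again that each ladder is a `(start, end)` pair of positions; the name `ladder_tree` and the use of `None` as a virtual root are choices made for this example only.

```python
def ladder_tree(ladders):
    """Return {ladder: [child ladders]} for non-crossing ladders.

    The key None is a virtual root holding the outermost ladders.
    Raises ValueError if two ladders cross (a non-orthodox structure).
    """
    children = {None: []}
    stack = []  # ladders that are still open, outermost first
    for ladder in sorted(ladders):
        start, end = ladder
        # Ladders that closed before this one starts are "ordered" before it.
        while stack and stack[-1][1] < start:
            stack.pop()
        # The innermost open ladder must enclose the new one completely.
        if stack and stack[-1][1] < end:
            raise ValueError(f"{stack[-1]} and {ladder} cross")
        parent = stack[-1] if stack else None
        children[parent].append(ladder)
        children[ladder] = []
        stack.append(ladder)
    return children


tree = ladder_tree([(1, 20), (3, 10), (12, 18), (22, 30)])
# tree[None]    -> [(1, 20), (22, 30)]   two ordered top-level ladders
# tree[(1, 20)] -> [(3, 10), (12, 18)]   two ladders nested inside (1, 20)
```

Siblings in the tree correspond to ordered ladders and parent-child edges to nested ladders. The sketch deliberately rejects crossing ladders rather than attempting the knot-component tree for pseudoknotted structures.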

Figures 3-5 show the same secondary structure as a planar representation, as a semi-circle representation of the ladders, and as a tree representation of knot-components.

6. Experimentally confirmed pseudoknots (collapsed)

Having defined what is considered a pseudoknot, it is now easy to
list the different kinds of pseudoknots that may exist with any given
set of ladders. The simplest kind of pseudoknot is the H-pseudoknot of
figure 2. However, the pseudoknot which is part of the structures in
figures 3-5 is also an H-pseudoknot, although with one of the ladders
split into two ladders by a loop. If we collapse this component,
i.e. remove loops and bulges so that subsequent ladders are collapsed
into one, we get *collapsed pseudoknots*. Pseudoknots
verified to exist are illustrated in their collapsed form in figure
6.
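
One possible reading of the collapsing step can be sketched as follows. The encoding is an assumption made for this example: a knot-component is written as a word in which each ladder is a letter that occurs twice, once for each of its two strands, in order along the RNA. Two ladders are merged when one lies directly inside the other with no other ladder of the component in between, which is what removing a loop or bulge amounts to in this encoding. The function name `collapse` is hypothetical.

```python
def collapse(word):
    """Merge ladders separated only by loops or bulges.

    Assumption: `word` lists the ladder strands of one knot-component in
    order along the RNA, each ladder letter occurring exactly twice.
    """
    word = list(word)
    changed = True
    while changed:
        changed = False
        for p in range(len(word) - 1):
            x, y = word[p], word[p + 1]
            if x == y:
                continue
            xs = [i for i, c in enumerate(word) if c == x]
            ys = [i for i, c in enumerate(word) if c == y]
            # x opens at p, y opens right after it, and y closes right
            # before x closes: y is a continuation of x, so drop y.
            if xs[0] == p and ys[0] == p + 1 and ys[1] + 1 == xs[1]:
                word = [c for c in word if c != y]
                changed = True
                break
    return "".join(word)


collapsed = collapse("ABCACB")  # ladder B continued by C -> "ABAB"
```

In this encoding the H-pseudoknot is the word `ABAB`, and the example input `ABCACB` is an H-pseudoknot in which one ladder has been split into `B` and `C` by a loop.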

There has been quite a bit of focus on pseudoknots in ncRNA, though most algorithms in use do not handle them, and those that do tend to be computationally demanding. There are also problems related to calculating the free energy of pseudoknotted ncRNA structures. On the other hand, there are a number of cases where pseudoknots exist and have an important role.

One problem for ncRNA secondary structure prediction is that the number of possible structures increases very rapidly as the size of the structure increases. Allowing pseudoknots makes this problem much worse. If no restrictions are made and pseudoknots are not penalized, random secondary structures will tend to become dominated by, typically, one big pseudoknot. Even with strong restrictions on the types of pseudoknots allowed, the number of available structures increases dramatically.
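
The growth can be illustrated with a small counting sketch. It makes strong simplifying assumptions that are not part of the discussion above: it counts individual base pairs rather than ladders, lets any position pair with any other, and ignores minimum loop sizes. Under these assumptions, structures without crossing bonds are counted by the Motzkin numbers, and structures with arbitrary crossings by the number of involutions. The function names are chosen for this example.

```python
def count_noncrossing(n):
    """Number of non-crossing pairings on n positions (Motzkin numbers)."""
    m = [1] * (n + 1)
    for size in range(2, n + 1):
        # Last position is unpaired, or pairs with a position that splits
        # the rest into an outside part (k) and an inside part (size-2-k).
        m[size] = m[size - 1] + sum(
            m[k] * m[size - 2 - k] for k in range(size - 1)
        )
    return m[n]


def count_unrestricted(n):
    """Number of pairings on n positions when crossings are allowed."""
    t = [1] * (n + 1)
    for size in range(2, n + 1):
        # Last position is unpaired, or pairs with any of the size-1 others.
        t[size] = t[size - 1] + (size - 1) * t[size - 2]
    return t[n]


for n in (4, 6, 10, 20):
    print(n, count_noncrossing(n), count_unrestricted(n))
```

For 6 positions this gives 51 non-crossing pairings against 76 unrestricted ones, and the gap widens quickly as the length grows. Realistic counts are smaller because of base-pairing rules and loop constraints, but the relative explosion caused by crossings remains.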

One approach to penalizing pseudoknots is to add terms to the free energy. I suggest another approach: simply compare random secondary structures to actual secondary structures (from Rfam and PseudoBase), and estimate the penalties necessary to get realistic pseudoknot frequencies.

Last modified
October 28, 2013.