1. An orthodox secondary structure
Our research group has put a special focus on ncRNA: functional non-coding RNA. These are single-stranded RNA molecules that typically fold back on themselves, forming double-stranded helices. As with proteins, the structure is described at three levels: the primary structure is the sequence of nucleotides, the secondary structure is the topology of the internal bonds that join regions of the RNA into stacked helices, and the tertiary structure is the 3-dimensional shape of the molecule.
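To make the secondary level concrete, here is a minimal Python sketch built on two conventions that are our assumptions, not part of the text: base pairs are stored as 0-based position pairs (i, j) with i < j, and structures are read from dot-bracket notation. This is a common text format where '.' marks an unpaired nucleotide, matching brackets mark a bond, and '[]', '{}' and '<>' serve as extra bracket types for bonds that cross the '()' bonds. The hypothetical helpers parse_dot_bracket and group_into_ladders recover the base pairs and group stacked pairs (i, j), (i+1, j-1), ... into ladders, i.e. the stacked helices.

```python
def parse_dot_bracket(structure):
    """Return the base pairs (i, j), i < j, 0-based, encoded by a dot-bracket string.

    Each bracket type has its own stack, so '[]', '{}' and '<>' can encode
    bonds that cross the '()' bonds. Assumes the brackets are balanced.
    """
    closing_to_opening = {")": "(", "]": "[", "}": "{", ">": "<"}
    stacks = {opening: [] for opening in closing_to_opening.values()}
    pairs = []
    for pos, ch in enumerate(structure):
        if ch in stacks:
            stacks[ch].append(pos)  # opening bracket: remember its position
        elif ch in closing_to_opening:
            left = stacks[closing_to_opening[ch]].pop()
            pairs.append((left, pos))  # closing bracket pairs with the latest opener
        # any other character (usually '.') is an unpaired nucleotide
    return sorted(pairs)


def group_into_ladders(pairs):
    """Group stacked pairs (i, j), (i+1, j-1), ... into ladders (stacked helices).

    Each ladder is returned as (i, j, length), where (i, j) is its outermost pair.
    A bulge or loop on either strand ends the ladder.
    """
    partner = dict(pairs)  # left end -> right end
    ladders = []
    for i, j in sorted(pairs):
        if partner.get(i - 1) == j + 1:
            continue  # (i, j) is stacked inside a pair already counted
        length = 1
        while partner.get(i + length) == j - length:
            length += 1
        ladders.append((i, j, length))
    return ladders
```

For example, "((..[[..))..]]" gives the pairs (0, 9), (1, 8), (4, 13), (5, 12), which form the two ladders (0, 9, 2) and (4, 13, 2); their semi-circles cross.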
Many ncRNAs form secondary structures of a particularly simple kind: orthodox secondary structures, in which the RNA folds into a tree-like structure. Even in ncRNAs that are not orthodox, most of the structure is of this orthodox kind, with only a few exceptional bonds.
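The tree-like property can be checked directly: a structure is orthodox exactly when no two base pairs cross, i.e. any two pairs (i, j) and (k, l) are either nested or ordered. The sketch below assumes the hypothetical convention of 0-based pairs (i, j) with i < j and each position in at most one pair, and simply tests every pair of bonds.

```python
from itertools import combinations


def pairs_cross(p, q):
    """True if the bonds p = (i, j) and q = (k, l) cross: i < k < j < l or k < i < l < j."""
    (i, j), (k, l) = p, q
    return i < k < j < l or k < i < l < j


def is_orthodox(pairs):
    """Orthodox (tree-like) means every two bonds are nested or ordered, never crossing."""
    return not any(pairs_cross(p, q) for p, q in combinations(pairs, 2))
```

The pairwise test is quadratic, which is fine for illustration; a single left-to-right scan with a stack would perform the same check in linear time.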
2. An RNA H-pseudoknot
Secondary structures that are not orthodox contain pseudoknots. However, as already pointed out, typically only a few bonds are knotted together; most bonds are of the orthodox kind. This calls for a decomposition of the secondary structure into knot-components: an orthodox secondary structure contains only one kind of knot-component, the orthodox hairpin-style bond, whereas knotted bonds form other kinds of knot-components.
Note that we refer to knotted structures, like the one in figure 2, as pseudoknots, not knots. The reason is that the structure does not form a true knot, as this would require threading an end of the RNA strand through one of the loops at some point.
3. RNA with pseudoknot
4. Ladders as semi-circles
5. Tree of knot-components
Decomposition of pseudoknotted secondary structures into components has generally not been explained explicitly, although a number of authors have already used it implicitly. In particular, the word pseudoknot is used both for the entire non-orthodox secondary structure and for the subset of bonds that actually form the knot, though it is generally clear from the context which is meant.
One common and natural definition is that pseudoknots are the minimal sets of knotted ladders. In the semi-circle representation of ladders, this means that semi-circles that cross are grouped together, either directly or through a chain of crossings. This partitions the ladders in a unique way into what I call knot-components. In an orthodox structure, every knot-component is a single ladder; otherwise, at least one knot-component contains knotted ladders, and these are the ones I call pseudoknots.
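The grouping of crossing semi-circles can be sketched with a union-find structure, one standard way to compute such a partition (our choice for illustration, not necessarily how it was done in this work). Ladders are assumed to be hypothetical (i, j, length) tuples whose outermost pair is (i, j); since the strands of different ladders do not overlap, two ladders cross exactly when their outermost pairs interleave. Because union-find merges transitively, ladders connected only through a chain of crossings also end up in the same knot-component.

```python
def ladders_cross(a, b):
    """Semi-circles of ladders a and b cross if their outermost pairs interleave."""
    (i, j), (k, l) = a[:2], b[:2]
    return i < k < j < l or k < i < l < j


def knot_components(ladders):
    """Partition ladders into knot-components (groups of crossing semi-circles)."""
    parent = list(range(len(ladders)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a in range(len(ladders)):
        for b in range(a + 1, len(ladders)):
            if ladders_cross(ladders[a], ladders[b]):
                parent[find(a)] = find(b)  # union: same knot-component

    groups = {}
    for idx, ladder in enumerate(ladders):
        groups.setdefault(find(idx), []).append(ladder)
    return sorted(groups.values())


def pseudoknots(ladders):
    """Knot-components with more than one ladder, i.e. the knotted ones."""
    return [comp for comp in knot_components(ladders) if len(comp) > 1]
```

For the ladders (0, 9, 2), (4, 13, 2) and (15, 20, 3), knot_components returns an H-pseudoknot component with the first two ladders and a separate single-ladder component for the third, so pseudoknots returns only the former.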
An orthodox secondary structure has a natural tree-representation, because any two ladders are either nested (one semi-circle inside the other) or ordered (one before the other). This representation is useful for counting the number of secondary structures, and it is used implicitly by efficient algorithms for predicting the secondary structure of ncRNA. A similar hierarchical tree-representation exists for general pseudoknotted structures, with each node representing a knot-component. This representation can be used to count the number of secondary structures that contain only specific kinds or numbers of pseudoknots.
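The nested-or-ordered property translates directly into a tree. In this minimal sketch (the Node class and the build_tree and show helpers are hypothetical, and ladders are assumed to be (i, j, length) tuples of an orthodox structure), ladders are visited by increasing left end while a stack holds the currently open ladders: a ladder that ends before the next one starts is ordered before it and is popped, and the ladder left on top of the stack is the one the next ladder is nested in.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Ladder = Tuple[int, int, int]  # (i, j, length), where (i, j) is the outermost pair


@dataclass
class Node:
    ladder: Optional[Ladder]  # None for the root, which stands for the whole strand
    children: List["Node"] = field(default_factory=list)


def build_tree(ladders):
    """Tree of an orthodox structure: a node's children are the ladders nested
    directly inside it, in left-to-right (ordered) sequence."""
    root = Node(None)
    open_ladders = [root]  # stack: the chain of ladders enclosing the current position
    for ladder in sorted(ladders):
        i = ladder[0]
        # ladders that end before i are ordered before this one, not enclosing it
        while open_ladders[-1].ladder is not None and open_ladders[-1].ladder[1] < i:
            open_ladders.pop()
        node = Node(ladder)
        open_ladders[-1].children.append(node)
        open_ladders.append(node)
    return root


def show(node):
    """Compact text form, e.g. 'root[0-20[3-8 10-17] 22-30]'."""
    label = "root" if node.ladder is None else f"{node.ladder[0]}-{node.ladder[1]}"
    if not node.children:
        return label
    return f"{label}[{' '.join(show(child) for child in node.children)}]"
```

For the ladders (0, 20, 2), (3, 8, 2), (10, 17, 3) and (22, 30, 2), show returns root[0-20[3-8 10-17] 22-30]: two ordered ladders nested inside the first one, followed by an ordered sibling. The sketch assumes orthodox input; for a pseudoknotted structure the nodes would have to be knot-components rather than single ladders.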
Figures 3-5 show the same secondary structure in three forms: a planar representation, a semi-circle representation of the ladders, and a tree representation of the knot-components.
6. Experimentally confirmed pseudoknots (collapsed)
Having defined what counts as a pseudoknot, it is now easy to list the different kinds of pseudoknots that can be formed from any given set of ladders. The simplest kind is the H-pseudoknot of figure 2. The pseudoknot in the structure of figures 3-5 is also an H-pseudoknot, although one of its ladders is split into two ladders by a loop. If we collapse a knot-component, i.e. remove loops and bulges so that consecutive ladders are merged into one, we get a collapsed pseudoknot. Experimentally verified pseudoknots are shown in collapsed form in figure 6.
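Collapsing can be sketched as follows, under our own reading of the definition: within one knot-component (given as hypothetical (i, j, length) ladder tuples), a ladder whose two ends directly follow the corresponding ends of another ladder of the same component, with nothing but loops or bulges in between, is merged into that outer ladder. What remains is read off as a letter pattern, with letters assigned in order of the left ends. Ladders of other components sitting in the loops are ignored, since only the component itself is passed in.

```python
def collapse(component):
    """Collapse a knot-component into a letter pattern such as 'ABAB'."""
    # keep only each ladder's outermost pair: the two ends of its semi-circle
    arcs = sorted((lad[0], lad[1]) for lad in component)
    while True:
        ends = sorted(pos for arc in arcs for pos in arc)
        rank = {pos: r for r, pos in enumerate(ends)}  # order of endpoints along the strand
        # an inner arc b starts right after a starts and ends right before a ends
        inner = next(
            (b for a in arcs for b in arcs
             if rank[b[0]] == rank[a[0]] + 1 and rank[a[1]] == rank[b[1]] + 1),
            None,
        )
        if inner is None:
            break
        arcs.remove(inner)  # merge the inner ladder into the outer one
    label = {}
    for n, (i, j) in enumerate(sorted(arcs)):
        label[i] = label[j] = chr(ord("A") + n)
    return "".join(label[pos] for pos in sorted(label))
```

For example, an H-pseudoknot whose first ladder is split by an internal loop, [(0, 20, 3), (5, 16, 2), (9, 30, 3)], collapses to ABAB, the same pattern as the plain H-pseudoknot [(0, 9, 2), (4, 13, 2)].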
There has been considerable focus on pseudoknots in ncRNA, though most algorithms in use do not handle them, and those that do tend to be computationally demanding. There are also problems with calculating the free energy of pseudoknotted ncRNA structures. On the other hand, there are a number of cases where pseudoknots exist and play an important role.
One problem for ncRNA secondary structure prediction is that the number of possible structures grows very rapidly with the size of the structure. Allowing pseudoknots makes this problem much worse. If no restrictions are made and pseudoknots are not penalized, random secondary structures tend to be dominated by typically one big pseudoknot. Even with strong restrictions on the types of pseudoknots allowed, the number of possible structures increases dramatically.
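To see how quickly the numbers grow, the following Python sketch counts secondary structures on n positions under a deliberately crude assumption: any two positions may pair, with no base-pairing rules and no minimum hairpin loop. count_orthodox uses the nested-or-ordered decomposition (these are the Motzkin numbers), while count_all allows crossing bonds, so every partial matching of the positions counts.

```python
def count_orthodox(n):
    """Orthodox (non-crossing) structures on n positions: the Motzkin numbers."""
    s = [1] * (n + 1)  # s[0] = s[1] = 1
    for m in range(2, n + 1):
        # the last position is unpaired, or pairs with position k, which splits the
        # rest into an outside part (k - 1 positions) and an inside part (m - k - 1)
        s[m] = s[m - 1] + sum(s[k - 1] * s[m - k - 1] for k in range(1, m))
    return s[n]


def count_all(n):
    """Structures on n positions when bonds may cross: all partial matchings."""
    t = [1] * (n + 1)  # t[0] = t[1] = 1
    for m in range(2, n + 1):
        # the last position is unpaired, or pairs with any of the other m - 1 positions
        t[m] = t[m - 1] + (m - 1) * t[m - 2]
    return t[n]


if __name__ == "__main__":
    for n in (10, 20, 40):
        print(n, count_orthodox(n), count_all(n))
```

Under these assumptions the two counts agree up to n = 3 and then diverge: for n = 7 there are 127 orthodox structures against 232 in total, and the ratio keeps growing with n. Realistic pairing constraints shrink both counts, but the comparison illustrates why admitting crossing bonds inflates the search space.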
One approach to penalizing pseudoknots is to add terms to the free energy. I suggest another approach: simply compare random secondary structures to actual secondary structures (from Rfam and PseudoBase), and estimate the penalties needed to obtain realistic pseudoknot frequencies.
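One way to turn such a comparison into numbers is sketched below; the weighting model is our assumption, not the method of this work. Suppose counts[k] is the number of random structures with k pseudoknots, and every pseudoknot multiplies a structure's weight by the same factor p. The hypothetical function fit_penalty then finds, by bisection, the p at which the fraction of pseudoknotted structures equals a target frequency, for instance one measured on Rfam and PseudoBase structures. Penalties that depend on the kind of pseudoknot would need a richer model.

```python
def pseudoknot_fraction(counts, p):
    """Fraction of structures with at least one pseudoknot when each pseudoknot
    multiplies a structure's weight by p; counts[k] = structures with k pseudoknots."""
    weights = [c * p ** k for k, c in enumerate(counts)]
    return 1 - weights[0] / sum(weights)


def fit_penalty(counts, target, iterations=100):
    """Per-pseudoknot weight factor p that reproduces the target fraction.

    The fraction increases with p, so an upper bound is found by doubling and
    p is then located by bisection."""
    if not 0 < target < 1 or counts[0] <= 0 or not any(counts[1:]):
        raise ValueError("need 0 < target < 1, counts[0] > 0 and some pseudoknotted counts")
    lo, hi = 0.0, 1.0
    while pseudoknot_fraction(counts, hi) < target:
        hi *= 2
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if pseudoknot_fraction(counts, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

With the toy counts [90, 10] (illustrative numbers, not real data) and a target of 0.01, fit_penalty returns about 0.0909, i.e. exactly 0.9/9.9. Under Boltzmann weighting, a factor p corresponds to a free-energy penalty of -RT ln p per pseudoknot, which relates this approach to the free-energy terms of the first approach.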