RNA secondary structures in a polymer-zeta model how foldings should be shaped for sparsification to establish a linear speedup.

*(English)*Zbl 1360.92083Summary: Various tools used to predict the secondary structure for a given RNA sequence are based on dynamic programming used to compute a conformation of minimum free energy. For structures without pseudoknots, a worst-case runtime proportional to \(n^3\), with \(n\) being the length of the sequence, results since a table of dimension \(n^2\) has to be filled in while a single entry gives rise to a linear computational effort. However, it was recently observed that reformulating the corresponding dynamic programming recursion together with the bookkeeping of potential folding alternatives (a technique called sparsification) may reduce the runtime to \(n^2\) on average, assuming that nucleotides of distance \(d\) form a hydrogen bond (i.e. are paired) with probability \(\frac{b}{d^c}\) for some constants \(b>0\), \(c>1\). The latter is called the polymer-zeta model and plays a crucial role in speeding up the above mentioned algorithm. In this paper we discuss the application of the polymer-zeta property
for the analysis of sparsification, showing that it must be applied conditionally on first and last positions to pair. Afterwards, we will investigate the combinatorics of RNA secondary structures assuming that the corresponding conditional probabilities behave according to a polymer-zeta probability model. We show that even if some of the structural parameters exhibit an almost realistic behavior on average, the expected shape of a folding in that model must be assumed to highly differ from those observed in nature. More precisely, we prove our polymer-zeta model to be appropriate for mRNA molecules but to fail in connection with almost every other family of RNA. Those findings explain the huge speedup of the dynamic programming algorithm observed empirically by Wexler et al. when applying sparsification in connection with mRNA data.

##### MSC:

92D20 | Protein sequences, DNA sequences |

90C39 | Dynamic programming |

90C90 | Applications of mathematical programming |

##### Keywords:

secondary structure; RNA sequence; minimum free energy; dynamic programming recursion; sparsification
PDF
BibTeX
XML
Cite

\textit{E. Y. Jin} and \textit{M. E. Nebel}, J. Math. Biol. 72, No. 3, 527--571 (2016; Zbl 1360.92083)

Full Text:
DOI

##### References:

[1] | Amman, F; Bernhart, SH; Doose, G; Hofacker, I; Qin, J; Stadler, PF; Will, S, The trouble with long-range base pairs in RNA folding, Adv Bioinf Comput Biol LNCS, 8213, 1-11, (2013) |

[2] | Backofen R, Tsur D, Zakov S, Ziv-Ukelson M (2011) Sparse RNA folding: time and space efficient algorithms. J Discret Algorithm 9(1):12-31 · Zbl 1216.92033 |

[3] | Clote, P; Kranakis, E; Krizanc, D, Asymptotic structural properties of quasi-random saturated structures of RNA, Algorithms Mol Biol, 8, 24, (2013) |

[4] | Dimitrieva, S; Bucher, P, Practicality and time complexity of a sparsified RNA folding algorithm, J Bioinfo Comput Bio, 10, 1241007, (2012) |

[5] | Flajolet P, Sedgewick R (2010) Analytic combinatorics, Cambridge University Press, Cambridge. ISBN-13:9780521898065 · Zbl 1165.05001 |

[6] | Goodwin, E; Okkema, P; Evans, TC; etal., Translational regulation of tra-2 by its 30 untranslated region controls sexual identity in C. elegans, Cell, 75, 329-339, (1993) |

[7] | Hille, E, Ordinary differential equations in the complex domain, Pure Appl Math Wiley-Intersci Ser Texts Monogr Tracts, 60, 37-48, (2010) |

[8] | Hofacker, I, Vienna RNA secondary structure server, Nucleic Acid Res, 13, 3429-3431, (2003) |

[9] | Hofacker IL (2004) RNA secondary structure analysis using the Vienna RNA package. Curr Protoc Bioinformatics. Chapter 12: Unit12.2. doi:10.1002/0471250953.bi1202s04 |

[10] | Huang, FWD; Reidys, CM, On the combinatorics of sparsification, Algorithms Mol Biol, 7, 28, (2012) |

[11] | Ince EL (1956) Ordinary differential equations, pure and applied mathematics. Dover Publications Inc, New York |

[12] | Möhl, M; Salari, R; Will, S; Backofen, R; Sahinalp, S, Sparsification of RNA structure prediction including pseudoknots, Algorithms Mol Bio, 5, 39, (2010) |

[13] | Nebel, ME, Combinatorial properties of RNA secondary structures, J Comput Biol, 9, 541-573, (2003) |

[14] | Nebel, ME, Investigation of the Bernoulli model for RNA secondary structures, Bull Math Biol, 66, 925-964, (2004) · Zbl 1334.92314 |

[15] | Nebel ME (2004) Identifying good predictions of RNA secondary structure proceedings of the pacific symposium on biocomputing 2004, pp 423-434 · Zbl 1302.92103 |

[16] | Sprinzl M, Vassilenko KS, Emmerich J, Bauer F (1999) Compilation of tRNA sequences and sequences of tRNA genes, 20 Dec 1999. http://www.uni-bayreuth.de/departments/biochemie/trna/ |

[17] | Wexler, Y; Zilberstein, C; Ziv-Ukelson, M, A study of accessible motifs and RNA folding complexity, J Comput Biol, 14, 856-872, (2007) · Zbl 1302.92103 |

[18] | Wuyts, J; Rijk, P; Peer, Y; Winkelmans, T; Wachter, R, The European large subunit ribosomal RNA database, Nucleic Acids Res, 29, 175-177, (2001) |

[19] | Zuker, M; Sankoff, D, RNA secondary structures and their prediction, Bull Math Biol, 46, 591-621, (1984) · Zbl 0548.92007 |

[20] | Zuker, M, Computer prediction of RNA structure, Methods Enzymol., 180, 262-288, (1989) · Zbl 0681.92011 |

[21] | Zuker, M, Mfold web server for nucleic acid folding and hybridization prediction, Nucleic Acid Res, 13, 3406-3415, (2003) |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.