Sparse RNA folding: time and space efficient algorithms.

*(English)*Zbl 1216.92033Summary: The currently fastest algorithm for RNA single strand folding requires \(O(nZ)\) time and \(\Theta (n^{2})\) space, where \(n\) denotes the length of the input string and \(Z\) is a sparsity parameter satisfying \(n \leqslant Z<n^{2}\). We show how to reduce the time and space complexities of this algorithm in the sparse case. The space reduction is based on the observation that some solutions for sub-instances are not examined after a certain stage of the algorithm, and may be discarded from memory. The running time speed up is achieved by combining two independent sparsification criteria, which restrict the number of expressions that need to be examined in bottleneck computations of the algorithm. This yields an \(O(n^{2}+PZ)\) time and \(\Theta (Z)\) space algorithm, where \(P\) is a sparsity parameter satisfying \(P<n \leqslant Z \leqslant n(P+1)\). For the base-pairing maximization variant, the time complexity is further reduced to \(O(LZ)\), where \(L\) denotes the maximum number of base-pairs in a folding of the input string and satisfies \(L \leqslant n \setminus 2\).

The presented techniques also extend to the related RNA simultaneous alignment and folding problem. For an input composed of two strings of lengths \(n\) and \(m\), the time and space complexities are reduced from \(O(nm\tilde Z)\) and \(\Theta (n^{2}m^{2})\) down to \(O(n^2m^2 + \tilde P \tilde Z)\) and \(\Theta (nm^2 + \tilde Z)\) respectively, where \(\tilde Z\) and \(\tilde P\) are sparsity parameters satisfying \(\tilde P < nm \leqslant \tilde Z < nm(\tilde P + 3)\).

The presented techniques also extend to the related RNA simultaneous alignment and folding problem. For an input composed of two strings of lengths \(n\) and \(m\), the time and space complexities are reduced from \(O(nm\tilde Z)\) and \(\Theta (n^{2}m^{2})\) down to \(O(n^2m^2 + \tilde P \tilde Z)\) and \(\Theta (nm^2 + \tilde Z)\) respectively, where \(\tilde Z\) and \(\tilde P\) are sparsity parameters satisfying \(\tilde P < nm \leqslant \tilde Z < nm(\tilde P + 3)\).

##### MSC:

92C40 | Biochemistry, molecular biology |

65Y20 | Complexity and performance of numerical algorithms |

68W40 | Analysis of algorithms |

PDF
BibTeX
XML
Cite

\textit{R. Backofen} et al., J. Discrete Algorithms 9, No. 1, 12--31 (2011; Zbl 1216.92033)

Full Text:
DOI

##### References:

[1] | Akutsu, Tatsuya, Approximation and exact algorithms for RNA secondary structure prediction and recognition of stochastic context-free languages, Journal of combinatorial optimization, 3, 321-336, (1999) · Zbl 0972.92012 |

[2] | Alkan, Can; Karakoç, Emre; Nadeau, Joseph H.; Sahinalp, Süleyman Cenk; Zhang, Kaizhong, RNA-RNA interaction prediction and antisense RNA target search, Journal of computational biology, 13, 2, 267-282, (2006) · Zbl 1119.92307 |

[3] | Andronescu, Mirela; Condon, Anne; Hoos, Holger H.; Mathews, David H.; Murphy, Kevin P., Efficient parameter estimation for RNA secondary structure prediction, Bioinformatics, 23, 13, i19-i28, (2007) |

[4] | Apostolico, Alberto; Atallah, Mikhail J.; Hambrusch, Susanne E., New clique and independent set algorithms for circle graphs, Discrete applied mathematics, 36, 1, 1-24, (March 1992) |

[5] | Backofen, Rolf; Tsur, Dekel; Zakov, Shay; Ziv-Ukelson, Michal, Sparse RNA folding: time and space efficient algorithms, (), 249-262 · Zbl 1247.68106 |

[6] | Baker, James K., Trainable grammars for speech recognition, The journal of the acoustical society of America, 65, S1, S132, (1979) |

[7] | Athanasius F. Bompfünewerer Consortium; Backofen, Rolf; Bernhart, Stephan H.; Flamm, Christoph; Fried, Claudia; Fritzsch, Guido; Hackermuller, Jorg; Hertel, Jana; Hofacker, Ivo L.; Missal, Kristin; Mosig, Axel; Prohaska, Sonja J.; Rose, Dominic; Stadler, Peter F.; Tanzer, Andrea; Washietl, Stefan; Will, Sebastian, RNAs everywhere: genome-wide annotation of structured rnas, Journal of experimental zoology part B: molecular and developmental evolution, 308, 1, 1-25, (2007) |

[8] | Chan, Timothy M., More algorithms for all-pairs shortest paths in weighted graphs, SIAM journal of computing, 39, 5, 2075-2089, (2010) · Zbl 1207.68436 |

[9] | Chitsaz, Hamidreza; Salari, Raheleh; Sahinalp, S. Cenk; Backofen, Rolf, A partition function algorithm for interacting nucleic acid strands, Bioinformatics, 25, 12, i365-i373, (2009) |

[10] | Cocke, John; Schwartz, Jacob T., Programming languages and their compilers, (1970), Courant Institute of Mathematical Sciences New York · Zbl 0245.68003 |

[11] | Do, Chuong B.; Woods, Daniel A.; Batzoglou, Serafim, Contrafold: RNA secondary structure prediction without physics-based models, Bioinformatics, 22, 14, e90-8, (2006) |

[12] | Dowell, Robin; Eddy, Sean, Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction, BMC bioinformatics, 5, 1, 71, (2004) |

[13] | Durbin, Richard; Eddy, Sean R.; Krogh, Anders; Mitchison, Graeme J., Biological sequence analysis: probabilistic models of proteins and nucleic acids, (1998), Cambridge University Press · Zbl 0929.92010 |

[14] | Frid, Yelena; Gusfield, Dan, A simple practical and complete \(O(\frac{n^3}{\log n})\)-time algorithm for RNA folding using the four-russians speedup, Algorithms for molecular biology, 5, 1, 5-13, (2010) |

[15] | Frid, Yelena; Gusfield, Dan, A worst-case and practical speedup for the RNA co-folding problem using the four-russians idea, (), 1-12 |

[16] | Gardner, Paul P.; Giegerich, Robert, A comprehensive comparison of comparative RNA structure prediction approaches, BMC bioinformatics, 5, 140, (2004) |

[17] | Graham, Susan L.; Harrison, Michael A.; Ruzzo, Walter L., An improved context-free recognizer, ACM transactions on programming languages and systems, 2, 3, 415-462, (1980) · Zbl 0461.68084 |

[18] | Havgaard, Jacob Hull; Lyngso, Rune B.; Stormo, Gary D.; Gorodkin, Jan, Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%, Bioinformatics, 21, 9, 1815-1824, (2005) |

[19] | Hirschberg, Daniel S., A linear space algorithm for computing maximal common subsequences, Communications of the ACM, 18, 6, 341-343, (1975) · Zbl 0301.68042 |

[20] | Hirschberg, Daniel S., Algorithms for the longest common subsequence problem, Jacm, 24, 664-675, (1977) · Zbl 0402.68041 |

[21] | Hofacker, Ivo L., Vienna RNA secondary structure server, Nucleic acids research, 13, 3429-3431, (2003) |

[22] | Jansson, Jesper; Ng, See-Kiong; Sung, Wing-Kin; Willy, Hugo, A faster and more space-efficient algorithm for inferring arc-annotations of RNA sequences through alignment, Algorithmica, 46, 2, 223-245, (2006) · Zbl 1101.68105 |

[23] | Tadao Kasami, An efficient recognition and syntax analysis algorithm for context-free languages, Technical Report AFCRL-65-758, Air Force Cambridge Res. Lab., Bedford, Mass., 1965. |

[24] | Mathews, David H.; Sabina, Jeffrey; Zuker, Michael; Turner, Douglas H., Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure, J. mol. biol., 288, 911-940, (1999) |

[25] | Mathews, David H.; Turner, Douglas H., Dynalign: an algorithm for finding the secondary structure common to two RNA sequences, Journal of molecular biology, 317, 2, 191-203, (2002) |

[26] | Nussinov, Ruth; Jacobson, Ann B., Fast algorithm for predicting the secondary structure of single-stranded RNA, Pnas, 77, 11, 6309-6313, (1980) |

[27] | Sakakibara, Yasubumi; Brown, Michael; Hughey, Richard; Mian, I. Saira; Sjolander, Kimmen; Underwood, Rebecca C.; Haussler, David, Stochastic context-free grammars for trna modeling, Nucleic acids research, 22, 23, 5112-5120, (1994) |

[28] | Sankoff, David, Simultaneous solution of the RNA folding, alignment and protosequence problems, SIAM journal on applied mathematics, 45, 5, 810-825, (1985) · Zbl 0581.92012 |

[29] | Supowit, Kenneth J., Finding a maximum planar subset of a set of nets in a channel, IEEE transactions on computer-aided design of integrated circuits and systems, 6, 1, 93-94, (1987) |

[30] | Tinoco, Ignacio; Uhlenbeck, Olke C.; Levine, Mark D., Estimation of secondary structure in ribonucleic acids, Nature, 230, 362-367, (1971) |

[31] | Tinoco, Ignacio; Borer, Philip N.; Dengler, Barbara; Levine, Mark D.; Uhlenbeck, Olke C.; Crothers, Donald M.; Gralla, Jay, Improved estimation of secondary structure in ribonucleic acids, Nature new biology, 246, 40-41, (1973) |

[32] | Torarinsson, Elfar; Havgaard, Jacob; Gorodkin, Jan, Multiple structural alignment and clustering of RNA sequences, Bioinformatics, 23, 8, 926-932, (2007) |

[33] | Waterman, Michael S.; Smith, Temple F., RNA secondary structure: a complete mathematical analysis, Mathematical biosciences, 42, 257-266, (1978) · Zbl 0402.92016 |

[34] | Wexler, Ydo; Zilberstein, Chaya; Ziv-Ukelson, Michal, A study of accessible motifs and RNA folding complexity, Journal of computational biology, 14, 6, 856-872, (2007) · Zbl 1302.92103 |

[35] | Will, Sebastian; Reiche, Kristin; Hofacker, Ivo L.; Stadler, Peter F.; Backofen, Rolf, Inferring non-coding RNA families and classes by means of genome-scale structure-based clustering, PLOS computational biology, 3, 4, e65, (2007) |

[36] | Younger, Daniel H., Recognition and parsing of context-free languages in time \(n^3\), Information and control, 10, 2, 189-208, (1967) · Zbl 0149.24803 |

[37] | Zakov, Shay; Tsur, Dekel; Ziv-Ukelson, Michal, Reducing the worst case running times of a family of RNA and CFG problems, using Valiant’s approach, (), 65-77 |

[38] | Ziv-Ukelson, Michal; Gat-Viks, Irit; Wexler, Ydo; Shamir, Ron, A faster algorithm for simultaneous alignment and folding of RNA, Journal of computational biology, 17, 8, 1051-1065, (2010) |

[39] | Zuker, Michael, Computer prediction of RNA structure, Methods enzymol., 180, 262-288, (1989) |

[40] | Zuker, Michael, Mfold web server for nucleic acid folding and hybridization prediction, Nucleic acids research, 13, 3406-3415, (2003) |

[41] | Zuker, Michael; Stiegler, P., Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information, Nucleic acids research, 9, 1, 133-148, (1981) |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.