×

Sparse RNA folding: time and space efficient algorithms. (English) Zbl 1247.68106

Kucherov, Gregory (ed.) et al., Combinatorial pattern matching. 20th annual symposium, CPM 2009, Lille, France, June 22–24, 2009. Proceedings. Berlin: Springer (ISBN 978-3-642-02440-5/pbk). Lecture Notes in Computer Science 5577, 249-262 (2009).
Summary: The classical algorithm for RNA single strand folding requires \(O(n Z)\) time and \(O(n ^{2})\) space, where \(n\) denotes the length of the input sequence and \(Z\) is a sparsity parameter that satisfies \(n \leq Z \leq n ^{2}\). We show how to reduce the space complexity of this algorithm. The space reduction is based on the observation that some solutions for subproblems are not examined after a certain stage of the algorithm, and may be discarded from memory. This yields an \(O(nZ)\) time and \(O(Z)\) space algorithm, that outputs both the cardinality of the optimal folding as well as a corresponding secondary structure. The space-efficient approach also extends to the related RNA simultaneous alignment with folding problem, and can be applied to reduce the space complexity of the fastest algorithm for this problem from \(O(n ^{2} m ^{2})\) down to \(O(nm^2 + \tilde{Z})\), where \(n\) and \(m\) denote the lengths of the input sequences to be aligned, and \(\tilde{Z}\) is a sparsity parameter that satisfies \(n m \leq \tilde{Z}\leq n ^{2} m ^{2}\).
In addition, we also show how to speed up the base-pairing maximization variant of RNA single strand folding. The speed up is achieved by combining two independent existing techniques, which restrict the number of expressions that need to be examined in bottleneck computations of these algorithms. This yields an \(O(LZ)\) time and \(O(Z)\) space algorithm, where \(L\) denotes the maximum cardinality of a folding of the input sequence.
Additional online supporting material may be found at:
http://www.cs.bgu.ac.il/zakovs/RNAfold/CPM09\_supporting\_material.pdf
For the entire collection see [Zbl 1165.68013].

MSC:

68Q25 Analysis of algorithms and problem complexity
92D20 Protein sequences, DNA sequences

Software:

Mfold
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Consortium, A.F.B., Backofen, R., Bernhart, S.H., Flamm, C., Fried, C., Fritzsch, G., Hackermuller, J., Hertel, J., Hofacker, I.L., Missal, K., Mosig, A., Prohaska, S.J., Rose, D., Stadler, P.F., Tanzer, A., Washietl, S., Will, S.: RNAs everywhere: genome-wide annotation of structured RNAs. Journal of Experimental Zoology Part B: Molecular and Developmental Evolution 308(1), 1–25 (2007)
[2] Zuker, M.: Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Research (13), 3406–3415 (2003) · Zbl 05437421 · doi:10.1093/nar/gkg595
[3] Hofacker, I.L.: Vienna RNA secondary structure server. Nucleic Acids Research (13), 3429–3431 (2003) · Zbl 05435843 · doi:10.1093/nar/gkg599
[4] Zuker, M.: Computer prediction of RNA structure. Methods Enzymol. 180, 262–288 (1989) · Zbl 0681.92011 · doi:10.1016/0076-6879(89)80106-5
[5] Tinoco, I., Borer, P., Dengler, B., Levine, M., Uhlenbeck, O., Crothers, D., Gralla, J.: Improved estimation of secondary structure in ribonucleic acids. Nature New Biology 246, 40–41 (1973) · doi:10.1038/newbio246040a0
[6] Waterman, M., Smith, T.: RNA secondary structure: a complete mathematical analysis. Mathematical Biosciences 42, 257–266 (1978) · Zbl 0402.92016 · doi:10.1016/0025-5564(78)90099-8
[7] Nussinov, R., Jacobson, A.B.: Fast algorithm for predicting the secondary structure of single-stranded RNA. PNAS 77(11), 6309–6313 (1980) · doi:10.1073/pnas.77.11.6309
[8] Zuker, M., Stiegler, P.: Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Research 9(1), 133–148 (1981) · Zbl 05437422 · doi:10.1093/nar/9.1.133
[9] Akutsu, T.: Approximation and exact algorithms for RNA secondary structure prediction and recognition of stochastic context-free languages. Journal of Combinatorial Optimization 3, 321–336 (1999) · Zbl 0972.92012 · doi:10.1023/A:1009898029639
[10] Wexler, Y., Zilberstein, C., Ziv-Ukelson, M.: A study of accessible motifs and RNA folding complexity. Journal of Computational Biology 14(6), 856–872 (2007) · Zbl 1302.92103 · doi:10.1089/cmb.2007.R020
[11] Chan, T.M.: More algorithms for all-pairs shortest paths in weighted graphs. In: Proc. 39th Symposium on the Theory of Computing (STOC), pp. 590–598 (2007) · Zbl 1231.05245 · doi:10.1145/1250790.1250877
[12] Sankoff, D.: Simultaneous solution of the RNA folding, alignment and protosequence problems. SIAM Journal on Applied Mathematics 45(5), 810–825 (1985) · Zbl 0581.92012 · doi:10.1137/0145048
[13] Mathews, D.H., Turner, D.H.: Dynalign: an algorithm for finding the secondary structure common to two RNA sequences. Journal of Molecular Biology 317(2), 191–203 (2002) · doi:10.1006/jmbi.2001.5351
[14] Havgaard, J., Lyngso, R., Stormo, G., Gorodkin, J.: Pairwise local structural alignment of RNA sequences with sequence similarity less than 40 · doi:10.1093/bioinformatics/bti279
[15] Ziv-Ukelson, M., Gat-Viks, I., Wexler, Y., Shamir, R.: A faster algorithm for RNA co-folding, pp. 174–185 (2008)
[16] Will, S., Reiche, K., Hofacker, I.L., Stadler, P.F., Backofen, R.: Inferring non-coding RNA families and classes by means of genome-scale structure-based clustering. PLOS Computational Biology 3(4), e65 (2007) · doi:10.1371/journal.pcbi.0030065
[17] Gardner, P.P., Giegerich, R.: A comprehensive comparison of comparative RNA structure prediction approaches. BMC Bioinformatics 5, 140 (2004) · Zbl 05325965 · doi:10.1186/1471-2105-5-140
[18] Jansson, J., Ng, S.K., Sung, W.K., Willy, H.: A faster and more space-efficient algorithm for inferring arc-annotations of RNA sequences through alignment. Algorithmica 46(2), 223–245 (2006) · Zbl 1101.68105 · doi:10.1007/s00453-006-1207-0
[19] Durbin, R., Eddy, S., Krogh, A., Mitchison, G.: Biological sequence analysis: Probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge (1998) · Zbl 0929.92010 · doi:10.1017/CBO9780511790492
[20] Hirschberg, D.S.: A linear space algorithm for computing maximal common subsequences. Communications of the ACM 18(6), 341–343 (1975) · Zbl 0301.68042 · doi:10.1145/360825.360861
[21] Hirschberg, D.S.: Algorithms for the longest common subsequence problem. JACM 24, 664–675 (1977) · Zbl 0402.68041 · doi:10.1145/322033.322044
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.