# zbMATH — the first resource for mathematics

Fast RNA structure alignment for crossing input structures. (English) Zbl 1247.68105
Kucherov, Gregory (ed.) et al., Combinatorial pattern matching. 20th annual symposium, CPM 2009, Lille, France, June 22–24, 2009. Proceedings. Berlin: Springer (ISBN 978-3-642-02440-5/pbk). Lecture Notes in Computer Science 5577, 236-248 (2009).
Summary: The complexity of pairwise RNA structure alignment depends on the structural restrictions assumed for both the input structures and the computed consensus structure. For arbitrarily crossing input and consensus structures, the problem is NP-hard. For non-crossing consensus structures, T. Jiang et al.’s algorithm [“A general edit distance between RNA structures”, J. Comput. Biology 9, No. 2, 371–388 (2002)] computes the alignment in $$O(n ^{2} m ^{2})$$ time where $$n$$ and $$m$$ denote the lengths of the two input sequences. If also the input structures are non-crossing, the problem corresponds to tree editing which can be solved in $$O(m^2n(1+\log\frac{n}{m}))$$ time. We present a new algorithm that solves the problem for $$d$$-crossing structures in $$O(d m ^{2} n \log n)$$ time, where $$d$$ is a parameter that is one for non-crossing structures, bounded by $$n$$ for crossing structures, and much smaller than $$n$$ on most practical examples. Crossing input structures allow for applications where the input is not a fixed structure but is given as base-pair probability matrices.
For the entire collection see [Zbl 1165.68013].

##### MSC:
 68Q25 Analysis of algorithms and problem complexity 68W32 Algorithms on strings 92D20 Protein sequences, DNA sequences
CONTRAfold
Full Text:
##### References:
 [1] Jiang, T., Lin, G., Ma, B., Zhang, K.: A general edit distance between RNA structures. J. Comput. Biol. 9(2), 371–388 (2002) · doi:10.1089/10665270252935511 [2] Demaine, E.D., Mozes, S., Rossman, B., Weimann, O.: An optimal decomposition algorithm for tree edit distance. In: Arge, L., Cachin, C., Jurdziński, T., Tarlecki, A. (eds.) ICALP 2007. LNCS, vol. 4596, pp. 146–157. Springer, Heidelberg (2007) · Zbl 1171.68843 · doi:10.1007/978-3-540-73420-8_15 [3] Washietl, S., Hofacker, I.L., Lukasser, M., Huttenhofer, A., Stadler, P.F.: Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome. Nat. Biotechnol. 23(11), 1383–1390 (2005) · doi:10.1038/nbt1144 [4] Consortium, A.F.B., Backofen, R., Bernhart, S.H., Flamm, C., Fried, C., Fritzsch, G., Hackermuller, J., Hertel, J., Hofacker, I.L., Missal, K., Mosig, A., Prohaska, S.J., Rose, D., Stadler, P.F., Tanzer, A., Washietl, S., Will, S.: RNAs everywhere: genome-wide annotation of structured RNAs. J. Exp. Zoolog B Mol. Dev. Evol. 308(1), 1–25 (2007) [5] Waterman, M., Smith, T.: RNA secondary structure: a complete mathematical analysis. Math. Biosci. 42, 257–266 (1978) · Zbl 0402.92016 · doi:10.1016/0025-5564(78)90099-8 [6] Nussinov, R., Jacobson, A.: Fast algorithm for predicting the secondary structure of single-stranded RNA. Proc. Natl. Acad. Sci. 77(11), 6309–6313 (1980) · doi:10.1073/pnas.77.11.6309 [7] Zuker, M., Stiegler, P.: Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Research 9(1), 133–148 (1981) · Zbl 05437422 · doi:10.1093/nar/9.1.133 [8] Akutsu, T.: Approximation and exact algorithms for RNA secondary structure prediction and recognition of stochastic context-free languages. Journal of Combinatorial Optimization 3, 321–336 (1999) · Zbl 0972.92012 · doi:10.1023/A:1009898029639 [9] Wexler, Y., Zilberstein, C., Ziv-Ukelson, M.: A study of accessible motifs and RNA folding complexity. Journal of Computational Biology 14(6), 856–872 (2007) · Zbl 1302.92103 · doi:10.1089/cmb.2007.R020 [10] Do, C.B., Woods, D.A., Batzoglou, S.: CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics 22(14), e90–e98 (2006) · doi:10.1093/bioinformatics/btl246 [11] Gardner, P.P., Giegerich, R.: A comprehensive comparison of comparative RNA structure prediction approaches. BMC Bioinformatics 5, 140 (2004) · Zbl 05325965 · doi:10.1186/1471-2105-5-140 [12] Hofacker, I.L., Fekete, M., Stadler, P.F.: Secondary structure prediction for aligned RNA sequences. Journal of Molecular Biology 319(5), 1059–1066 (2002) · doi:10.1016/S0022-2836(02)00308-X [13] Seemann, S.E., Gorodkin, J., Backofen, R.: Unifying evolutionary and thermodynamic information for RNA folding of multiple alignments. Nucleic Acids Research (2008) · doi:10.1093/nar/gkn544 [14] Klein, P.: Computing the edit-distance between unrooted ordered trees. In: Bilardi, G., Pietracaprina, A., Italiano, G.F., Pucci, G. (eds.) ESA 1998. LNCS, vol. 1461, pp. 91–102. Springer, Heidelberg (1998) · Zbl 0932.68066 · doi:10.1007/3-540-68530-8_8 [15] Sankoff, D.: Simultaneous solution of the RNA folding, alignment and protosequence problems. SIAM J. Appl. Math. 45(5), 810–825 (1985) · Zbl 0581.92012 · doi:10.1137/0145048 [16] Mathews, D.H., Turner, D.H.: Dynalign: an algorithm for finding the secondary structure common to two RNA sequences. Journal of Molecular Biology 317(2), 191–203 (2002) · doi:10.1006/jmbi.2001.5351 [17] Havgaard, J.H., Lyngso, R.B., Stormo, G.D., Gorodkin, J.: Pairwise local structural alignment of RNA sequences with sequence similarity less than 40. Bioinformatics 21(9), 1815–1824 (2005) · doi:10.1093/bioinformatics/bti279 [18] Ziv-Ukelson, M.I., Gat-Viks, Y.W., Shamir, R.: A faster algorithm for RNA co-folding. In: Crandall, K.A., Lagergren, J. (eds.) WABI 2008. LNCS (LNBI), vol. 5251, pp. 174–185. Springer, Heidelberg (2008) · Zbl 05351272 · doi:10.1007/978-3-540-87361-7_15 [19] Will, S., Reiche, K., Hofacker, I.L., Stadler, P.F., Backofen, R.: Inferring non-coding RNA families and classes by means of genome-scale structure-based clustering. PLOS Computational Biology 3(4), e65 (2007) · doi:10.1371/journal.pcbi.0030065 [20] Evans, P.A.: Finding common rna pseudoknot structures in polynomial time. In: Lewenstein, M., Valiente, G. (eds.) CPM 2006. LNCS, vol. 4009, pp. 223–232. Springer, Heidelberg (2006) · Zbl 1196.68102 · doi:10.1007/11780441_21 [21] Evans, P.A.: Algorithms and Complexity for Annotated Sequence Analysis. PhD thesis, University of Alberta (1999)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.