Fast RNA structure alignment for crossing input structures.

*(English)*Zbl 1216.68359Summary: The complexity of pairwise RNA structure alignment depends on the structural restrictions assumed for both the input structures and the computed consensus structure. For arbitrarily crossing input and consensus structures, the problem is NP-hard. For non-crossing consensus structures, the algorithm of T. Jiang, G. Lin, B. Ma and K. Zhang [“A general edit distance between RNA structures”, J. Comput. Biol. 9, No. 2, 371–388 (2002)] computes the alignment in \(O(n^{2}m^{2})\) time where \(n\) and \(m\) denote the lengths of the two input sequences. If the input structures are also non-crossing, the problem corresponds to tree editing which can be solved in \(O(m^2n(1+\log \frac nm))\) time [E. D. Demaine et al., Lect. Notes Comput. Sci. 4596, 146–157 (2007; Zbl 1171.68843)]. We present a new algorithm that solves the problem for \(d\)-crossing structures in \(O(dm^2n \log n)\) time, where \(d\) is a parameter that is one for non-crossing structures, bounded by \(n\) for crossing structures, and much smaller than \(n\) on many practical examples. Crossing input structures allow for applications where the input is not a fixed structure but is given as base-pair probability matrices.

##### MSC:

68W40 | Analysis of algorithms |

68W32 | Algorithms on strings |

92D20 | Protein sequences, DNA sequences |

##### Software:

CONTRAfold
PDF
BibTeX
XML
Cite

\textit{R. Backofen} et al., J. Discrete Algorithms 9, No. 1, 2--11 (2011; Zbl 1216.68359)

Full Text:
DOI

##### References:

[1] | Akutsu, T., Approximation and exact algorithms for RNA secondary structure prediction and recognition of stochastic context-free languages, J. of combinatorial optimization, 3, 321-336, (1999) · Zbl 0972.92012 |

[2] | Consortium, A.F.B.; Backofen, R.; Bernhart, S.H.; Flamm, C.; Fried, C.; Fritzsch, G.; Hackermuller, J.; Hertel, J.; Hofacker, I.L.; Missal, K.; Mosig, A.; Prohaska, S.J.; Rose, D.; Stadler, P.F.; Tanzer, A.; Washietl, S.; Will, S., RNAs everywhere: genome-wide annotation of structured rnas, J. of experimental zoology part B: molecular and developmental evolution, 308, 1, 1-25, (2007) |

[3] | E.D. Demaine, S. Mozes, B. Rossman, O. Weimann, An optimal decomposition algorithm for tree edit distance, in: Proc. 34th International Colloquium on Automata, Languages and Programming (ICALP), 2007, pp. 146-157. · Zbl 1171.68843 |

[4] | Do, C.B.; Woods, D.A.; Batzoglou, S., Contrafold: RNA secondary structure prediction without physics-based models, Bioinformatics, 22, 14, e90-e98, (2006) |

[5] | P.A. Evans, Finding common RNA pseudoknot structures in polynomial time, in: Proc. 17th Symposium on Combinatorial Pattern Matching (CPM), 2006, pp. 223-232. · Zbl 1196.68102 |

[6] | P.A. Evans, Algorithms and complexity for annotated sequence analysis, PhD thesis, University of Alberta, 1999. |

[7] | Gardner, P.P.; Giegerich, R., A comprehensive comparison of comparative RNA structure prediction approaches, BMC bioinformatics, 5, 140, (2004) |

[8] | Havgaard, J.H.; Lyngso, R.B.; Stormo, G.D.; Gorodkin, J., Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%, Bioinformatics, 21, 9, 1815-1824, (2005) |

[9] | Jiang, T.; Lin, G.; Ma, B.; Zhang, K., A general edit distance between RNA structures, J. of computational biology, 9, 2, 371-388, (2002) |

[10] | P. Klein, Computing the edit-distance between unrooted ordered trees, in: Proc. 6th European Symposium on Algorithms (ESA), 1998, pp. 91-102. · Zbl 0932.68066 |

[11] | Mathews, D.H.; Turner, D.H., Dynalign: an algorithm for finding the secondary structure common to two RNA sequences, J. of molecular biology, 317, 2, 191-203, (2002) |

[12] | Nussinov, R.; Jacobson, A., Fast algorithm for predicting the secondary structure of single-stranded RNA, Proc. national Academy of science, 77, 11, 6309-6313, (1980) |

[13] | Sankoff, D., Simultaneous solution of the RNA folding, alignment and protosequence problems, SIAM J. on applied mathematics, 45, 5, 810-825, (1985) · Zbl 0581.92012 |

[14] | Washietl, S.; Hofacker, I.L.; Lukasser, M.; Huttenhofer, A.; Stadler, P.F., Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome, Nature biotechnology, 23, 11, 1383-1390, (2005) |

[15] | Waterman, M.S.; Smith, T.F., RNA secondary structure: A complete mathematical analysis, Mathematical biosciences, 42, 257-266, (1978) · Zbl 0402.92016 |

[16] | Wexler, Y.; Zilberstein, C.; Ziv-Ukelson, M., A study of accessible motifs and RNA folding complexity, J. of computational biology, 14, 6, 856-872, (2007) |

[17] | Will, S.; Reiche, K.; Hofacker, I.L.; Stadler, P.F.; Backofen, R., Inferring non-coding RNA families and classes by means of genome-scale structure-based clustering, PLOS computational biology, 3, 4, e65, (2007) |

[18] | M. Ziv-Ukelson, I. Gat-Viks, Y. Wexler, R. Shamir, A faster algorithm for RNA co-folding, in: Proc. 8th Workshop on Algorithms in Bioinformatics (WABI), 2008, pp. 174-185. |

[19] | Zuker, M.; Stiegler, P., Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information, Nucleic acids research, 9, 1, 133-148, (1981) |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.