Counting, generating and sampling tree alignments. (English) Zbl 1346.92048

Botón-Fernández, María (ed.) et al., Algorithms for computational biology. Third international conference, AlCoB 2016, Trujillo, Spain, June 21–22, 2016. Proceedings. Cham: Springer (ISBN 978-3-319-38826-7/pbk; 978-3-319-38827-4/ebook). Lecture Notes in Computer Science 9702. Lecture Notes in Bioinformatics, 53-64 (2016).
Summary: Pairwise ordered tree alignments are combinatorial objects that appear in RNA secondary structure comparison. However, the usual representation of tree alignments as supertrees is ambiguous, i.e. two distinct supertrees may induce identical sets of matches between identical pairs of trees. This ambiguity is uninformative, and detrimental to any probabilistic analysis. In this work, we consider tree alignments up to equivalence. Our first result is a precise asymptotic enumeration of tree alignments, obtained from a context-free grammar by means of basic analytic combinatorics. Our second result focuses on alignments between two given ordered trees. By refining our grammar to align specific trees, we obtain a decomposition scheme for the space of alignments, and use it to design an efficient dynamic programming algorithm for sampling alignments under the Gibbs-Boltzmann probability distribution. This generalizes existing tree alignment algorithms, and opens the door to a probabilistic analysis of the space of suboptimal RNA secondary structure alignments.
For the entire collection see [Zbl 1337.92004].
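
The sampling idea in the summary can be illustrated on a much simpler case. The sketch below is not the authors' tree algorithm: it treats alignments of two plain sequences, identifies an alignment with its set of matches (so each equivalence class is counted once), computes a partition function by dynamic programming, and samples by stochastic backtracking. The function names, the scoring scheme (match, mismatch, gap) and the temperature parameter are all assumptions made for illustration.

```python
import math
import random


def partition_table(n, m, pair_weight, gap_weight):
    """Z[i][j] = total weight of all match sets between prefixes of length i and j.

    Unambiguous decomposition (illustrative, for sequences rather than trees):
    position i of the first sequence is either unmatched, or matched to some
    position k <= j of the second one, with positions k+1..j left unmatched.
    """
    Z = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        Z[0][j] = gap_weight ** j  # only unmatched positions remain
    for i in range(1, n + 1):
        for j in range(m + 1):
            total = gap_weight * Z[i - 1][j]  # position i unmatched
            for k in range(1, j + 1):  # position i matched to position k
                total += Z[i - 1][k - 1] * pair_weight(i, k) * gap_weight ** (j - k)
            Z[i][j] = total
    return Z


def sample_matches(n, m, pair_weight, gap_weight, rng):
    """Draw one match set with probability weight(match set) / Z[n][m]."""
    Z = partition_table(n, m, pair_weight, gap_weight)
    matches = []
    i, j = n, m
    while i > 0:
        # Option 0 means "position i is unmatched"; option k > 0 means "i matched to k".
        options = [0] + list(range(1, j + 1))
        weights = [gap_weight * Z[i - 1][j]] + [
            Z[i - 1][k - 1] * pair_weight(i, k) * gap_weight ** (j - k)
            for k in range(1, j + 1)
        ]
        k = rng.choices(options, weights=weights)[0]
        if k > 0:
            matches.append((i, k))
            j = k - 1
        i -= 1
    matches.reverse()
    return matches


def boltzmann_weights(a, b, match=1.0, mismatch=-1.0, gap=-1.0, temperature=1.0):
    """Hypothetical scoring: each weight is exp(score / temperature)."""
    def pair_weight(i, k):
        score = match if a[i - 1] == b[k - 1] else mismatch
        return math.exp(score / temperature)
    return pair_weight, math.exp(gap / temperature)


if __name__ == "__main__":
    a, b = "GCAU", "GAU"
    pw, gw = boltzmann_weights(a, b)
    print(sample_matches(len(a), len(b), pw, gw, random.Random(0)))
    # With unit weights the table counts match sets exactly.
    print(partition_table(2, 2, lambda i, k: 1, 1)[2][2])
```

With unit weights, the table entry for lengths n and m equals the binomial coefficient C(n+m, n), the number of distinct match sets. For n = m = 2 this gives 6, whereas the same two sequences admit 13 edit paths built from match, insertion and deletion steps; the gap between the two counts is the sequence analogue of the supertree ambiguity described in the summary.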


92D20 Protein sequences, DNA sequences
05C90 Applications of graph theory

