×

zbMATH — the first resource for mathematics

Algebraic dynamic programming for multiple context-free grammars. (English) Zbl 1344.68112
Summary: We present theoretical foundations, and a practical implementation, that makes the method of Algebraic Dynamic Programming available for Multiple Context-Free Grammars. This allows to formulate optimization problems, where the search space can be described by such grammars, in a concise manner and solutions may be obtained efficiently. This improves on the previous state of the art which required complex code based on hand-written dynamic programming recursions. We apply our method to the RNA pseudoknotted secondary structure prediction problem from computational biology. Appendix and supporting files available at: http://www.bioinf.uni-leipzig.de/Software/gADP/

MSC:
68Q42 Grammars and rewriting systems
90C39 Dynamic programming
92D20 Protein sequences, DNA sequences
Software:
FFTbor; ViennaRNA
PDF BibTeX XML Cite
Full Text: DOI
References:
[1] Akutsu, T., Dynamic programming algorithms for RNA secondary structure prediction with pseudoknots, Discrete Appl. Math., 104, 45-62, (2000) · Zbl 0956.92019
[2] Alkan, C.; Karakoc, E.; Nadeau, J.; Sahinalp, S.; Zhang, K., RNA-RNA interaction prediction and antisense RNA target search, J. Comput. Biol., 13, 267-282, (2006) · Zbl 1119.92307
[3] Andonov, R.; Poirriez, V.; Rajopadhye, S., Unbounded knapsack problem: dynamic programming revisited, European J. Oper. Res., 123, 168-181, (2000) · Zbl 0961.90123
[4] Angelov, K., Incremental parsing with parallel multiple context-free grammars, (Gardent, C.; Nivre, J., Proceedings of the 12th Conference of the European Chapter of the ACL, (2009), Association for Computational Linguistics), 69-76
[5] Arlazarov, V. L.; Dinic, E. A.; Kronod, M. A.; Faradzev, I. A., On economical construction of the transitive closure of an oriented graph, Sov. Math., Dokl., 11, 1209-1210, (1970) · Zbl 0214.23601
[6] Bailor, M. H.; Sun, X.; Al-Hashimi, H. M., Topology links RNA secondary structure with global conformation, dynamics, and adaptation, Science, 327, 202-206, (2010)
[7] Baxter, R. J., Exactly solved models in statistical mechanics, (1982), Academic Press London, UK · Zbl 0538.60093
[8] Bellman, R., Dynamic programming treatment of the travelling salesman problem, J. ACM, 9, 61-63, (1962) · Zbl 0106.14102
[9] Bellman, R. E., Dynamic programming, (1957), Princeton University Press
[10] Bernardy, J.-P.; Claessen, K., Efficient divide-and-conquer parsing of practical context-free languages, ACM SIGPLAN Not., 48, 9, 111-122, (2013) · Zbl 1323.68329
[11] Bird, R., Pearls of functional algorithm design, (2010), Cambridge University Press · Zbl 1229.68026
[12] Blizard, W. D., Multiset theory, Notre Dame J. Form. Log., 30, 1, 36-66, (1988) · Zbl 0668.03027
[13] Bon, M.; Micheletti, C.; Orland, H., Mcgenus: a Monte Carlo algorithm to predict RNA secondary structures with pseudoknots, Nucleic Acids Res., 41, 1895-1900, (2013)
[14] Bon, M.; Orland, H., TT2NE: a novel algorithm to predict RNA secondary structures with pseudoknots, Nucleic Acids Res., 39, (2011)
[15] Bon, M.; Vernizzi, G.; Orland, H.; Zee, A., Topological classification of RNA structures, J. Mol. Biol., 379, 900-911, (2008)
[16] Chiang, D.; Joshi, A. K., Formal grammars for estimating partition functions of double-stranded chain molecules, (Marcus, M., Proceedings of the 2nd International Conference on Human Language Technology, (2003), Morgan Kaufmann San Francisco), 63-67
[17] Chitsaz, H.; Salari, R.; Sahinalp, S. C.; Backofen, R., A partition function algorithm for interacting nucleic acid strands, Bioinformatics, 25, i365-i373, (2009)
[18] Condon, A.; Davy, B.; Rastegari, B.; Zhao, S.; Tarrant, F., Classifying RNA pseudoknotted structures, Theoret. Comput. Sci., 320, 35-50, (2004) · Zbl 1043.92008
[19] Cormen, T. H.; Leiserson, C. E.; Rivest, R. L.; Stein, C., Introduction to algorithms, (2001), MIT Press · Zbl 1047.68161
[20] Cupal, J.; Flamm, C.; Renner, A.; Stadler, P. F., Density of states, metastable states, and saddle points. exploring the energy landscape of an RNA molecule, (Gaasterland, T.; Karp, P.; Karplus, K.; Ouzounis, C.; Sander, C.; Valencia, A., Proceedings of the ISMB-97, (1997), AAAI Press Menlo Park, CA), 88-91
[21] Dasgupta, S.; Papadimitriou, C.; Vasirani, U., Algorithms, (2006), McGraw-Hill
[22] Dijkstra, E. W., A note on two problems in connexion with graphs, Numer. Math., 1, 269-271, (1959) · Zbl 0092.16002
[23] Doudna, J. A.; Cech, T. R., The chemical repertoire of natural ribozymes, Nature, 418, 222-228, (2002)
[24] Farmer, A.; Höner zu Siederdissen, C.; Gill, A., The HERMIT in the stream: fusing stream Fusion’s concatmap, (Proceedings of the ACM SIGPLAN 2014 Workshop on Partial Evaluation and Program Manipulation, (2014), ACM), 97-108
[25] Feller, V., An introduction to probability theory and its applications: volume one, (1950), John Wiley & Sons
[26] Flajolet, P.; Sedgewick, R., Analytic combinatorics, (2009), Cambridge University Press Cambridge, UK · Zbl 1165.05001
[27] Giedroc, D. P.; Cornish, P. V., Frameshifting RNA pseudoknots: structure and mechanism, Virus Res., 139, 193-208, (2009)
[28] Giegerich, R.; Meyer, C., Algebraic dynamic programming, Algebraic Methodol. Softw. Technol., vol. 2422, 243-257, (2002), Springer
[29] Giegerich, R.; Meyer, C.; Steffen, P., A discipline of dynamic programming over sequence data, Sci. Comput. Prog., 51, 215-263, (2004) · Zbl 1067.90157
[30] Giegerich, R.; Schmal, K., Code selection techniques: pattern matching, tree parsing, and inversion of derivors, (Proceedings of the 2nd European Symposium on Programming, (1988), Springer Heidelberg), 247-268
[31] Giegerich, R.; Touzet, H., Modeling dynamic programming problems over sequences and trees with inverse coupled rewrite systems, Algorithms, 7, 62-144, (2014)
[32] Goodman, J., Semiring parsing, Comput. Linguist., 25, 4, 573-605, (1999)
[33] Gorodkin, J.; Heyer, L. J.; Stormo, G. D., Finding the most significant common sequence and structure motifs in a set of RNA sequences, Nucleic Acids Res., 25, 3724-3732, (1997)
[34] Graham, S. L.; Harrison, M. A.; Ruzzo, W. L., An improved context-free recognizer, ACM Trans. Program. Lang. Syst., 2, 415-462, (1980) · Zbl 0461.68084
[35] Haslinger, C.; Stadler, P. F., RNA structures with pseudo-knots: graph-theoretical and combinatorial properties, Bull. Math. Biol., 61, 437-467, (1999) · Zbl 1323.92149
[36] Höner zu Siederdissen, C., Sneaking around concat map: efficient combinators for dynamic programming, (Proceedings of the 17th ACM SIGPLAN International Conference on Functional Programming, ICFP’12, (2012), ACM New York, NY, USA), 215-226 · Zbl 1291.68122
[37] Höner zu Siederdissen, C.; Hofacker, I. L.; Stadler, P. F., How to multiply dynamic programming algorithms, (Brazilian Symposium on Bioinformatics, BSB 2013, Lect. Notes in Bioinform., vol. 8213, (2013), Springer-Verlag Heidelberg), 82-93
[38] Höner zu Siederdissen, C.; Hofacker, I. L.; Stadler, P. F., Product grammars for alignment and folding, IEEE/ACM Trans. Comput. Biol. Bioinform., 99, (2014)
[39] Höner zu Siederdissen, C.; Prohaska, S. J.; Stadler, P. F., Dynamic programming for set data types, (Campos, S., Advances in Bioinformatics and Computational Biology, BSB 2014, Lecture Notes in Comput. Sci., vol. 8826, (2014)), 57-64
[40] Höner zu Siederdissen, C.; Prohaska, S. J.; Stadler, P. F., Algebraic dynamic programming over general data structures, BMC Bioinform., 16, (2015)
[41] Hotz, G.; Pitsch, G., On parsing coupled-context-free languages, Theoret. Comput. Sci., 161, 205-233, (1996) · Zbl 0872.68089
[42] Huang, F. W.D.; Qin, J.; Reidys, C. M.; Stadler, P. F., Partition function and base pairing probabilities for RNA-RNA interaction prediction, Bioinformatics, 25, 2646-2654, (2009)
[43] Huang, F. W.D.; Qin, J.; Reidys, C. M.; Stadler, P. F., Target prediction and a statistical sampling algorithm for RNA-RNA interaction, Bioinformatics, 26, 175-181, (2010)
[44] Huang, F. W.D.; Reidys, C. M., On the combinatorics of sparsification, Alg. Mol. Biol., 7, 28, (2012)
[45] Jin, E. Y.; Qin, J.; Reidys, C. M., Combinatorics of RNA structures with pseudoknots, Bull. Math. Biol., 70, 45-67, (2008) · Zbl 1281.92055
[46] Jin, E. Y.; Reidys, C. M., Asymptotic enumeration of RNA structures with pseudoknots, Bull. Math. Biol., 70, 951-970, (2008) · Zbl 1152.92007
[47] Joshi, A. K., Tree adjoining grammars: how much context-sensitivity is required to provide reasonable structural descriptions?, (Dowty, D. R.; Karttunen, L.; Zwicky, A. M., Natural Language Parsing, (1985), Cambridge University Press), 206-250
[48] Joshi, A. K.; Levy, L. S.; Takahashi, M., Tree adjoining grammars, J. Comput. System Sci., 10, 136-163, (1975) · Zbl 0326.68053
[49] Kallmeyer, L., Parsing beyond context-free grammars, (2010), Springer · Zbl 1252.68005
[50] Kato, Y.; Seki, H.; Kasami, T., RNA pseudoknotted structure prediction using stochastic multiple context-free grammar, Inf. Media Technol., 2, 79-88, (2007)
[51] Knuth, D., Semantic of context-free languages, Math. Syst. Theory, Math. Syst. Theory, 5, 95-96, (1971), Correction · Zbl 0219.68035
[52] Kotowski, J.; Bry, F.; Eisinger, N., A potpourri of reason maintenance methods, (2011), Unpublished manuscript
[53] Lee, L., Fast context-free grammar parsing requires fast Boolean matrix multiplication, J. ACM, 49, 1, 1-15, (2002) · Zbl 1323.68332
[54] Lefebvre, F., An optimized parsing algorithm well suited to RNA folding, (Proc. Int. Conf. Intell. Syst. Mol. Biol., vol. 3, (1995)), 222-230
[55] Lefebvre, F., A grammar-based unification of several alignment and folding algorithms, (States, D. J.; Agarwal, P.; Gaasterland, T.; Hunter, L.; Smith, R. F., Proceedings of the Fourth International Conference on Intelligent Systems for Molecular Biology, (1996), AAAI Press), 143-154
[56] Lorenz, R.; Bernhart, S. H.; Höner zu Siederdissen, C.; Tafer, H.; Flamm, C.; Stadler, P. F.; Hofacker, I. L., Viennarna package 2.0, Alg. Mol. Biol., 6, 26, (2011)
[57] Lorenz, W. A.; Ponty, Y.; Clote, P., Asymptotics of RNA shapes, J. Comput. Biol., 15, 31-63, (2008)
[58] Lyngsø, R. B.; Pedersen, C. N., RNA pseudoknot prediction in energy-based models, J. Comput. Biol., 7, 409-427, (2000)
[59] Mathews, D. H.; Turner, D. H., Dynalign: an algorithm for finding secondary structures common to two RNA sequences, J. Mol. Biol., 317, 191-203, (2002)
[60] Maurer, H. A., A direct proof of the inherent amiguity of a simple context-free language, J. ACM, 16, 256-260, (1969) · Zbl 0182.33401
[61] Möhl, M.; Salari, R.; Will, S.; Backofen, R.; Sahinalp, S. C., Sparsification of RNA structure prediction including pseudoknots, Alg. Mol. Biol., 5, 39, (2010)
[62] M. Möhl, C. Schmiedl, S. Zakov, Sparsification in algebraic dynamic programming, in: Unpublished Proceedings of the German Conference on Bioinformatics, GCB 2011, 2011, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.415.4136.
[63] Nakanishi, R.; Takada, K.; Seki, H., An efficient recognition algorithm for multiple context-free languages, (Becker, T.; Krieger, H.-U., Proceedings of the Fifth Meeting on Mathematics of Language, MOL5, DFKI Documents, vol. 97-02, (1997))
[64] Namy, O.; Moran, S. J.; Stuart, D. I.; Gilbert, R. J.C.; Brierley, I., A mechanical explanation of RNA pseudoknot function in programmed ribosomal frameshifting, Nature, 441, 244-247, (2006)
[65] Nebel, M. E.; Weinberg, F., Algebraic and combinatorial properties of common RNA pseudoknot classes with applications, J. Comput. Biol., 19, 1134-1150, (2012)
[66] Giegerich, Robert; Voß, Björn; Rehmsmeier, Marc, Abstract shapes of RNA, Nucl. Acids Res., 32, 16, 4843-4851, (2004)
[67] Orland, H.; Zee, A., RNA folding and large n matrix theory, Nuclear Phys. B, 620, 456-476, (2002) · Zbl 0991.92011
[68] Parikh, R., On context-free languages, J. ACM, 13, 570-581, (1966) · Zbl 0154.25801
[69] Pillsbury, M.; Orland, H.; Zee, A., Steepest descent calculation of RNA pseudoknots, Phys. Rev. E, 72, (2005)
[70] Ponty, Y.; Saule, C., A combinatorial framework for designing (pseudoknotted) RNA algorithms, (Przytycka, T. M.; Sagot, M.-F., WABI 2011, Lecture Notes in Comput. Sci., vol. 6833, (2011)), 250-269
[71] Rambow, O.; Satta, G., A two-dimensional hierarchy for parallel rewriting systems, (1994), University of Pennsylvania USA, Tech. Rep. IRCS-94-02 · Zbl 0938.68701
[72] Reghizzi, S. C.; Breveglieri, L.; Morzenti, A., Formal languages and compilation, (2009), Springer
[73] Reidys, C., Combinatorial computational biology of RNA: pseudoknots and neutral networks, (2011), Springer New York · Zbl 1207.92013
[74] Reidys, C. M.; Huang, F. W.D.; Andersen, J. E.; Penner, R. C.; Stadler, P. F.; Nebel, M. E., Topology and prediction of RNA pseudoknots, Bioinformatics, Bioinformatics, 28, 300-1085, (2012), Addendum in
[75] Rivas, E.; Eddy, S. R., A dynamic programming algorithm for RNA structure prediction including pseudoknots, J. Mol. Biol., 285, 2053-2068, (1999)
[76] Rivas, E.; Eddy, S. R., The language of RNA: a formal grammar that includes pseudoknots, Bioinformatics, 16, 334-340, (2000)
[77] Rivas, E.; Lang, R.; Eddy, S. R., A range of complex probabilistic models for RNA secondary structure prediction that include the nearest neighbor model and more, RNA, 18, 193-212, (2012)
[78] Rødland, E. A., Pseudoknots in RNA secondary structures: representation, enumeration, and prevalence, J. Comput. Biol., 13, 1197-1213, (2006)
[79] Sankoff, D., Simultaneous solution of the RNA folding, alignment, and proto-sequence problems, SIAM J. Appl. Math., 45, 810-825, (1985) · Zbl 0581.92012
[80] Seki, H.; Kato, Y., On the generative power of multiple context-free grammars and macro grammars, IEICE Trans. Inf. Syst. E, 91-D, 209-221, (2008)
[81] Seki, H.; Matsumura, T.; Fujii, M.; Kasami, T., On multiple context free grammars, Theoret. Comput. Sci., 88, 191-229, (1991) · Zbl 0762.68039
[82] Seki, H.; Nakanishi, R.; Kaji, Y.; Ando, S.; Kasami, T., Parallel multiple context-free grammars, finite-state translation systems, and polynomial-time recognizable subclasses of lexical-functional grammars, (Schubert, L., 31st Annual Meeting of the Association for Computational Linguistics, (1993), Association for Computational Linguistics), 130-140
[83] Senter, E.; Clote, P., Fast, approximate kinetics of RNA folding, J. Comput. Biol., 22, 124-144, (2015)
[84] Senter, E.; Sheikh, S.; Ponty, Y.; Clote, P., Using the fast Fourier transform to accelerate the computational search for RNA conformational switches, PLoS One, 7, (2012)
[85] Shamir, E., Some inherently ambiguous context-free languages, Inform. Control, 18, 355-363, (1971) · Zbl 0227.68039
[86] Shieber, S. M., Evidence against the context-freeness of natural language, Linguist. Philos., 8, 333-343, (1985)
[87] Skut, W.; Krenn, B.; Brants, T.; Uszkoreit, H., An annotation scheme for free word order languages, (Proceedings of the 5th Conference on Applied Natural Language Processing, (1997), Assoc. Comp. Linguistics Stroudsburg, PA), 88-95
[88] Stabler, E. P., Varieties of crossing dependencies: structure dependence and mild context sensitivity, Cogn. Sci., 28, 5, 699-720, (2004)
[89] Stanley, R. P., Differentiably finite power series, European J. Combin., 1, 175-188, (1980) · Zbl 0445.05012
[90] Steffen, P.; Giegerich, R., Versatile dynamic programming using pair algebras, BMC Bioinform., 6, 224, (2005)
[91] Taufer, M.; Licon, A.; Araiza, R.; Mireles, D.; van Batenburg, F. H.D.; Gultyaev, A.; Leung, M.-Y., Pseudobase++: an extension of pseudobase for easy searching, formatting and visualization of pseudoknots, Nucleic Acids Res., 37, D127-D135, (2009)
[92] Uemura, Y.; Hasegawa, A.; Kobayashi, S.; Yokomori, T., Tree adjoining grammars for RNA structure prediction, Theoret. Comput. Sci., 210, 277-303, (1999) · Zbl 0912.68121
[93] Valiant, L. G., General context-free recognition in less than cubic time, J. Comput. System Sci., 10, 2, 308-315, (1975) · Zbl 0312.68042
[94] van Vugt, N., Generalized context-free grammars, (1996), Univ. Leiden, Master’s thesis
[95] Vijay-Shanker, K.; Weir, D. J.; Joshi, A. K., Characterizing structural descriptions produced by various grammatical formalisms, (Proceedings of the 25th Annual Meeting of the Association for Computational Linguistics, ACL, (1987), Association for Computational Linguistics Stroudsburg, PA), 104-111
[96] Voß, B.; Giegerich, R.; Rehmsmeier, M., Complete probabilistic analysis of RNA shapes, BMC Biol., 4, 1, 5, (2006)
[97] Wadler, P., Comprehending monads, Math. Structures Comput. Sci., 2, 4, 461-493, (1992) · Zbl 0798.68040
[98] Waldispühl, J.; Behzadi, B.; Steyaert, J.-M., An approximate matching algorithm for finding (sub-)optimal sequences in S-attributed grammars, Bioinformatics, 18, 250-259, (2002)
[99] Weirich, S.; Hsu, J.; Eisenberg, R. A., System FC with explicit kind equality, ACM SIGPLAN Not., 48, 9, 275-286, (Sep. 2013)
[100] Zakov, S.; Tsur, D.; Ziv-Ukelson, M., Reducing the worst case running times of a family of RNA and CFG problems, using Valiant’s approach, Alg. Mol. Biol., 6, 20, (2011)
[101] Zuker, M., On finding all suboptimal foldings of an RNA molecule, Science, 244, 48-52, (1989) · Zbl 1226.92029
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.