Modeling dynamic programming problems over sequences and trees with inverse coupled rewrite systems. (English) Zbl 1461.68251

Summary: Dynamic programming is a classical algorithmic paradigm, which often allows the evaluation of a search space of exponential size in polynomial time. Recursive problem decomposition, tabulation of intermediate results for re-use, and Bellman’s Principle of Optimality are its well-understood ingredients. However, algorithms often lack abstraction and are difficult to implement, tedious to debug, and delicate to modify. The present article proposes a generic framework for specifying dynamic programming problems. This framework can handle all kinds of sequential inputs, as well as tree-structured data. Biosequence analysis, document processing, molecular structure analysis, comparison of objects assembled in a hierarchic fashion, and generally, all domains come under consideration where strings and ordered, rooted trees serve as natural data representations. The new approach introduces inverse coupled rewrite systems. They describe the solutions of combinatorial optimization problems as the inverse image of a term rewrite relation that reduces problem solutions to problem inputs. This specification leads to concise yet translucent specifications of dynamic programming algorithms. Their actual implementation may be challenging, but eventually, as we hope, it can be produced automatically. The present article demonstrates the scope of this new approach by describing a diverse set of dynamic programming problems which arise in the domain of computational biology, with examples in biosequence and molecular structure analysis.


68W05 Nonnumerical algorithms
68Q42 Grammars and rewriting systems
90C27 Combinatorial optimization
90C39 Dynamic programming
92D20 Protein sequences, DNA sequences
Full Text: DOI


[1] Bellman, R.E.; ; Dynamic Programming: Princeton, NJ, UK 1957; . · Zbl 0077.13605
[2] Gusfield, D.; ; Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology: Cambridge, UK 1997; . · Zbl 0934.68103
[3] Needleman, S.B.; Wunsch, C.D.; A general method applicable to the search for similarities in the amino acid sequence of two proteins; J. Mol. Biol.: 1970; Volume 48 ,443-453.
[4] Smith, T.F.; Waterman, M.S.; Identification of common molecular subsequences; J. Mol. Biol.: 1981; Volume 147 ,195-197.
[5] Sankoff, D.; Simultaneous solution of the RNA folding, alignment and protosequence problems; SIAM J. Appl. Math.: 1985; Volume 45 ,810-825. · Zbl 0581.92012
[6] Ferch, M.; Zhang, J.; Höchsmann, M.; Learning cooperative assembly with the graph representation of a state-action space; Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems: ; .
[7] Reis, D.; Golgher, P.; Silva, A.; Laender, A.; Automatic web news extraction using tree edit distance; Proceedings of the 13th International Conference on World Wide Web: ; ,502-511.
[8] Searls, D.B.; Investigating the linguistics of DNA and definite clause grammars; Proceedings of the North American Conference on Logic Programming: Cambridge, MA, USA 1989; ,189-208.
[9] Searls, D.B.; The Computational Linguistics of Biological Sequences; Artificial Intelligence and Molecular Biology: Cambridge, MA, USA 1993; ,47-120.
[10] Searls, D.B.; Linguistic approaches to biological sequences; CABIOS: 1997; Volume 13 ,333-344.
[11] Lefebvre, ; A grammar-based unification of several alignment and folding algorithms; AAAI Intell. Syst. Mol. Biol.: 1996; Volume 4 ,143-154.
[12] Knuth, D.E.; Semantics of context free languages; Theory Comput. Syst.: 1968; Volume 2 ,127-145. · Zbl 0169.01401
[13] Giegerich, R.; Meyer, C.; Steffen, P.; A discipline of dynamic programming over sequence data; Sci. Comput. Program.: 2004; Volume 51 ,215-263. · Zbl 1067.90157
[14] Giegerich, R.; Steffen, P.; Challenges in the compilation of a domain specific language for dynamic programming; Proceedings of the 2006 ACM Symposium on Applied Computing: ; .
[15] Höner zu Siederdissen, C.; Sneaking around concatMap: efficient combinators for dynamic programming; Proceedings of the 17th ACM SIGPLAN International Conference on Functional Programming: New York, NY, USA 2012; ,215-226. · Zbl 1291.68122
[16] Sauthoff, G.; Möhl, M.; Janssen, S.; Giegerich, R.; Bellman’s GAP-A language and compiler for dynamic programming in sequence analysis; Bioinformatics: 2013; Volume 29 ,551-560.
[17] Baader, F.; Nipkow, T.; ; Term Rewriting and All That: Cambridge, UK 1998; . · Zbl 0948.68098
[18] Ohlebusch, E.; ; Advanced Topics in Term Rewriting: New York, NY, USA 2002; . · Zbl 0999.68095
[19] Gotoh, O.; An improved algorithm for matching biological sequences; J. Mol. Biol.: 1982; Volume 162 ,705-708.
[20] Durbin, R.; Eddy, S.R.; Krogh, A.; Mitchison, G.; ; Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids: Cambridge, UK 1998; . · Zbl 0929.92010
[21] Giegerich, R.; Explaining and controlling ambiguity in dynamic programming; Proceedings of the Combinatorial Pattern Matching: New York, NY, USA 2000; ,46-59. · Zbl 0964.90050
[22] Nussinov, R.; Pieczenik, G.; Griggs, J.; Kleitman, D.; Algorithms for loop matchings; SIAM J. Appl. Math.: 1978; Volume 35 ,68-82. · Zbl 0411.92008
[23] Baker, J.K.; Trainable grammars for speech recognition; J. Acoust. Soc. Am.: 1979; Volume 65 ,54-550.
[24] Sakakibara, Y.; Brown, M.; Hughey, R.; Mian, I.; Sjölander, K.; Underwood, R.; Haussler, D.; Recent Methods for RNA Modeling Using Stochastic Context-Free Grammars; Combinatorial Pattern Matching 1994: ; ,289-306.
[25] Dowell, R.; Eddy, S.; Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction; BMC Bioinf.: 2004; Volume 5 ,71.
[26] Jiang, T.; Lin, G.; Ma, B.; Zhang, K.; A general edit distance between RNA structures; J. Comput. Biol.: 2002; Volume 9 ,371-388.
[27] Jiang, T.; Wang, L.; Zhang, K.; Alignment of trees - An alternative to tree edit; Theor. Comput. Sci.: 1995; Volume 143 ,137-148. · Zbl 0873.68150
[28] Macke, T.J.; Ecker, D.J.; Gutell, R.R.; Gautheret, D.; Case, D.A.; Sampath, R.; RNAMotif, an RNA secondary structure definition and search algorithm; Nucleic Acids Res.: 2001; Volume 29 ,4724-4735.
[29] Reeder, J.; Reeder, J.; Giegerich, R.; Locomotif: from graphical motif description to RNA motif search; Bioinformatics: 2007; Volume 23 ,i392.
[30] Eddy, S.R.; Durbin, R.; RNA sequence analysis using covariance models; Nucleic Acids Res.: 1994; Volume 22 ,2079-2088.
[31] Giegerich, R.; Höner zu Siederdissen, C.; Semantics and ambiguity of stochastic RNA family models; IEEE/ACM Trans. Comput. Biol. Bioinf.: 2011; Volume 8 ,499-516.
[32] Gardner, P.; Daub, J.; Tate, J.; Nawrocki, E.; Kolbe, D.; Lindgreen, S.; Wilkinson, A.; Finn, R.; Griffiths-Jones, S.; Eddy, S.; Rfam: Updates to the RNA families database; Nucleic Acids Res.: 2009; Volume 37 ,D136.
[33] Rosselló, F.; Valiente, G.; An algebraic view of the relation between largest common subtrees and smallest common supertrees; Theor. Comput. Sci.: 2006; Volume 362 ,33-53. · Zbl 1111.68091
[34] Chawathe, S.; Comparing hierarchical data in external memory; Proceedings of the 25th International Conference on Very Large Data Bases: 1999; ,90-101.
[35] Touzet, H.; Tree edit distance with gaps; Inf. Process. Lett.: 2003; Volume 85 ,123-129. · Zbl 1011.68851
[36] Zhuozhi Wang, K.Z.; Alignment between Two RNA Structures; Mathematical Foundations of Computer Science: 2001; ,690-702. · Zbl 1070.92515
[37] Backofen, R.; Chen, S.; Hermelin, D.; Landau, G.M.; Roytberg, M.A.; Weimann, O.; Zhang, K.; Locality and gaps in RNA comparison; J. Comput. Biol.: 2007; Volume 14 ,1074-1087.
[38] Jansson, J.; Hieu, N.T.; Sung, W.K.; Local gapped subforest alignment and its application in finding RNA structural motifs; J. Comput. Biol.: 2006; Volume 13 ,702-718. · Zbl 1116.68670
[39] Schirmer, S.; Giegerich, R.; Forest Alignment with Affine Gaps and Anchors; Combinatorial Pattern Matching: New York, NY, USA 2011; ,104-117. · Zbl 1339.68212
[40] Dulucq, S.; Touzet, H.; Decomposition algorithms for the tree edit distance problem; J. Discrete Algorithms: 2005; Volume 3 ,448-471. · Zbl 1129.68099
[41] Erik Demaine, Shay Mozes, B.R.; Weimann, O.; An optimal decomposition algorithm for tree edit distance; ACM Trans. Algorithms: 2009; Volume 6 . · Zbl 1300.68057
[42] Sauthoff, G.; Janssen, S.; Giegerich, R.; Bellman’s GAP: A declarative language for dynamic programming; Proceedings of the 13th International ACM SIGPLAN Symposium on Principles and Practices of Declarative Programming: New York, NY, USA 2011; ,29-40.
[43] Searls, D.B.; Murphy, K.P.; Automata-theoretic models of mutation and alignment; Proceedings of the ISMB, 1995: ; ,341-349.
[44] Huet, G.; Lankford, D.; On theUniform Halting Problem for Term Rewriting Systems; 1978; .
[45] Powell, W.B.; ; Approximate Dynamic Programming: Solving the Curses of Dimensionality: New York, NY, USA 2007; . · Zbl 1156.90021
[46] Voß, B.; Giegerich, R.; Rehmsmeier, M.; Complete probabilistic analysis of RNA shapes; BMC Biol.: 2006; Volume 4 ,5.
[47] Sedgewick, R.; ; Algorithms: Reading, MA, USA 2002; .
[48] Cormen, T.; Leiserson, C.; Rivest, R.; ; Introduction to Algorithms: Cambridge, MA, USA 1990; . · Zbl 1158.68538
[49] Braßel, B.; Hanus, M.; Peemöller, B.; Reck, F.; KiCS2: A new compiler from curry to haskell; Proceedings of the 20th International Workshop on Functional and (Constraint) Logic Programming: New York, NY, USA 2011; ,1-18.
[50] Thompson, S.; ; Haskell: The Craft of Functional Programming: Boston, MA, USA 2011; .
[51] Eisner, J.; Filardo, N.W.; ; Dyna: Extending Datalog for modern AI. Datalog 2.0: New York, NY, USA 2011; .
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.