×

Topological language for RNA. (English) Zbl 1352.92113

Summary: In this paper we introduce a novel, context-free grammar, \(\mathrm {RNAFeatures}^{*}\), capable of generating any RNA structure including pseudoknot structures (pk-structure). We represent pk-structures as orientable fatgraphs, which naturally leads to a filtration by their topological genus. Within this framework, RNA secondary structures correspond to pk-structures of genus zero. \(\mathrm {RNAFeatures}^{*}\) acts on formal, arc-labeled RNA secondary structures, called \(\lambda\)-structures. \(\lambda\)-structures correspond one-to-one to pk-structures together with some additional information. This information consists of the specific rearrangement of the backbone, by which a pk-structure can be made cross-free. \(\mathrm{RNAFeatures}^{*}\) is an extension of the grammar for secondary structures and employs an enhancement by labelings of the symbols as well as the production rules. We discuss how to use \(\mathrm {RNAFeatures}^{*}\) to obtain a stochastic context-free grammar for pk-structures, using data of RNA sequences and structures. The induced grammar facilitates fast Boltzmann sampling and statistical analysis. As a first application, we present an \(O(n \log(n))\) runtime algorithm which samples pk-structures based on ninety tRNA sequences and structures from the Nucleic Acid Database (NDB). Availability: the source code for simulation results is available at http://staff.vbi.vt.edu/fenixh/TPstructure.zip. The code is written in C and compiled by Xcode.

MSC:

92D20 Protein sequences, DNA sequences

Software:

Pfold; TPstructure; NNDB
PDFBibTeX XMLCite
Full Text: DOI arXiv

References:

[1] Uhlenbeck, O. C., A coat for all sequences, Nat. Struct. Biol., 5, 174-176 (1998)
[2] Stein, P. R.; Waterman, M. S., On some new sequences generalizing the catalan and motzkin numbers, Discrete Math., 26, 261-272 (1978) · Zbl 0405.10009
[3] Waterman, M. S.; Smith, T. F., Rapid dynamic programming methods for RNA secondary structure, Adv. Appl. Math., 7, 455-464 (1986) · Zbl 0607.92012
[4] Zuker, M.; Stiegler, P., Optimal computer folding of larger RNA sequences using thermodynamics and auxiliary information, Nucleic Acids Res., 9, 133-148 (1981)
[5] Hofacker, I. L.; Fontana, W.; Stadler, P. F.; Bonhoeffer, L. S.; Tacker, M.; Schuster, P., Fast folding and comparison of RNA secondary structures, Monatsh. Chem., 125, 167-188 (1994)
[6] McCaskill, J. S., The equilibrium partition function and base pair binding probabilities for RNA secondary structure, Biopolymers, 29, 1105-1119 (1990)
[7] Duchon, P.; Flajolet, P.; Louchard, G.; Schaeffer, G., Boltzmann samplers for the random generation of combinatorial structures, Comb. Probab. Comput., 13, 2004 (2004)
[8] Harrison, M. A., Introduction to Formal Language Theory (1978), Addison-Wesley · Zbl 0411.68058
[9] Nebel, M. E.; Scheid, A.; Weinberg, F., Random generation of RNA secondary structures according to native distributions, Algorithms Mol. Biol., 6, 24 (2011)
[10] Fontana, W.; Konings, D. A.M.; Stadler, P. F.; Schuster, P., Statistics of RNA secondary structures, Biopolymers, 33, 1389-1404 (1933)
[11] Rivas, E.; Eddy, S. R., A dynamic programming algorithm for RNA structure prediction including pseudoknots, J. Mol. Biol., 285, 2053-2068 (1999)
[12] Leontis, N. B.; Westhof, E., Geometric nomenclature and classification of RNA base pairs, RNA, 7, 499-512 (2001)
[13] Penner, R. C.; Knudsen, M.; Wiuf, C.; Andersen, J. E., Fatgraph models of proteins, Comm. Pure Appl. Math., 63, 1249-1297 (2010) · Zbl 1194.92028
[14] Chapuy, G., A new combinatorial identity for unicellular maps, via a direct bijective approach, Adv. Appl. Math., 47, 4, 874-893 (2011) · Zbl 1234.05037
[15] Massey, W. S., Algebraic Topology: An Introduction (1967), Springer-Veriag: Springer-Veriag New York · Zbl 0153.24901
[16] Smith, T.; Waterman, M., RNA secondary structure, Math. Biol., 42, 31-49 (1978) · Zbl 0402.92016
[17] Rivas, E.; Eddy, S. R., The language of RNA: a formal grammar that includes pseudoknots, Bioinformatics, 16, 334-340 (2000)
[18] Pachter, L.; Sturmfels, B., Algebraic Statistics for Computational Biology (2005), Cambridge University Press · Zbl 1108.62118
[19] Durbin, R.; Eddy, S. R.; Krogh, A.; Mitchison, G., Biological Sequence Analysis (1998), Cambridge University Press: Cambridge University Press Cambridge · Zbl 0929.92010
[20] Munch, K.; Krogh, A., Automatic generation of gene finders for eukaryotic species, BMC Bioinformatics, 7, 263 (2006)
[21] Pachter, L.; Alexandersson, M.; Cawley, S., Applications of generalized pair hidden Markov models to alignment and gene finding problems, J. Comput. Biol., 9, 2 (2002)
[22] Lottaz, C.; Iseli, C.; Jongeneel, C. V.; Bucher, P., Modeling sequencing errors by combining hidden Markov models, Bioinformatics, 19, ii (2003)
[23] Yoon, B. J.; Vaidyanathan, P. P., Structural alignment of RNAs using profile-csHMMs and its application to RNA homology search: overview and new results, IEEE Trans. Automat. Contr., 53, 10-25 (2008) · Zbl 1366.92099
[24] Waterman, M. S., Secondary structure of single-stranded nucleic acids, Adv. Math. (Suppl. Stud.), 1, 167-212 (1978) · Zbl 0434.05007
[25] Nussinov, R.; Piecznik, G.; Griggs, J. R.; Kleitman, D. J., Algorithms for loop matching, SIAM J. Appl. Math., 35, 1, 68-82 (1978) · Zbl 0411.92008
[26] Kleitman, D., Proportions of irreducible diagrams, Stud. Appl. Math., 49, 297-299 (1970) · Zbl 0198.03802
[27] Sankoff, D.; Leduc, G.; Antoine, N.; Paquin, B.; Lang, B. F.; R., C., Gene order comparisons for phylogenetic inference: evolution of the mitochondrial genome, Proc. Natl. Acad. Sci. USA, 89, 6575-6579 (1992)
[28] Knudsen, B.; Hein, J. J., Using stochastic context-free grammars and molecular evolution to predict RNA secondary structure, Bioinformatics, 15, 446-454 (1999)
[29] Westhof, E.; Jaeger, L., RNA pseudoknots, Curr. Opin. Struct. Biol., 2, 327-333 (1992)
[30] Chen, J. L.; Greider, C. W., Functional analysis of the pseudoknot structure in human telomerase RNA, Proc. Natl. Acad. Sci. USA, 102, 23 (2005)
[31] Loria, A.; Pan, T., Domain structure of the ribozyme from eubacterial ribonuclease, RNA, 2, 551-563 (1996)
[32] Staple, D. W.; Butcher, S. E., Pseudoknots: RNA structures with diverse functions, PLoS Biol., 3, e213 (2005)
[33] Konings, D.; Gutell, R., A comparison of thermodynamic foldings with comparatively derived structures of 16s and 16s-like rRNAs, RNA, 1, 559-574 (1995)
[34] Tuerk, C.; MacDougal, S.; Gold, L., RNA pseudoknots that inhibit human immunodeficiency virus type 1 reverse transcriptase, Proc. Natl. Acad. Sci. USA, 89, 15, 6988-6992 (1992)
[35] Penner, R. C.; Waterman, M. S., Spaces of RNA secondary structures, Adv. Math., 101, 31-49 (1993) · Zbl 0810.92011
[36] Penner, R. C., Cell decomposition and compactification of Riemann’s moduli space in decorated Teichmüller theory, (Tongring, N.; Penner, R. C. (2004), World Scientific: World Scientific Singapore), 263-301
[37] Orland, H.; Zee, A., RNA folding and large \(n\) matrix theory, Nucl. Phys. B, 620, 456-476 (2002) · Zbl 0991.92011
[38] Bon, M.; Vernizzi, G.; Orland, H.; Zee, A., Topological classification of RNA structures, J. Mol. Biol., 379, 900-911 (2008)
[39] Euler, L., Elementa doctrinae solidorum, Novi Comm. Acad. Sci. Imp. Petropol, 4, 109-140 (1752)
[40] Andersen, J. E.; Penner, R. C.; Reidys, C. M.; Waterman, M. S., Topological classification and enumeration of RNA structrues by genus, J. Math. Biol., 67, 5, 1261-1278 (2013) · Zbl 1335.92068
[41] Reidys, C. M.; Huang, F.; Andersen, J. E.; Penner, R. C.; Stadler, P. F.; Nebel, M. E., Topology and prediction of RNA pseudoknots, Bioinformatics, 27, 1076-1085 (2011)
[42] Loebl, M.; Moffatt, I., The chromatic polynomial of fatgraphs and its categorification, Adv. Math., 217, 1558-1587 (2008) · Zbl 1131.05036
[43] Hatcher, A., Algebraic Topology (2002), Cambridge University Press · Zbl 1044.55001
[44] Giegerich, R., Introduction to stochastic context free grammars, (Gorodkin, J.; Ruzzo, W. L., RNA Sequence, Structure, and Function: Computational and Bioinformatic Methods, 1097 (2014), Humana Press), 85-126
[45] Mathews, D.; Sabina, J.; Zuker, M.; Turner, D., Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure, J. Mol. Biol., 288, 911-940 (1999)
[46] Mathews, D.; Disney, M.; Childs, J.; Schroeder, S.; Zuker, M.; Turner, D., Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure, Proc. Natl. Acad. Sci. USA, 101, 7287-7292 (2004)
[47] Turner, D.; Mathews, D. H., NNDB: the nearest neighbor parameter database for predicting stability of nucleic acid secondary structure, Nucl. Acids Res., 38, D280-D282 (2010)
[48] Eddy, S. R.; Durbin, R., RNA sequence analysis using covariance models, Nucl. Acids Res., 22, 2079-2088 (1994)
[49] Sakakibara, Y.; Brown, M.; Hughey, R.; Mian, I. S.; Underwood, D.; Haussler, R. C.; Sjöolander, K., Stochastic context-free grammars for tRNA modeling, Nucl. Acids Res., 22, 5112-5120 (1994)
[50] Knudsen, B.; Hein, J. J., Pfold: RNA secondary structure prediction using stochastic context-free grammars, Nucl. Acids Res., 31, 3423-3428 (2003)
[51] Rivas, E.; Eddy, S. R., Secondary structure alone is generally not statistically significant for the detection of noncoding RNAs, Bioinformatics, 16, 7, 583-605 (2000)
[52] Rivas, E.; Eddy, S. R., Noncoding RNA gene detection using comparative sequence analysis, BMC Bioinformatics, 2, 8 (2001)
[53] Ponty, Y., Efficient sampling of RNA secondary structures from the Boltzmann ensemble of low-energy: the boustrophedon method, J. Math. Biol., 56, 1-2 (2008), 107-27 · Zbl 1144.92016
[54] Dirks, R. M.; Pierce, N. A., A partition function algorithm for nucleic acid secondary structure including pseudoknots, J. Comput. Chem., 24, 1664-1677 (2003)
[55] Berman, H. M.; Olson, W. K.; Beveridge, D. L.; Westbrook, J.; Gelbin, A.; Demeny, T.; Hsieh, A. R.; Schneider, B.; Srinivasan, S. H., The nucleic acid database: a comprehensive relational database of three-dimensional structures of nucleic acids, Biophys. J., 63, 751-759 (1992)
[56] Narayanan, B. C.; Westbrook, J.; Ghosh, S.; Petrov, A. I.; Sweeney, B.; Zirbel, C. L.; Leontis, N. B.; Berman, H. M., The nucleic acid database: new features and capabilities, Nucleic Acids Res., 42 (2013), D114-22
[57] Huang, F.; Nebel, M. E.; Reidys, C. M., Generation of RNA pseudoknot structures with topological genus filtration, Math Biosci., 245, 2, 216-225 (2013) · Zbl 1308.92078
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.