zbMATH — the first resource for mathematics

Improving RNA secondary structure prediction via state inference with deep recurrent neural networks. (English) Zbl 1439.92147
Summary: The problem of determining which nucleotides of an RNA sequence are paired or unpaired in its secondary structure, which we call RNA state inference, can be studied by different machine learning techniques. Successful state inference of RNA sequences can be used to generate auxiliary information for data-directed RNA secondary structure prediction. Typical tools for state inference, such as hidden Markov models, perform poorly on RNA state inference, owing in part to their inability to recognize nonlocal dependencies. Bidirectional long short-term memory (LSTM) neural networks have emerged as a powerful tool that can model global nonlinear sequence dependencies and have achieved state-of-the-art performance on many different classification problems. This paper presents a practical approach to RNA secondary structure inference centered around a deep learning method for state inference. State predictions from a deep bidirectional LSTM are used to generate synthetic SHAPE data that can be incorporated into RNA secondary structure prediction via the nearest neighbor thermodynamic model (NNTM). This method produces predicted secondary structures for a diverse test set of 16S ribosomal RNA that are, on average, 25 percentage points more accurate than undirected MFE structures. Accuracy is highly dependent on the success of our state inference method, and investigating the global features of our state predictions reveals that the accuracy of both our state inference and structure inference methods is highly dependent on the similarity of the sequence's pairing patterns to those in the training dataset. Availability of a large training dataset is critical to the success of this approach. Code is available at https://github.com/dwillmott/rna-state-inf.
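As an illustration of the pipeline the summary describes, the following minimal Python sketch turns per-nucleotide paired probabilities into synthetic SHAPE reactivities and then into pseudo-free-energy terms of the kind used in SHAPE-directed NNTM folding. The pseudo-energy form m ln(SHAPE + 1) + b with m = 2.6 and b = -0.8 kcal/mol follows Deigan et al. (2009). The linear mapping in `synthetic_shape`, its `low` and `high` values, and all function names are assumptions made for this example. They are not the authors' method, which may generate synthetic SHAPE data differently.

```python
import math

# Pseudo-free-energy parameters from Deigan et al. (2009), in kcal/mol.
SLOPE_M = 2.6
INTERCEPT_B = -0.8


def synthetic_shape(p_paired, low=0.1, high=1.0):
    """Map a predicted paired probability to a synthetic SHAPE reactivity.

    ASSUMPTION: a hypothetical linear interpolation between an assumed
    reactivity for confidently paired nucleotides (low) and one for
    confidently unpaired nucleotides (high). Illustrative only.
    """
    if not 0.0 <= p_paired <= 1.0:
        raise ValueError("p_paired must lie in [0, 1]")
    return high - (high - low) * p_paired


def shape_pseudo_energy(reactivity, m=SLOPE_M, b=INTERCEPT_B):
    """Pseudo-free-energy term m * ln(reactivity + 1) + b."""
    return m * math.log(reactivity + 1.0) + b


def pseudo_energies(paired_probs):
    """Compute one pseudo-energy term per nucleotide."""
    return [shape_pseudo_energy(synthetic_shape(p)) for p in paired_probs]


if __name__ == "__main__":
    # Hypothetical classifier outputs for a five-nucleotide fragment.
    probs = [0.95, 0.90, 0.20, 0.05, 0.85]
    for p, e in zip(probs, pseudo_energies(probs)):
        print(f"P(paired)={p:.2f}  pseudo-energy={e:+.3f} kcal/mol")
```

Under these assumptions, nucleotides predicted to be paired receive a low synthetic reactivity and hence a negative (stabilizing) bonus, while nucleotides predicted to be unpaired receive a positive penalty. This is how state predictions can steer the thermodynamic model toward structures consistent with them.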
MSC:
92D20 Protein sequences, DNA sequences