×

DARN! A weighted constraint solver for RNA motif localization. (English) Zbl 1142.92015

Summary: Following recent discoveries about the important roles of non-coding RNAs (ncRNAs) in the cellular machinery, there is now great interest in identifying new occurrences of ncRNAs in available genomic sequences. We show how the problem of finding new occurrences of characterized ncRNAs can be modeled as the problem of finding all locally-optimal solutions of a weighted constraint network using dedicated weighted global constraints, encapsulating pattern-matching algorithms and data structures. This is embodied in DARN!, a software tool for ncRNA localization, which, compared to existing pattern-matching based tools, offers additional expressivity (such as enabling RNA-RNA interactions to be described) and improved specificity (through the exploitation of scores and local optimality) without compromises in CPU efficiency. This is demonstrated on the actual search for tRNAs and H/ACA sRNA on different genomes.

MSC:

92C40 Biochemistry, molecular biology
92-08 Computational methods for problems pertaining to biology
68W05 Nonnumerical algorithms
92C37 Cell biology

Keywords:

ncRNA
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Abouelhoda, M., Kurtz, S., & Ohlebusch E. (2004). Replacing suffix trees with enhanced suffix arrays. Journal of Discrete Algorithms, 2, 53–86. · Zbl 1115.92303 · doi:10.1016/S1570-8667(03)00065-0
[2] Altschul, S., Gish, W., Miller, W., Myers, E., & Lipman D. (1990). Basic local alignment search tool. Journal of Molecular Biology, 215, 403–410.
[3] Billoud, B., Kontic, M., & Viari, A. (1996). Palingol: A declarative programming language to describe nucleic acids’ secondary structures and to scan sequence database. Nucleic Acids Research, 24(8), 1395–1403. · doi:10.1093/nar/24.8.1395
[4] Cherry, J., Ball, C., Weng, S., Juvik, G., Schmidt, R., Adler, C., et al. (1997). Genetic and physical maps of Saccharomyces cerevisiae. Nature, 387(6632 Suppl), 67–73. · doi:10.1038/43025
[5] Dsouza, M., Larsen, N., & Overbeek, R. (1997). Searching for patterns in genomic data. Trends in Genetics, 13(12).
[6] Eddy, S. (1996). Rnabob: A program to search for RNA secondary structure motifs in sequence databases. http://bioweb.pasteur.fr/docs/man/man/rnabob.1.html#toc1
[7] Eddy, S., & Durbin, R. (1994). RNA sequence analysis using covariance models. Nucleic Acids Research, 22(11), 2079–2088. · doi:10.1093/nar/22.11.2079
[8] Griffiths-Jones, S., Moxon, S., Marshall, M., Khanna, A., Eddy, S., & Bateman, A. (2005). Rfam: Annotating non-coding RNAs in complete genomes. Nucleic Acids Research, 33, D121–D124. · Zbl 05437549 · doi:10.1093/nar/gki081
[9] Laferrière, A., Gautheret, D., & Cedergren R. (1994). An RNA pattern matching program with enhanced performance and portability. Computer Applications in Biosciences, 10(2), 211–212.
[10] Larrosa, J., & Schiex, T. (2004). Solving weighted CSP by maintaining arc-consistency. Artificial Intelligence, 159(1–2), 1–26. · Zbl 1086.68592 · doi:10.1016/j.artint.2004.05.004
[11] Lhomme, O. (1993). Consistency techniques for numeric CSPs. In IJCAI-93 (pp. 232–238).
[12] Macke, T., Ecker, D., Gutell, R., Gautheret, D., Case, D., & Sampath, R. (2001). Rnamotif, an RNA secondary structure definition and search algorithm. Nucleic Acids Research, 29(22), 4724–4735. · doi:10.1093/nar/29.22.4724
[13] Navarro, G., & Raffinot, M. (2002). Flexible pattern matching in strings–practical on-line search algorithms for texts and biological sequences. Cambridge University Press. · Zbl 0992.92029
[14] Needleman, S., & Wunsch, C. (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48(3), 443–453. · doi:10.1016/0022-2836(70)90057-4
[15] Sakakibara, Y., Brown, M., Hughey, R., Mian, I., Sjölander, K., Underwood, R., et al. (1994). Recent methods for RNA modeling using stochastic context-free grammars. In CPM’94 (pp. 289–306).
[16] Thébault, P., de Givry, S., Schiex, T., & Gaspin, C. (2006). Searching RNA motifs and their intermolecular contacts with constraint networks. Bioinformatics, 22(17), 2074–2080. · doi:10.1093/bioinformatics/btl354
[17] Vialette, S. (2004). On the computational complexity of 2-interval pattern matching problems. Theoretical Computer Science, 312(2–3), 223–249. · Zbl 1070.68052 · doi:10.1016/j.tcs.2003.08.010
[18] Zytnicki, M., Gaspin, C., & Schiex, T. (2006). A new local consistency for weighted CSP dedicated to long domains. In SAC’06: Proceedings of the 2006 ACM symposium on applied computing (pp. 394–398).
[19] Zytnicki, M., Gaspin, C., & Schiex, T. (2006). Suffix arrays and weighted CSPs. In A. Dal Palu, A. Dovier, & S. Will (Eds.), Workshop on constraint based methods for bioinformatics (pp. 69–74). Nantes.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.