zbMATH — the first resource for mathematics

New syntax to describe local continuous structure-sequence information for recognizing new pre-miRNAs. (English) Zbl 1406.92473
Summary: Computational prediction methods are attracting increasing attention as an important complement to the experimental identification of pre-miRNAs. The features extracted from pre-miRNAs are the key to computational prediction, and existing methods commonly use local continuous structure-sequence information. As more species-specific miRNAs are identified, a new syntax is needed to describe these local continuous structure-sequence features. We therefore propose a couplet syntax to describe the intrinsic features of pre-miRNAs. When tested on a dataset from miRBase 12.0 using features extracted with the couplet syntax, an SVM classifier achieves a sensitivity of 81.98% and a specificity of 87.16% on a human test set, and a sensitivity of 86.71% on all other species. These results indicate that the proposed couplet syntax describes the intrinsic features of pre-miRNAs better than traditional methods. By describing pre-miRNA secondary structure more precisely and masking frequently mutated nucleotides, the couplet syntax provides a powerful feature-description method that can be applied to many computational prediction approaches.
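The summary does not define the couplet syntax itself, so the sketch below does not implement it. As a hedged illustration of the "traditional" local continuous structure-sequence features it is compared against, the code assumes the triplet-style encoding used by earlier SVM approaches (for example Xue et al.'s triplet-SVM): each nucleotide is labelled paired or unpaired in a dot-bracket structure, and every window of three contiguous positions is described by its middle nucleotide plus the three states, giving 4 × 8 = 32 normalized frequencies. It also shows how the quoted sensitivity and specificity are computed from a confusion matrix. The function names are hypothetical, and the dot-bracket structure is assumed to come from an external folding tool.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
STATES = "(."  # "(" = paired (either direction), "." = unpaired

# 32 feature labels, e.g. "A((." = middle nucleotide A, window states "((."
FEATURE_KEYS = [n + "".join(s) for n in NUCLEOTIDES for s in product(STATES, repeat=3)]


def triplet_features(seq: str, structure: str) -> dict[str, float]:
    """Hypothetical helper: triplet-style local structure-sequence frequencies.

    `structure` is assumed to be a dot-bracket string of the same length as `seq`.
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    seq = seq.upper().replace("T", "U")
    # Treat ")" like "(" so both pairing directions count as "paired".
    states = structure.replace(")", "(")
    counts = dict.fromkeys(FEATURE_KEYS, 0)
    # Slide a 3-nt window; the middle nucleotide keys the feature.
    for i in range(1, len(seq) - 1):
        key = seq[i] + states[i - 1:i + 2]
        if key in counts:  # skip windows containing non-ACGU symbols
            counts[key] += 1
    total = sum(counts.values())
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true pre-miRNAs recognized: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of pseudo hairpins correctly rejected: TN / (TN + FP)."""
    return tn / (tn + fp)


if __name__ == "__main__":
    # Toy hairpin; a real pipeline would fold the sequence first.
    feats = triplet_features("GGGAAACCC", "(((...)))")
    print({k: round(v, 3) for k, v in feats.items() if v > 0})
    print(sensitivity(tp=8, fn=2), specificity(tn=9, fp=1))
```

The resulting 32-dimensional vector would then be fed to an SVM; a richer description such as the proposed couplet syntax changes only the feature-extraction step, not the classifier or the evaluation measures.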
MSC:
92D20 Protein sequences, DNA sequences
68T05 Learning and adaptive systems in artificial intelligence