zbMATH — the first resource for mathematics

Predicting DNA- and RNA-binding proteins from sequences with kernel methods. (English) Zbl 1402.92332
Summary: In this paper, support vector machines (SVMs) are applied to predict nucleic-acid-binding proteins. We constructed two classifiers, one distinguishing DNA-binding proteins and one distinguishing RNA-binding proteins from non-nucleic-acid-binding proteins, using a conjoint triad feature that extracts information directly from the amino acid sequence of a protein. Both self-consistency and jackknife tests show promising results on protein datasets in which pairwise sequence identity is below 25%. In the self-consistency test, the predictive accuracy is 90.37% for DNA-binding proteins and 89.70% for RNA-binding proteins; in the jackknife test, the corresponding accuracies are 78.93% and 76.75%. Comparisons show that our method is competitive, outperforming other previously published sequence-based prediction methods.
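The conjoint triad encoding and the two validation schemes named in the summary can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions. The seven-class amino-acid grouping and the normalization d_i = (f_i - min f) / max f come from the original conjoint triad method of Shen et al. (2007) and are not confirmed for this paper. A nearest-centroid classifier stands in for the SVM so that the example needs no dependencies. The toy sequences and labels are hypothetical.

```python
"""Sketch: conjoint triad features plus self-consistency vs. jackknife tests.

Assumptions (not confirmed by the paper): the 7-class amino-acid grouping and
normalization of Shen et al. (2007), and a nearest-centroid classifier used
as a dependency-free stand-in for the SVM.
"""
from itertools import product
from math import dist

# Assumed grouping of the 20 amino acids by dipole and side-chain volume.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: i for i, group in enumerate(CLASSES) for aa in group}
# One feature per ordered class triple: 7**3 = 343 features.
TRIAD_INDEX = {t: i for i, t in enumerate(product(range(len(CLASSES)), repeat=3))}


def conjoint_triad(seq: str) -> list[float]:
    """Count class triads of consecutive residues, then normalize."""
    # Residues outside the 20 standard amino acids are simply dropped here.
    classes = [AA_TO_CLASS[aa] for aa in seq.upper() if aa in AA_TO_CLASS]
    counts = [0] * len(TRIAD_INDEX)
    for k in range(len(classes) - 2):
        counts[TRIAD_INDEX[tuple(classes[k:k + 3])]] += 1
    lo, hi = min(counts), max(counts)
    # d_i = (f_i - min f) / max f; an all-zero vector if no triad was found.
    return [(c - lo) / hi if hi else 0.0 for c in counts]


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def predict(train: list[tuple[list[float], int]], x: list[float]) -> int:
    """Nearest-centroid stand-in for the SVM: pick the closest class mean."""
    labels = sorted({y for _, y in train})
    cents = {y: centroid([v for v, lab in train if lab == y]) for y in labels}
    return min(labels, key=lambda y: dist(cents[y], x))


def self_consistency(data):
    """Train and test on the same samples (an optimistic estimate)."""
    hits = sum(predict(data, x) == y for x, y in data)
    return hits / len(data)


def jackknife(data):
    """Leave-one-out: predict each sample with a model trained without it."""
    hits = sum(predict(data[:i] + data[i + 1:], x) == y
               for i, (x, y) in enumerate(data))
    return hits / len(data)


if __name__ == "__main__":
    # Hypothetical toy sequences: 1 = nucleic-acid-binding, 0 = non-binding.
    toy = [("KRKRKRKR", 1), ("RRKKRRKKG", 1), ("DEDEDEDE", 0), ("EEDDEEDDA", 0)]
    data = [(conjoint_triad(s), y) for s, y in toy]
    print(self_consistency(data), jackknife(data))
```

On real data the self-consistency score is normally the higher of the two, because every test sample was also used in training. The jackknife removes that overlap, which is consistent with the lower jackknife accuracies reported in the summary. Using an actual SVM (for example LIBSVM, or scikit-learn's `sklearn.svm.SVC`) would only require replacing `predict`.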
Reviewer: Reviewer (Berlin)

92D20 Protein sequences, DNA sequences
68T05 Learning and adaptive systems in artificial intelligence
62P10 Applications of statistics to biology and medical sciences; meta analysis