
Predicting the state of cysteines based on sequence information. (English) Zbl 1410.92085

Summary: A three-stage support vector machine (SVM) was constructed to predict the state of cysteines by fusing sequence information, evolutionary information and annotation information of protein sequences. The first and second stages predict, respectively, whether a protein sequence contains disulfide bonds and whether all of its cysteines are involved in disulfide bonds. In the last stage, an SVM predicts which of the cysteines in a protein are involved in disulfide bonds. The three SVMs perform well, with overall prediction accuracies of 90.05%, 96.36% and 80.00%, respectively, which indicates that the features selected in this work are effective for predicting the state of cysteines. In addition, existing methods have focused on prediction performance and have not shown how important a role each feature plays in the prediction. A feature importance measure, the \(F\)-score function, was therefore used to evaluate these features. The results show that, among these protein descriptors, evolutionary information is the most important feature for representing disulfide-containing proteins. The prediction software and data sets used in this article are freely available at http://cic.scu.edu.cn/bioinformatics/Predict_Cys.zip.
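The summary does not define the \(F\)-score. As a minimal sketch, assuming it is the two-class F-score commonly used for SVM feature selection (as described by Chen and Lin, 2006), the score of a feature is the squared distance of each class mean from the overall mean, divided by the sum of the within-class sample variances. The function names `f_score` and `rank_features` and the toy data are hypothetical and are not taken from the authors' software.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """Two-class F-score of a single feature.

    pos, neg: values of the feature in the positive and negative class.
    Each class needs at least two values, because the sample variance
    (denominator n - 1) is used.
    """
    overall = mean(list(pos) + list(neg))
    # Between-class term: how far each class mean lies from the overall mean.
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    # Within-class term: sum of the two sample variances.
    within = variance(pos) + variance(neg)
    if within == 0:
        # Constant within both classes: no spread to normalise by.
        return 0.0 if between == 0 else float("inf")
    return between / within


def rank_features(X, y):
    """Return (feature_index, F-score) pairs, most discriminative first.

    X: list of equal-length feature rows; y: labels, 1 = positive class.
    """
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label != 1]
        scores.append((j, f_score(pos, neg)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy data (illustrative only): feature 0 separates the classes,
# feature 1 has the same distribution in both.
X = [[1, 5], [2, 4], [3, 6], [4, 5], [5, 4], [6, 6]]
y = [1, 1, 1, 0, 0, 0]
ranking = rank_features(X, y)
```

A larger score suggests a feature whose class means are well separated relative to its spread. The score treats each feature independently, so it does not capture information shared between features.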

MSC:

92D20 Protein sequences, DNA sequences
68T05 Learning and adaptive systems in artificial intelligence
62P10 Applications of statistics to biology and medical sciences; meta analysis
