
Using pseudo-amino acid composition and support vector machine to predict protein structural class. (English) Zbl 1447.92300

Summary: As a result of genome and other sequencing projects, the gap between the number of known protein sequences and the number of proteins whose structural class is known is widening rapidly. To narrow this gap, it is vitally important to develop a computational method that determines the protein structural class quickly and accurately. In this paper, a novel predictor is developed for predicting protein structural class. It employs a support vector machine learning system together with a different pseudo-amino acid composition (PseAA), a representation of protein samples that was introduced to take sequence-order effects into account to some extent. As a demonstration, the jackknife cross-validation test was performed on a working dataset that contains 204 non-homologous proteins. The predicted results are very encouraging, indicating that the current predictor, built on the PseAA, may play an important complementary role to the elegant covariant discriminant predictor and other existing algorithms.
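The summary does not spell out how the PseAA vector or the jackknife test is computed, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes the classic form of pseudo-amino acid composition (20 composition terms plus `lam` sequence-order correlation terms), a single physicochemical property (the Kyte-Doolittle hydropathy index), and a weight `w = 0.05`; the paper's "different" PseAA may use other properties and parameters. The names `pseaa`, `jackknife_accuracy`, and `nearest_neighbor` are hypothetical, and the 1-nearest-neighbor rule is a stand-in classifier used only to keep the example dependency-free; it is not the support vector machine the paper employs.

```python
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index (assumed here as the only property).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
      "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
      "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
      "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def _standardize(scale):
    """Rescale a property to zero mean and unit variance over the 20 residues."""
    vals = [scale[a] for a in AMINO_ACIDS]
    mean = sum(vals) / len(vals)
    std = sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    return {a: (scale[a] - mean) / std for a in AMINO_ACIDS}


H = _standardize(KD)


def pseaa(seq, lam=3, w=0.05):
    """Return a (20 + lam)-dimensional pseudo-amino acid composition vector."""
    residues = [r for r in seq.upper() if r in H]  # skip non-standard symbols
    n = len(residues)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    # Conventional amino acid composition (occurrence frequencies).
    freqs = [residues.count(a) / n for a in AMINO_ACIDS]
    # Sequence-order correlation factors for gaps k = 1..lam.
    thetas = []
    for k in range(1, lam + 1):
        total = sum((H[residues[i]] - H[residues[i + k]]) ** 2
                    for i in range(n - k))
        thetas.append(total / (n - k))
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


def nearest_neighbor(train_x, train_y, query):
    """Stand-in classifier (NOT an SVM): label of the closest training vector."""
    dists = [sum((a - b) ** 2 for a, b in zip(x, query)) for x in train_x]
    return train_y[dists.index(min(dists))]


def jackknife_accuracy(vectors, labels, predict):
    """Leave-one-out test: each sample is predicted from all the others."""
    hits = 0
    for i in range(len(vectors)):
        train_x = vectors[:i] + vectors[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, vectors[i]) == labels[i]:
            hits += 1
    return hits / len(vectors)
```

In a setting closer to the one described, each protein sequence would be mapped through `pseaa` and the `predict` argument would wrap a trained support vector machine (for example one built with LIBSVM, which this record lists under Software) instead of `nearest_neighbor`.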

MSC:

92D20 Protein sequences, DNA sequences
92-08 Computational methods for problems pertaining to biology

Software:

LogitBoost; LIBSVM
