zbMATH — the first resource for mathematics

Prediction of protein structural classes by recurrence quantification analysis based on chaos game representation. (English) Zbl 1400.92417
Summary: In this paper, we aim to predict protein structural classes (\(\alpha\), \(\beta\), \(\alpha+\beta\), or \(\alpha/\beta\)) for low-homology data sets. Two widely used data sets are considered: 1189 (containing 1092 proteins) and 25PDB (containing 1673 proteins), with sequence homology of 40% and 25%, respectively. We propose to decompose the chaos game representation of proteins into two kinds of time series. A novel and powerful nonlinear analysis technique, recurrence quantification analysis (RQA), is then applied to these time series. For a given protein sequence, a total of 16 characteristic parameters can be calculated with RQA, and these are treated as the feature representation of the sequence. Based on this feature representation, the structural class of each protein is predicted with Fisher's linear discriminant algorithm. The jackknife test is used to evaluate our method and to compare it with other existing methods. The overall accuracies with the step-by-step procedure are 65.8% and 64.2% for the 1189 and 25PDB data sets, respectively. With the widely used one-against-others procedure, we compare our method with five other existing methods. In particular, the overall accuracies of our method are 6.3% and 4.1% higher for the two data sets, respectively. Furthermore, only 16 parameters are used in our method, fewer than are used by the other methods. This suggests that the current method may play a complementary role to the existing methods and is promising for the prediction of protein structural classes.
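The first two steps named in the summary (chaos game representation split into two time series, followed by a recurrence measure) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The summary does not say how a protein is mapped onto the corners of the chaos game square, so the four-letter alphabet and the `CORNERS` assignment below are hypothetical. Only one RQA quantity (the recurrence rate, with embedding dimension 1) is shown rather than the 16 parameters used in the paper, and the function names `cgr_series` and `recurrence_rate` are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical corner assignment for a 4-letter alphabet. This is an
# assumption for illustration, not the encoding used in the paper.
CORNERS: Dict[str, Tuple[float, float]] = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr_series(seq: str) -> Tuple[List[float], List[float]]:
    """Chaos game: starting at the centre of the unit square, move halfway
    towards the corner of each symbol. The visited x and y coordinates are
    returned as two separate time series."""
    x, y = 0.5, 0.5
    xs: List[float] = []
    ys: List[float] = []
    for symbol in seq:
        cx, cy = CORNERS[symbol]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        xs.append(x)
        ys.append(y)
    return xs, ys


def recurrence_rate(series: Sequence[float], radius: float) -> float:
    """Fraction of off-diagonal index pairs (i, j) whose values lie within
    `radius` of each other (embedding dimension 1 for simplicity)."""
    n = len(series)
    if n < 2:
        return 0.0
    hits = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and abs(series[i] - series[j]) <= radius
    )
    return hits / (n * (n - 1))
```

Under these assumptions, calling `cgr_series` on an encoded sequence gives the two coordinate series, and `recurrence_rate` applied to each gives one feature per series. A fuller feature vector would add further RQA measures before classification.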

92D20 Protein sequences, DNA sequences
91A80 Applications of game theory
68T05 Learning and adaptive systems in artificial intelligence
62F40 Bootstrap, jackknife and other resampling methods
62P10 Applications of statistics to biology and medical sciences; meta analysis
[1] Anand, A.; Pugalenthi, G.; Suganthan, P.N., Predicting protein structural class by SVM with class-wise optimized features and decision probabilities, J. theor. biol., 253, 375-380, (2008)
[2] Anfinsen, C., Principles that govern the folding of protein chains, Science, 181, 223-230, (1973)
[3] Bahar, I.; Atilgan, A.R.; Jernigan, R.L.; Erman, B., Understanding the recognition of protein structural classes by amino acid composition, Proteins, 29, 172-185, (1997)
[4] Basu, S.; Pan, A.; Dutta, C.; Das, J., Chaos game representation of proteins, J. mol. graphics, 15, 279-289, (1997)
[5] Brown, M.P.S.; Grundy, W.N.; Lin, D.; Cristianini, N.; Sugnet, C.; Ares, J.M.; Haussler, D., Knowledge-based analysis of microarray gene expression data by using support vector machines, Proc. natl. acad. sci., 97, 262-267, (2000)
[6] Cai, Y.D., Is it a paradox or misinterpretation?, Proteins, 31, 97-130, (2001)
[7] Chen, C.; Tian, Y.X.; Zou, X.Y.; Cai, P.X.; Mo, J.Y., Using pseudo-amino acid composition and support vector machine to predict protein structural class, J. theor. biol., 243, 444-448, (2006)
[8] Chen, C.; Chen, L.X.; Zou, X.Y.; Cai, P.X., Predicting protein structural class based on multi-features fusion, J. theor. biol., 253, 388-392, (2008) · Zbl 1398.92196
[9] Chen, K.; Kurgan, L.A.; Ruan, J., Prediction of protein structural class using novel evolutionary collocation-based sequence representation, J. comput. chem., 29, 1596-1604, (2008)
[10] Chen, Y.L.; Li, Q.Z., Prediction of apoptosis protein subcellular location using improved hybrid approach and pseudo amino acid composition, J. theor. biol., 248, 377-381, (2007)
[11] Chen, Y.L.; Li, Q.Z., Prediction of the subcellular location of apoptosis proteins, J. theor. biol., 245, 775-783, (2007)
[12] Chou, K.C., A novel approach to predicting protein structural classes in a (20-1)-D amino acid composition space, Proteins, 21, 319-344, (1995)
[13] Chou, K.C., A key driving force in determination of protein structural classes, Biochem. biophys. res. commun., 264, 216-224, (1999)
[14] Chou, K.C., Review: prediction of protein structural classes and subcellular locations, Curr. protein peptide sci., 1, 171-208, (2000)
[15] Chou, K.C., Prediction of protein cellular attributes using pseudo amino acid composition, Proteins, 43, 246-255, (2001), (Erratum: Chou, K.C., (2001), Prediction of protein cellular attributes using pseudo amino acid composition. Proteins 44, 60)
[16] Chou, K.C., Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes, Bioinformatics, 21, 10-19, (2005)
[17] Chou, K.C., Review: progress in protein structural class prediction and its impact to bioinformatics and proteomics, Curr. protein peptide sci., 6, 423-436, (2005)
[18] Chou, K.C.; Cai, Y.D., Predicting protein structural class by functional domain composition, Biochem. biophys. res. commun., 321, 1007-1009, (2004), (Corrigendum: Chou, K.C., Cai, Y.D., 2005. Predicting protein structural class by functional domain composition. Biochem. Biophys. Res. Commun. 329, 1362)
[19] Chou, K.C.; Liu, W.M.; Maggiora, G.M.; Zhang, C.T., Prediction and classification of domain structural classes, Proteins, 31, 97-130, (1998)
[20] Chou, K.C.; Maggiora, G.M., Domain structural class prediction, Protein eng., 11, 523-538, (1998)
[21] Chou, K.C.; Shen, H.B., Hum-ploc: a novel ensemble classifier for predicting human protein subcellular localization, Biochem. biophys. res. commun., 347, 150-157, (2006)
[22] Chou, K.C.; Shen, H.B., Predicting eukaryotic protein subcellular location by fusing optimized evidence-theoretic K-nearest neighbor classifiers, J. proteome res., 5, 1888-1897, (2006)
[23] Chou, K.C.; Shen, H.B., Euk-mploc: a fusion classifier for large-scale eukaryotic protein subcellular location prediction by incorporating multiple sites, J. proteome res., 6, 1728-1734, (2007)
[24] Chou, K.C.; Shen, H.B., Signal-CF: a subsite-coupled and window-fusing approach for predicting signal peptides, Biochem. biophys. res. commun., 357, 633-640, (2007)
[25] Chou, K.C.; Shen, H.B., Memtype-2L: a web server for predicting membrane proteins and their types by incorporating evolution information through pse-PSSM, Biochem. biophys. res. commun., 360, 339-345, (2007)
[26] Chou, K.C.; Shen, H.B., Review: recent progress in protein subcellular location prediction, Anal. biochem., 370, 1-16, (2007)
[27] Chou, K.C.; Shen, H.B., Cell-ploc: a package of web-servers for predicting subcellular localization of proteins in various organisms, Nat. protocols, 3, 153-162, (2008)
[28] Chou, K.C.; Shen, H.B., ProtIdent: a web server for identifying proteases and their types by fusing functional domain and sequential evolution information, Biochem. biophys. res. commun., 376, 324-325, (2008)
[29] Chou, K.C.; Zhang, C.T., Predicting protein folding types by distance functions that make allowances for amino acid interactions, J. biol. chem., 269, 22014-22020, (1994)
[30] Chou, K.C.; Zhang, C.T., Predicting of protein structural class, Crit. rev. biochem. mol. biol., 30, 275-349, (1995)
[31] Deschavanne, P.; Tufféry, P., Exploring an alignment free approach for protein classification and structural class prediction, Biochimie, 90, 615-625, (2008)
[32] Ding, C.H.; Dubchak, I., Multi-class protein fold recognition using support vector machines and neural networks, Bioinformatics, 17, 349-358, (2001)
[33] Du, P.; Li, Y., Prediction of C-to-U RNA editing sites in plant mitochondria using both biochemical and evolutionary information, J. theor. biol., 253, 579-589, (2008)
[34] Dubchak, I.; Muchnik, I.; Holbrook, S.R.; Kim, S.H., Prediction of protein-folding class using global description of amino-acid sequence, Proc. natl. acad. sci., 92, 8700-8704, (1995)
[35] Duda, R.O.; Hart, P.E.; Stork, D.G., Pattern classification, (2001), Wiley New York · Zbl 0968.68140
[36] Eckmann, J.P.; Kamphorst, S.O.; Ruelle, D., Recurrence plots of dynamical systems, Europhys. lett., 4, 973-977, (1987)
[37] Eisenhaber, F.; Frömmel, C.; Argos, P., Prediction of secondary structural content of proteins from their amino acid composition alone. II. the paradox with secondary structural class, Proteins, 25, 169-179, (1998)
[38] Feng, K.Y.; Cai, Y.D.; Chou, K.C., Boosting classifier for predicting protein domain structural class, Biochem. biophys. res. commun., 334, 213-217, (2005)
[39] Fiser, A.; Tusnády, G.E.; Simon, I., Chaos game representation of protein structures, J. mol. graphics, 12, 302-304, (1994)
[40] Giuliani, A.; Benigni, R.; Zbilut, J.P.; Webber, C.L.; Sirabella, P.; Colosimo, A., Nonlinear signal analysis methods in the elucidation of protein sequence – structure relationships, Chem. rev., 102, 1471-1491, (2002)
[41] Giuliani, A.; Sirabella, P.; Benigni, R.; Colosimo, A., Mapping protein sequence spaces by recurrence: a case study on chimeric structures, Protein eng., 13, 671-678, (2000)
[42] Giuliani, A.; Tomasi, M., Recurrence quantification analysis reveals interaction partners in paramyxoviridae envelope glycoproteins, Proteins, 46, 171-176, (2002)
[43] Jeffrey, H.J., Chaos game representation of gene structure, Nucleic acids res., 18, 2163-2170, (1990)
[44] Jiang, X.; Wei, R.; Zhang, T.L.; Gu, Q., Using the concept of Chou’s pseudo amino acid composition to predict apoptosis proteins subcellular location: an approach by approximate entropy, Protein pept. lett., 15, 392-396, (2008)
[45] Jin, Y.; Niu, B.; Feng, K.Y.; Lu, W.C.; Cai, Y.D.; Li, G.Z., Predicting subcellular localization with adaboost learner, Protein pept. lett., 15, 286-289, (2008)
[46] Kedarisetti, K.D.; Kurgan, L.A.; Dick, S., Classifier ensembles for protein structural class prediction with varying homology, Biochem. biophys. res. commun., 348, 981-988, (2006)
[47] Kurgan, L.A.; Homaeian, L., Prediction of structural classes for protein sequences and domains—impact of prediction algorithms, sequence representation and homology, and test procedures on accuracy, Pattern recognition, 39, 2323-2343, (2006) · Zbl 1103.68767
[48] Levitt, M.; Chothia, C., Structural patterns in globular proteins, Nature, 261, 552-558, (1976)
[49] Li, F.M.; Li, Q.Z., Predicting protein subcellular location using Chou’s pseudo amino acid composition and improved hybrid approach, Protein pept. lett., 15, 612-616, (2008)
[50] Li, H.; Helling, R.; Tang, C.; Wingreen, N.S., Emergence of preferred structures in a simple model of protein folding, Science, 273, 666-669, (1996)
[51] Li, H.; Tang, C.; Wingreen, N.S., Are protein folds atypical?, Proc. natl. acad. sci., 95, 4987-4990, (1998)
[52] Lin, H., The modified Mahalanobis discriminant for predicting outer membrane proteins by using Chou’s pseudo amino acid composition, J. theor. biol., 252, 350-356, (2008) · Zbl 1398.92076
[53] Lin, H.; Ding, H.; Guo, F.B.; Zhang, A.Y.; Huang, J., Predicting subcellular localization of mycobacterial proteins by using Chou’s pseudo amino acid composition, Protein pept. lett., 15, 739-744, (2008)
[54] Manetti, C.; Ceruso, M.A.; Giuliani, A.; Webber, C.L.; Zbilut, J.P., Recurrence quantification analysis as a tool for the characterization of molecular dynamics simulations, Phys. rev. E, 59, 992-998, (1999)
[55] Marwan, N.; Romano, M.C.; Thiel, M.; Kurths, J., Recurrence plots for the analysis of complex systems, Phys. rep., 438, 237-329, (2007)
[56] Munteanu, C.B.; Gonzalez-Diaz, H.; Magalhaes, A.L., Enzymes/non-enzymes classification model complexity based on composition, sequence, 3D and topological indices, J. theor. biol., 254, 476-482, (2008) · Zbl 1400.92405
[57] Niu, B.; Jin, Y.H.; Feng, K.Y.; Liu, L.; Lu, W.C.; Cai, Y.D.; Li, G.Z., Predicting membrane protein types with bagging learner, Protein pept. lett., 15, 590-594, (2008)
[58] Nishikawa, K.; Ooi, T., Correlation of the amino acid composition of a protein to its structural and biological characters, J. biochem., 91, 1821-1824, (1982)
[59] Riley, M.A.; Van Orden, G.C., Tutorials in contemporary nonlinear methods for the behavioral sciences, (2005), retrieved March 1, 2005, from http://www.nsf.gov/sbe/bcs/pac/nmbs/nmbs.jsp
[60] Shen, H.B.; Chou, K.C., Signal-3L: a 3-layer approach for predicting signal peptide, Biochem. biophys. res. commun., 363, 297-303, (2007)
[61] Shen, H.B.; Chou, K.C., Ezypred: a top-down approach for predicting enzyme functional classes and subclasses, Biochem. biophys. res. commun., 364, 53-59, (2007)
[62] Shen, H.B.; Yang, J.; Liu, X.J.; Chou, K.C., Using supervised fuzzy clustering to predict protein structural classes, Biochem. biophys. res. commun., 334, 577-581, (2005)
[63] Shi, M.G.; Huang, D.S.; Li, X.L., A protein interaction network analysis for yeast integral membrane protein, Protein pept. lett., 15, 692-699, (2008)
[64] Wang, B.; Yu, Z.G., One way to characterize the compact structures of lattice protein model, J. chem. phys., 112, 6084-6088, (2000)
[65] Wang, Z.X.; Yuan, Z., How good is the prediction of protein structural class by the component-coupled method?, Proteins, 38, 165-175, (2000)
[66] Webber, C.L.; Giuliani, A.; Zbilut, J.P.; Colosimo, A., Elucidating protein secondary structures using alpha-carbon recurrence quantifications, Proteins, 3, 292-303, (2001)
[67] Webber, C.L.; Zbilut, J.P., Dynamical assessment of physiological systems and states using recurrence plot strategies, J. appl. physiol., 76, 965-973, (1994)
[68] Wu, G.; Yan, S., Prediction of mutations in H3N2 hemagglutinins of influenza A virus from North America based on different datasets, Protein pept. lett., 15, 144-152, (2008)
[69] Yang, J.Y.; Yu, Z.G.; Anh, V., Correlations between designability and various structural characteristics of protein lattice models, J. chem. phys., 126, 195101, (2007)
[70] Yang, J.Y.; Yu, Z.G.; Anh, V., Clustering structures of large proteins using multifractal analyses based on a 6-letter model and hydrophobicity scale of amino acids, Chaos solitons fractals, in press, (2007), doi:10.1016/j.chaos.2007.08.014
[71] Yang, J.Y.; Yu, Z.G.; Anh, V., Protein structure classification based on chaos game representation and multifractal analysis, (), 665-669
[72] Yu, Z.G.; Anh, V.; Lau, K.S., Chaos game representation of protein sequences based on the detailed HP model and their multifractal and correlation analyses, J. theor. biol., 226, 341-348, (2004)
[73] Yu, Z.G.; Anh, V.; Lau, K.S.; Zhou, L.Q., Clustering of protein structures using hydrophobic free energy and solvent accessibility of proteins, Phys. rev. E, 73, 031920, (2006)
[74] Zaldívar, J.M.; Strozzi, F.; Dueri, S.; Marinov, D.; Zbilut, J.P., Characterization of regime shifts in environmental time series with recurrence quantification analysis, Ecol. modelling, 210, 58-70, (2008)
[75] Zbilut, J.P.; Mitchell, J.C.; Giuliani, A.; Colosimo, A.; Marwan, N.; Webber, C.L., Singular hydrophobicity patterns and net charge: a mesoscopic principle for protein aggregation/folding, Physica A, 343, 348-358, (2004)
[76] Zbilut, J.P.; Webber, C.L., Embeddings and delays as derived from quantification of recurrence plots, Phys. lett. A, 171, 199-203, (1992)
[77] Zhang, G.Y.; Fang, B.S., Predicting the cofactors of oxidoreductases based on amino acid composition distribution and Chou’s amphiphilic pseudo amino acid composition, J. theor. biol., 253, 310-315, (2008)
[78] Zhang, T.L.; Ding, Y.S.; Chou, K.C., Prediction protein structural classes with pseudo-amino acid composition: approximate entropy and hydrophobicity pattern, J. theor. biol., 250, 186-193, (2008) · Zbl 1397.92551
[79] Zhou, G.P., An intriguing controversy over protein structural class prediction, J. protein chem., 17, 729-738, (1998)
[80] Zhou, G.P.; Assa-Munt, N., Some insights into protein structural class prediction, Proteins, 44, 57-59, (2001)
[81] Zhou, X.B.; Chen, C.; Li, Z.C.; Zou, X.Y., Using Chou’s amphiphilic pseudo-amino acid composition and support vector machine for prediction of enzyme subfamily classes, J. theor. biol., 248, 546-551, (2007)
[82] Zhou, Y.; Yu, Z.G.; Anh, V., Cluster protein structures using recurrence quantification analysis on coordinates of alpha-carbon atoms of proteins, Phys. lett. A, 368, 314-319, (2007)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.