zbMATH — the first resource for mathematics

Prediction protein structural classes with pseudo-amino acid composition: approximate entropy and hydrophobicity pattern. (English) Zbl 1397.92551
Summary: Compared with the conventional amino acid (AA) composition, the pseudo-amino acid (PseAA) composition, originally introduced for protein subcellular location prediction, can incorporate much more information about a protein sequence and thereby remarkably enhance the power of a discrete model to predict various attributes of a protein. In this study, based on the concept of PseAA composition, the approximate entropy and hydrophobicity pattern of a protein sequence are used to characterize the PseAA components. The immune genetic algorithm (IGA) is applied to search for the optimal weight factors in generating the PseAA composition. Thus, for a given protein sequence sample, a 27-D (dimensional) PseAA composition is generated as its descriptor. The fuzzy K nearest neighbors (FKNN) classifier is adopted as the prediction engine. The results obtained in predicting protein structural classes are quite encouraging, indicating that the current approach may also be used to improve the prediction quality of other protein attributes, or at least can play a complementary role to the existing methods in the relevant areas. Our algorithm is written in Matlab and is available by contacting the corresponding author.
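
The approximate entropy used to characterize the PseAA components can be illustrated with a minimal Python sketch of the general ApEn(m, r) definition from Pincus [58]. This is not the authors' Matlab code. The function name `approximate_entropy`, the defaults m = 2 and r = 0.2, and the assumption that the protein sequence has already been encoded as a numeric series (for example by a hydrophobicity scale) are illustrative choices that the summary does not specify.

```python
import math


def approximate_entropy(series, m=2, r=0.2):
    """Approximate entropy ApEn(m, r) of a numeric series (Pincus-style).

    m (template length) and r (absolute tolerance) are illustrative
    defaults, not values taken from the paper.
    """
    series = list(series)
    n = len(series)
    if n < m + 1:
        raise ValueError("series must contain at least m + 1 values")

    def phi(k):
        # All overlapping templates of length k.
        templates = [series[i:i + k] for i in range(n - k + 1)]
        total = 0.0
        for a in templates:
            # Count templates within tolerance r of `a` under the
            # Chebyshev (maximum) distance; the self-match is included,
            # so the count is never zero.
            matches = sum(
                1
                for b in templates
                if max(abs(x - y) for x, y in zip(a, b)) <= r
            )
            total += math.log(matches / len(templates))
        return total / len(templates)

    # Low values indicate a regular series, higher values an irregular one.
    return phi(m) - phi(m + 1)


if __name__ == "__main__":
    # A strictly alternating series is highly regular, so ApEn is near zero.
    print(approximate_entropy([0.0, 1.0] * 20))
```

In a PseAA setting, one such value per choice of m could serve as an extra descriptor component alongside the 20 amino acid frequencies; how the paper combines and weights these components is not stated in the summary.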

MSC:
92D20 Protein sequences, DNA sequences
68T10 Pattern recognition, speech recognition
62P10 Applications of statistics to biology and medical sciences; meta analysis
References:
[1] Argos, P.; Rao, J.K.; Hargrave, P.A., Structural prediction of membrane-bound proteins, Eur. J. biochem., 128, 565-575, (1982)
[2] Cao, Y.; Liu, S.; Zhang, L.; Qin, J.; Wang, J.; Tang, K., Prediction of protein structural class with rough sets, BMC bioinform., 7, 20-25, (2006)
[3] Carlacci, L.; Chou, K.C.; Maggiora, G.M., A heuristic approach to predicting the tertiary structure of bovine somatotropin, Biochemistry, 30, 4389-4398, (1991)
[4] Chandonia, J.M.; Karplus, M., Neural networks for secondary structure and structural class prediction, Protein sci., 4, 275-285, (1995)
[5] Chen, Y.L.; Li, Q.Z., Prediction of apoptosis protein subcellular location using improved hybrid approach and pseudo amino acid composition, J. theor. biol., 248, 377-381, (2007)
[6] Chen, Y.L.; Li, Q.Z., Prediction of the subcellular location of apoptosis proteins, J. theor. biol., 245, 775-783, (2007)
[7] Chen, C.; Zhou, X.; Tian, Y.; Zou, X.; Cai, P., Predicting protein structural class with pseudo-amino acid composition and support vector machine fusion network, Anal. biochem., 357, 116-121, (2006)
[8] Chen, C.; Tian, Y.X.; Zou, X.Y.; Cai, P.X.; Mo, J.Y., Using pseudo-amino acid composition and support vector machine to predict protein structural class, J. theor. biol., 243, 444-448, (2006)
[9] Chou, K.C., Energy-optimized structure of antifreeze protein and its binding mechanism, J. mol. biol., 223, 509-517, (1992)
[10] Chou, K.C., A novel approach to predicting protein structural classes in a (20-1)-D amino acid composition space, Proteins: structure, function and genetics, 21, 319-344, (1995)
[11] Chou, K.C., A key driving force in determination of protein structural classes, Biochem. biophys. res. commun., 264, 216-224, (1999)
[12] Chou, K.C., Prediction of protein cellular attributes using pseudo amino acid composition, Proteins: structure, function, and genetics, 43, 246-255, (2001), (Erratum: Chou, K.C., 2001. Prediction of protein cellular attributes using pseudo amino acid composition. Proteins: Structure, Function, and Genetics 44, 60)
[13] Chou, K.C., A new branch of proteomics: prediction of protein cellular attributes, (), 57-70, (Chapter 4)
[14] Chou, K.C., Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes, Bioinformatics, 21, 10-19, (2005)
[15] Chou, K.C., Review: progress in protein structural class prediction and its impact to bioinformatics and proteomics, Curr. protein pept. sci., 6, 423-436, (2005)
[16] Chou, P.Y., Prediction of protein structural classes from amino acid composition, (), 549-586
[17] Chou, K.C.; Cai, Y.D., Using functional domain composition and support vector machines for prediction of protein subcellular location, J. biol. chem., 277, 45765-45769, (2002)
[18] Chou, K.C.; Cai, Y.D., Predicting protein structural class by functional domain composition, Biochem. biophys. res. commun., 321, 1007-1009, (2004), (Corrigendum: Chou, K.C., Cai, Y.D., 2005. Biochem. Biophys. Res. Commun. 329, 1362)
[19] Chou, K.C.; Elrod, D.W., Protein subcellular location prediction, Protein eng., 12, 107-118, (1999)
[20] Chou, K.C.; Shen, H.B., Large-scale plant protein subcellular location prediction, J. cell. biochem., 100, 665-678, (2007)
[21] Chou, K.C.; Shen, H.B., Euk-mploc: a fusion classifier for large-scale eukaryotic protein subcellular location prediction by incorporating multiple sites, J. proteome res., 6, 1728-1734, (2007)
[22] Chou, K.C.; Shen, H.B., Signal-CF: a subsite-coupled and window-fusing approach for predicting signal peptides, Biochem. biophys. res. commun., 357, 633-640, (2007)
[23] Chou, K.C.; Shen, H.B., Memtype-2L: a web server for predicting membrane proteins and their types by incorporating evolution information through pse-PSSM, Biochem. biophys. res. commun., 360, 339-345, (2007)
[24] Chou, K.C.; Zhang, C.T., Predicting protein folding types by distance functions that make allowances for amino acid interactions, J. biol. chem., 269, 22014-22020, (1994)
[25] Chou, K.C.; Zhang, C.T., Review: prediction of protein structural classes, Crit. rev. biochem. mol. biol., 30, 275-349, (1995)
[26] Cosic, I., Macromolecular bioactivity: is it resonant interaction between macromolecules?—theory and applications, IEEE trans. biomed. eng., 41, 1101-1114, (1994)
[27] Deleage, G.; Roux, B., An algorithm for protein secondary structure prediction based on class prediction, Protein eng., 1, 289-294, (1987)
[28] Du, P.; Li, Y., Prediction of protein submitochondria locations by hybridizing pseudo-amino acid composition with various physicochemical features of segmented sequence, BMC bioinform., 7, 518-525, (2006)
[29] Du, Q.S.; Wei, D.Q.; Chou, K.C., Correlation of amino acids in proteins, Peptides, 24, 1863-1869, (2003)
[30] Du, Q.S.; Jiang, Z.Q.; He, W.Z.; Li, D.P.; Chou, K.C., Amino acid principal component analysis (AAPCA) and its applications in protein structural class prediction, J. biomol. struct. dyn., 23, 635-640, (2006)
[31] Fasman, G.D., Proteins, vol. 1, (1976), CRC Press Cleveland
[32] Fauchere, J.L.; Charton, M.; Kier, L.B.; Verloop, A.; Pliska, V., Amino acid side chain parameters for correlation studies in biology and pharmacology, Int. J. pept. protein res., 32, 269-278, (1988)
[33] Finkelstein, A.V.; Ptitsyn, O.B., Why do globular proteins fit the limited set of folding patterns?, Prog. biophys. mol. biol., 50, 171-190, (1987)
[34] Hong, B.; Tang, Q.Y.; Yang, F.S., Apen and cross-apen: property, fast algorithm and preliminary application to the study of EEG and cognition, Signal process., 15, 100-108, (1999)
[35] Hopp, T.P.; Woods, K.R., Prediction of protein antigenic determinants from amino acid sequences, Proc. natl acad. sci. USA, 78, 3824-3828, (1981)
[36] Huang, Y.; Li, Y., Prediction of protein subcellular locations using fuzzy k-NN method, Bioinformatics, 20, 21-28, (2004)
[37] Huang, W.L.; Chen, H.M.; Hwang, S.F.; Ho, S.Y., Accurate prediction of enzyme subfamily class using an adaptive fuzzy k-nearest neighbor method, Biosystems, 90, 405-413, (2006)
[38] Janin, J., Surface and inside volumes in globular proteins, Nature, 277, 491-492, (1979)
[39] Janin, J.; Wodak, S., Conformation of amino acid side-chains in proteins, J. mol. biol., 125, 357-386, (1978)
[40] Kawashima, S.; Ogata, H.; Kanehisa, M., Aaindex: amino acid index database, Nucleic acids res., 27, 368-369, (1999)
[41] Kedarisetti, K.D.; Kurgan, L.A.; Dick, S., Classifier ensembles for protein structural class prediction with varying homology, Biochem. biophys. res. commun., 348, 981-988, (2006)
[42] Keller, J.M.; Gray, M.R.; Givens, J.A., A fuzzy k-nearest neighbours algorithm, IEEE trans. syst. man cybern., 15, 580-585, (1985)
[43] Klein, P., Prediction of protein structural class by discriminant analysis, Biochim. biophys. acta, 874, 205-215, (1986)
[44] Klein, P.; Delisi, C., Prediction of protein structural class from amino acid sequence, Biopolymers, 25, 1659-1672, (1986)
[45] Kneller, D.G.; Cohen, F.E.; Langridge, R., Improvements in protein secondary structure prediction by an enhanced neural network, J. mol. biol., 214, 171-182, (1990)
[46] Kurgan, L.; Homaeian, L., Prediction of structural classes for protein sequences and domains—impact of prediction algorithms, sequence representation and homology, and test procedures on accuracy, Pattern recognition lett., 39, 2323-2343, (2006) · Zbl 1103.68767
[47] Kurgan, L.A.; Stach, W.; Ruan, J., Novel scales based on hydrophobicity indices for secondary protein structure, J. theor. biol., 248, 354-366, (2007)
[48] Levitt, M.; Chothia, C., Structural patterns in globular proteins, Nature, 261, 552-557, (1976)
[49] Lim, V.I., Algorithms for prediction of alpha-helical and beta-structural regions in globular proteins, J. mol. biol., 88, 873-894, (1974)
[50] Lin, H.; Li, Q.Z., Predicting conotoxin superfamily and family by using pseudo amino acid composition and modified Mahalanobis discriminant, Biochem. biophys. res. commun., 354, 548-551, (2007)
[51] Lin, H.; Li, Q.Z., Using pseudo amino acid composition to predict protein structural class: approached by incorporating 400 dipeptide components, J. comput. chem., 28, 1463-1466, (2007)
[52] Liu, H.; Wang, M.; Chou, K.C., Low-frequency Fourier spectrum for predicting membrane protein types, Biochem. biophys. res. commun., 336, 737-739, (2005)
[53] Liu, H.; Yang, J.; Wang, M.; Xue, L.; Chou, K.C., Using Fourier spectrum analysis and pseudo amino acid composition for prediction of membrane protein types, Protein J., 24, 385-389, (2005)
[54] Luo, R.Y.; Feng, Z.P.; Liu, J.K., Prediction of protein structural class by amino acid and polypeptide composition, Eur. J. biochem., 269, 4219-4225, (2002)
[55] Metfessel, B.A.; Saurugger, P.N.; Connelly, D.P.; Rich, S.T., Cross-validation of protein structural class prediction using statistical clustering and neural networks, Protein sci., 2, 1171-1182, (1993)
[56] Mondal, S.; Bhavna, R.; Mohan Babu, R.; Ramakumar, S., Pseudo amino acid composition and multi-class support vector machines approach for conotoxin superfamily classification, J. theor. biol., 243, 252-260, (2006)
[57] Nakashima, H.; Nishikawa, K.; Ooi, T., The folding type of a protein is relevant to the amino acid composition, J. biochem., 99, 152-162, (1986)
[58] Pincus, S.M., Approximate entropy as a measure of system complexity, Proc. natl acad. sci. USA, 88, 2297-2301, (1991) · Zbl 0756.60103
[59] Pu, X.; Guo, J.; Leung, H.; Lin, Y., Prediction of membrane protein types from sequences and position-specific scoring matrices, J. theor. biol., 247, 259-265, (2007)
[60] Richman, J.S.; Moorman, J.R., Physiological time-series analysis using approximate entropy and sample entropy, Am. J. physiol. heart circ. physiol., 278, H2039-H2049, (2000)
[61] Rose, G.D.; Geselowitz, A.R.; Lesser, G.J.; Lee, R.H.; Zehfus, M.H., Hydrophobicity of amino acid residues in globular proteins, Science, 229, 834-838, (1985)
[62] Sadovsky, M.G., The method to compare nucleotide sequences based on the minimum entropy principle, Bull. math. biol., 65, 309-322, (2003) · Zbl 1334.92149
[63] Shen, H.B.; Chou, K.C., Using optimized evidence-theoretic K-nearest neighbor classifier and pseudo amino acid composition to predict membrane protein types, Biochem. biophys. res. commun., 334, 288-292, (2005)
[64] Shen, H.B.; Chou, K.C., Predicting protein subnuclear location with optimized evidence-theoretic K-nearest classifier and pseudo amino acid composition, Biochem. biophys. res. commun., 337, 752-756, (2005)
[65] Shen, H.B.; Chou, K.C., Ensemble classifier for protein fold pattern recognition, Bioinformatics, 22, 1717-1722, (2006)
[66] Shen, H.B.; Chou, K.C., Gpos-ploc: an ensemble classifier for predicting subcellular localization of Gram-positive bacterial proteins, Protein eng. des. sel., 20, 39-46, (2007)
[67] Shen, H.B.; Chou, K.C., Virus-ploc: a fusion classifier for predicting the subcellular localization of viral proteins within host and virus-infected cells, Biopolymers, 85, 233-240, (2007)
[68] Shen, H.B.; Chou, K.C., Hum-mploc: an ensemble classifier for large-scale human protein subcellular location prediction by incorporating samples with multiple sites, Biochem. biophys. res. commun., 355, 1006-1011, (2007)
[69] Shen, H.B.; Yang, J.; Liu, X.J.; Chou, K.C., Using supervised fuzzy clustering to predict protein structural classes, Biochem. biophys. res. commun., 334, 577-581, (2005)
[70] Shen, H.B.; Yang, J.; Chou, K.C., Fuzzy KNN for predicting membrane protein types from pseudo amino acid composition, J. theor. biol., 240, 9-13, (2006)
[71] Shen, H.B.; Yang, J.; Chou, K.C., Euk-ploc: an ensemble classifier for large-scale eukaryotic protein subcellular location prediction, Amino acids, 33, 57-67, (2007)
[72] Shi, J.Y.; Zhang, S.W.; Pan, Q.; Cheng, Y.-M.; Xie, J., Prediction of protein subcellular localization by support vector machines using multi-scale energy and pseudo amino acid composition, Amino acids, 33, 69-74, (2007)
[73] Tanford, C., Contribution of hydrophobic interactions to the stability of the globular conformation of proteins, J. am. chem. soc., 84, 4240-4247, (1962)
[74] Wang, Z.X.; Yuan, Z., How good is the prediction of protein structural class by the component-coupled method?, Proteins, 38, 165-175, (2000)
[75] Wang, M.; Yang, J.; Liu, G.P.; Xu, Z.J.; Chou, K.C., Weighted-support vector machines for predicting membrane protein types based on pseudo amino acid composition, Protein eng. des. sel., 17, 509-516, (2004)
[76] Wang, S.Q.; Yang, J.; Chou, K.C., Using stacked generalization to predict membrane protein types based on pseudo amino acid composition, J. theor. biol., 242, 941-946, (2006)
[77] Xiao, X.; Shao, S.H.; Huang, Z.D.; Chou, K.C., Using pseudo amino acid composition to predict protein structural classes: approached with complexity measure factor, J. comput. chem., 27, 478-482, (2006)
[78] Xiao, X.; Shao, S.H.; Ding, Y.S.; Huang, Z.D.; Chou, K.C., Using cellular automata images and pseudo amino acid composition to predict protein subcellular location, Amino acids, 30, 49-54, (2006)
[79] Zhang, T.L.; Ding, Y.S., Using pseudo amino acid composition and binary-tree support vector machines to predict protein structural classes, Amino acids, (2007), doi:10.1007/s00726-007-0496-1
[80] Zhang, C.T.; Chou, K.C.; Maggiora, G.M., Predicting protein structural classes from amino acid composition: application of fuzzy clustering, Protein eng., 8, 425-435, (1995)
[81] Zhang, S.W.; Pan, Q.; Zhang, H.C.; Shao, Z.C.; Shi, J.Y., Prediction protein homo-oligomer types by pseudo amino acid composition: approached with an improved feature extraction and naive Bayes feature fusion, Amino acids, 30, 461-468, (2006)
[82] Zhang, T.L.; Ding, Y.S.; Chou, K.C., Prediction of protein subcellular location using hydrophobic patterns of amino acid sequence, Comput. biol. chem., 30, 367-371, (2006) · Zbl 1119.92033
[83] Zhou, G.P., An intriguing controversy over protein structural class prediction, J. protein chem., 17, 729-738, (1998)
[84] Zhou, G.P.; Assa-Munt, N., Some insights into protein structural class prediction, Proteins: structure, function, and genetics, 44, 57-59, (2001)
[85] Zhou, G.P.; Doctor, K., Subcellular location prediction of apoptosis proteins, Proteins: structure, function, and genetics, 50, 44-48, (2003)
[86] Zhou, X.B.; Chen, C.; Li, Z.C.; Zou, X.Y., Using Chou’s amphiphilic pseudo-amino acid composition and support vector machine for prediction of enzyme subfamily classes, J. theor. biol., 248, 546-551, (2007)
[87] Zimmerman, J.M.; Eliezer, N.; Simha, R., The characterization of amino acid sequences in proteins by statistical methods, J. theor. biol., 21, 170-201, (1968)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.