Predicting protein fold pattern with functional domain and sequential evolution information. (English) Zbl 1400.92413
Summary: The fold pattern of a protein lies one level deeper than its structural classification and is therefore harder to predict. Many efforts have been made in this regard, but all reported success rates so far remain under 70%, indicating that even a 1% or 2% improvement is extremely difficult to achieve. To address this problem, a novel approach is proposed that combines functional domain information and sequential evolution information through a fusion ensemble classifier. The resulting predictor is called PFP-FunDSeqE. Tests were performed on classifying proteins among 27 fold patterns. Compared with existing predictors tested on the same stringent benchmark dataset, the new predictor achieves, for the first time, a success rate of over 70%. The PFP-FunDSeqE predictor is freely available to the public as a web server at http://www.csbio.sjtu.edu.cn/bioinf/PFP-FunDSeqE/.
Reviewer: Reviewer (Berlin)

92D20 Protein sequences, DNA sequences
68T05 Learning and adaptive systems in artificial intelligence
[1] Anfinsen, C.B., Principles that govern the folding of protein chains, Science, 181, 223-230, (1973)
[2] Anfinsen, C.B.; Scheraga, H.A., Experimental and theoretical aspects of protein folding, Adv. protein chem., 29, 205-300, (1975)
[3] Bastien, O., A simple derivation of the distribution of pairwise local protein sequence alignment scores, Evol. bioinformatics, 4, 41-45, (2008)
[4] Bastien, O.; Roy, S.; Marechal, E., Construction of non-symmetric substitution matrices derived from proteomes with biased amino acid distributions, C. R. biol., 328, 445-453, (2005)
[5] Bastien, O.; Ortet, P.; Roy, S.; Marechal, E., A configuration space of homologous proteins conserving mutual information and allowing a phylogeny inference based on pair-wise Z-score probabilities, BMC bioinformatics, 6, 49, (2005)
[6] Bologna, G., Appel, R.D., 2002. A comparison study on protein fold recognition. In: Proceedings of the 9th International Conference on Neural Information Processing, vol. 5, pp. 2492-2496.
[7] Call, M.E.; Schnell, J.R.; Xu, C.; Lutz, R.A.; Chou, J.J.; Wucherpfennig, K.W., The structure of the ζζ transmembrane dimer reveals features essential for its assembly with the T cell receptor, Cell, 127, 355-368, (2006)
[8] Carlacci, L.; Chou, K.C.; Maggiora, G.M., A heuristic approach to predicting the tertiary structure of bovine somatotropin, Biochemistry, 30, 4389-4398, (1991)
[9] Chen, K.; Kurgan, L., PFRES: protein fold classification by using evolutionary information and predicted secondary structure, Bioinformatics, 23, 2843-2850, (2007)
[10] Chou, K.C., Energy-optimized structure of antifreeze protein and its binding mechanism, J. mol. biol., 223, 509-517, (1992)
[11] Chou, K.C., Prediction of protein cellular attributes using pseudo amino acid composition, Proteins: struct. funct. genet., 43, 246-255, (2001), (Erratum: Chou, K.C., 2001. Prediction of protein cellular attributes using pseudo amino acid composition. Proteins: Struct. Funct. Genet. 44, 60)
[12] Chou, K.C., Review: structural bioinformatics and its impact to biomedical science, Curr. med. chem., 11, 2105-2134, (2004)
[13] Chou, K.C., Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes, Bioinformatics, 21, 10-19, (2005)
[14] Chou, K.C.; Carlacci, L., Energetic approach to the folding of alpha/beta barrels, Proteins: struct. funct. genet., 9, 280-295, (1991)
[15] Chou, K.C.; Scheraga, H.A., Origin of the right-handed twist of beta-sheets of poly-L-valine chains, Proc. natl. acad. sci. USA, 79, 7047-7051, (1982)
[16] Chou, K.C.; Shen, H.B., Predicting eukaryotic protein subcellular location by fusing optimized evidence-theoretic K-nearest neighbor classifiers, J. proteome res., 5, 1888-1897, (2006)
[17] Chou, K.C.; Shen, H.B., Review: recent progresses in protein subcellular location prediction, Anal. biochem., 370, 1-16, (2007)
[18] Chou, K.C.; Zhang, C.T., Review: prediction of protein structural classes, Crit. rev. biochem. mol. biol., 30, 275-349, (1995)
[19] Chou, K.C.; Nemethy, G.; Scheraga, H.A., Energetic approach to packing of α-helices: 2. general treatment of nonequivalent and nonregular helices, J. am. chem. soc., 106, 3161-3170, (1984)
[20] Chou, K.C.; Maggiora, G.M.; Nemethy, G.; Scheraga, H.A., Energetics of the structure of the four-alpha-helix bundle in proteins, Proc. natl. acad. sci. USA, 85, 4295-4299, (1988)
[21] Chou, K.C.; Nemethy, G.; Pottle, M.; Scheraga, H.A., Energy of stabilization of the right-handed beta-alpha-beta crossover in proteins, J. mol. biol., 205, 241-249, (1989)
[22] Chou, K.C.; Nemethy, G.; Scheraga, H.A., Review: energetics of interactions of regular structural elements in proteins, Acc. chem. res., 23, 134-141, (1990)
[23] Chou, K.C.; Wei, D.Q.; Zhong, W.Z., Binding mechanism of coronavirus main proteinase with ligands and its implication to drug design against SARS, Biochem. biophys. res. commun., 308, 148-151, (2003), (Erratum: Chou, K.C., Wei, D.Q., Zhong, W.Z., 2003. Binding mechanism of coronavirus main proteinase with ligands and its implication to drug design against SARS. Biochem. Biophys. Res. Commun. 310, 675)
[24] Chung, I.F.; Huang, C.D., Recognition of structure classification of protein folding by NN and SVM hierarchical learning architecture, (), 1159-1167 · Zbl 1049.92502
[25] Denoeux, T., A k-nearest neighbor classification rule based on Dempster-Shafer theory, IEEE trans. syst. man cybern., 25, 804-813, (1995)
[26] Ding, C.H.; Dubchak, I., Multi-class protein fold recognition using support vector machines and neural networks, Bioinformatics, 17, 349-358, (2001)
[27] Ding, Y.S.; Zhang, T.L., Using Chou’s pseudo amino acid composition to predict subcellular localization of apoptosis proteins: an approach with immune genetic algorithm-based ensemble classifier, Pattern recogn lett, 29, 1887-1892, (2008)
[28] Douglas, S.M.; Chou, J.J.; Shih, W.M., DNA-nanotube-induced alignment of membrane proteins for NMR structure determination, Proc. natl. acad. sci. USA, 104, 6644-6648, (2007)
[29] Du, Q.S.; Wang, S.; Wei, D.Q.; Sirois, S.; Chou, K.C., Molecular modelling and chemical modification for finding peptide inhibitor against SARS CoV Mpro, Anal. biochem., 337, 262-270, (2005)
[30] Du, Q.S.; Sun, H.; Chou, K.C., Inhibitor design for SARS coronavirus main protease based on “distorted key theory”, Med. chem., 3, 1-6, (2007)
[31] Dubchak, I.; Muchnik, I.; Mayor, C.; Dralyuk, I.; Kim, S.H., Recognition of a protein fold in the context of the structural classification of proteins (SCOP) classification, Proteins: struct. funct. genet., 35, 401-407, (1999)
[32] Finkelstein, A.V.; Ptitsyn, O.B., Why do globular proteins fit the limited set of folding patterns?, Prog. biophys. mol. biol., 50, 171-190, (1987)
[33] Finn, R.D.; Mistry, J.; Schuster-Bockler, B.; Griffiths-Jones, S.; Hollich, V.; Lassmann, T.; Moxon, S.; Marshall, M.; Khanna, A.; Durbin, R.; Eddy, S.R.; Sonnhammer, E.L.; Bateman, A., Pfam: clans, web tools and services, Nucleic acids res., 34, D247-D251, (2006)
[34] Gao, W.N.; Wei, D.Q.; Li, Y.; Gao, H.; Xu, W.R.; Li, A.X.; Chou, K.C., Agaritine and its derivatives are potential inhibitors against HIV proteases, Med. chem., 3, 221-226, (2007)
[35] Gonzalez-Diaz, H.; Vilar, S.; Santana, L.; Uriarte, E., Medicinal chemistry and bioinformatics—current trends in drugs discovery with networks topological indices, Curr. top. med. chem., 10, 1015-1029, (2007)
[36] Gonzalez-Díaz, H.; Gonzalez-Díaz, Y.; Santana, L.; Ubeira, F.M.; Uriarte, E., Proteomics, networks, and connectivity indices, Proteomics, 8, 750-778, (2008)
[37] Gribskov, M.; McLachlan, A.D.; Eisenberg, D., Profile analysis: detection of distantly related proteins, Proc. natl. acad. sci. USA, 84, 4355-4358, (1987)
[38] Holm, L.; Sander, C., Protein folds and families: sequence and structure alignments, Nucleic acids res., 27, 244-247, (1999)
[39] Jiang, X.; Wei, R.; Zhang, T.L.; Gu, Q., Using the concept of Chou’s pseudo amino acid composition to predict apoptosis proteins subcellular location: an approach by approximate entropy, Protein peptide lett., 15, 392-396, (2008)
[40] Letunic, I.; Copley, R.R.; Pils, B.; Pinkert, S.; Schultz, J.; Bork, P., SMART 5: domains in the context of genomes and networks, Nucleic acids res., 34, D257-D260, (2006)
[41] Levitt, M., Accurate modeling of protein conformation by automatic segment matching, J. mol. biol., 226, 507-533, (1992)
[42] Li, F.M.; Li, Q.Z., Predicting protein subcellular location using Chou’s pseudo amino acid composition and improved hybrid approach, Protein peptide lett., 15, 612-616, (2008)
[43] Li, Y.; Wei, D.Q.; Gao, W.N.; Gao, H.; Liu, B.N.; Huang, C.J.; Xu, W.R.; Liu, D.K.; Chen, H.F.; Chou, K.C., Computational approach to drug design for oxazolidinones as antibacterial agents, Med. chem., 3, 576-582, (2007)
[44] Lin, H., The modified Mahalanobis discriminant for predicting outer membrane proteins by using Chou’s pseudo amino acid composition, J. theor. biol., 252, 350-356, (2008) · Zbl 1398.92076
[45] Lin, H.; Ding, H.; Guo, F.B.; Zhang, A.Y.; Huang, J., Predicting subcellular localization of mycobacterial proteins by using Chou’s pseudo amino acid composition, Protein peptide lett., 15, 739-744, (2008)
[46] Marchler-Bauer, A.; Anderson, J.B.; Derbyshire, M.K.; DeWeese-Scott, C.; Gonzales, N.R.; Gwadz, M.; Hao, L.; He, S.; Hurwitz, D.I.; Jackson, J.D.; Ke, Z.; Krylov, D.; Lanczycki, C.J.; Liebert, C.A.; Liu, C.; Lu, F.; Lu, S.; Marchler, G.H.; Mullokandov, M.; Song, J.S.; Thanki, N.; Yamashita, R.A.; Yin, J.J.; Zhang, D.; Bryant, S.H., CDD: a conserved domain database for interactive domain family analysis, Nucleic acids res., 35, D237-D240, (2007)
[47] Murzin, A.G.; Brenner, S.E.; Hubbard, T.; Chothia, C., SCOP: a structural classification of protein database for the investigation of sequence and structures, J. mol. biol., 247, 536-540, (1995)
[48] Nanni, L., A novel ensemble of classifiers for protein fold recognition, Neurocomputing, 69, 2434-2437, (2006)
[49] Nanni, L.; Lumini, A., Genetic programming for creating Chou’s pseudo amino acid based features for submitochondria localization, Amino acids, 34, 653-660, (2008)
[50] Okun, O., 2004. Protein fold recognition with K-local hyperplane distance nearest neighbor algorithm. In: Proceedings of the Second European Workshop on Data Mining and Text Mining in Bioinformatics, vol. 1, pp. 51-57.
[51] Oxenoid, K.; Chou, J.J., The structure of phospholamban pentamer reveals a channel-like architecture in membranes, Proc. natl. acad. sci. USA, 102, 10870-10875, (2005)
[52] Schaffer, A.A.; Aravind, L.; Madden, T.L.; Shavirin, S.; Spouge, J.L.; Wolf, Y.I.; Koonin, E.V.; Altschul, S.F., Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements, Nucleic acids res., 29, 2994-3005, (2001)
[53] Scheraga, H.A.; Khalili, M.; Liwo, A., Protein-folding dynamics: overview of molecular simulation techniques, Annu. rev. phys. chem., 58, 57-83, (2007)
[54] Schnell, J.R.; Chou, J.J., Structure and mechanism of the M2 proton channel of influenza A virus, Nature, 451, 591-595, (2008)
[55] Shen, H.B.; Chou, K.C., Ensemble classifier for protein fold pattern recognition, Bioinformatics, 22, 1717-1722, (2006)
[56] Shen, H.B.; Chou, K.C., PseAAC: a flexible web-server for generating various kinds of protein pseudo amino acid composition, Anal. biochem., 373, 386-388, (2008)
[57] Tatusov, R.L.; Fedorova, N.D.; Jackson, J.D.; Jacobs, A.R.; Kiryutin, B.; Koonin, E.V.; Krylov, D.M.; Mazumder, R.; Mekhedov, S.L.; Nikolskaya, A.N.; Rao, B.S.; Smirnov, S.; Sverdlov, A.V.; Vasudevan, S.; Wolf, Y.I.; Yin, J.J.; Natale, D.A., The COG database: an updated version includes eukaryotes, BMC bioinformatics, 4, 41, (2003)
[58] Wang, J.F.; Wei, D.Q.; Li, L.; Zheng, S.Y.; Li, Y.X.; Chou, K.C., 3D structure modeling of cytochrome P450 2C19 and its implication for personalized drug design, Biochem. biophys. res. commun., 355, 513-519, (2007), (Corrigendum: Wang, J.F., Wei, D.Q., Li, L., Zheng, S.Y., Li, Y.X., Chou, K.C., 2007. 3D structure modeling of cytochrome P450 2C19 and its implication for personalized drug design. Biochem. Biophys. Res. Commun. 357, 330)
[59] Wang, J.F.; Wei, D.Q.; Lin, Y.; Wang, Y.H.; Du, H.L.; Li, Y.X.; Chou, K.C., Insights from modeling the 3D structure of NAD(P)H-dependent D-xylose reductase of Pichia stipitis and its binding interactions with NAD and NADP, Biochem. biophys. res. commun., 359, 323-329, (2007)
[60] Wang, S.Q.; Du, Q.S.; Chou, K.C., Study of drug resistance of chicken influenza A virus (H5N1) from homology-modeled 3D structures of neuraminidases, Biochem. biophys. res. commun., 354, 634-640, (2007)
[61] Wei, D.Q.; Sirois, S.; Du, Q.S.; Arias, H.R.; Chou, K.C., Theoretical studies of Alzheimer’s disease drug candidate [(2,4-dimethoxy) benzylidene]-anabaseine dihydrochloride (GTS-21) and its derivatives, Biochem. biophys. res. commun., 338, 1059-1064, (2005)
[62] Wei, D.Q.; Du, Q.S.; Sun, H.; Chou, K.C., Insights from modeling the 3D structure of H5N1 influenza virus neuraminidase and its binding interactions with ligands, Biochem. biophys. res. commun., 344, 1048-1055, (2006)
[63] Zhang, G.Y.; Fang, B.S., Predicting the cofactors of oxidoreductases based on amino acid composition distribution and Chou’s amphiphilic pseudo amino acid composition, J. theor. biol., 253, 310-315, (2008)
[64] Zhang, H.; Wei, D.Q.; Zhang, R.; Wang, C.; Wei, H.; Chou, K.C., Screening for new agonists against Alzheimer’s disease, Med. chem., 3, 488-493, (2007)
[65] Zhang, R.; Wei, D.Q.; Du, Q.S.; Chou, K.C., Molecular modeling studies of peptide drug candidates against SARS, Med. chem., 2, 309-314, (2006)
[66] Zhou, X.B.; Chen, C.; Li, Z.C.; Zou, X.Y., Using Chou’s amphiphilic pseudo-amino acid composition and support vector machine for prediction of enzyme subfamily classes, J. theor. biol., 248, 546-551, (2007)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.