
Discriminate protein decoys from native by using a scoring function based on ubiquitous phi and psi angles computed for all atom. (English) Zbl 1343.92385

Summary: The success of solving the protein folding and structure prediction problems in molecular and structural biology relies on an accurate energy function. With rapid advances in computational biology and bioinformatics, there is a growing need to solve unknown folds and structures faster, and thus an accurate energy function is indispensable. To address this need, we develop a new potential function, 3DIGARS3.0, which is a linearly weighted combination of 3DIGARS, mined accessible surface area (ASA), and ubiquitously computed Phi (uPhi) and Psi (uPsi) energies, optimized by a genetic algorithm (GA). We use a dataset of 4332 protein structures to generate uPhi- and uPsi-based score libraries for use within the core 3DIGARS method. The optimized weight of each component is obtained by applying GA-based optimization to three challenging decoy sets. The improved 3DIGARS3.0 significantly outperformed state-of-the-art methods on a set of independent test datasets.
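
The linearly weighted combination described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the component keys, the weight values, the convention that a lower combined energy is better, and the helper names `combined_energy` and `rank_of_native` are all assumptions made for illustration. The summary does not state how the components are scaled or which weights the GA found.

```python
from typing import Dict, List

# Hypothetical keys for the four energy terms named in the summary.
COMPONENTS = ("e_3digars", "e_asa", "e_uphi", "e_upsi")


def combined_energy(energies: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the linearly weighted sum of the component energies."""
    return sum(weights[name] * energies[name] for name in COMPONENTS)


def rank_of_native(
    native: Dict[str, float],
    decoys: List[Dict[str, float]],
    weights: Dict[str, float],
) -> int:
    """Return the 1-based rank of the native structure among the decoys.

    Assumes lower combined energy means a more native-like structure,
    so rank 1 means the native was scored below every decoy.
    """
    native_score = combined_energy(native, weights)
    better_decoys = sum(combined_energy(d, weights) < native_score for d in decoys)
    return 1 + better_decoys


# Made-up energies and weights, for illustration only.
native = {"e_3digars": -10.0, "e_asa": -2.0, "e_uphi": -1.0, "e_upsi": -1.0}
decoys = [
    {"e_3digars": -8.0, "e_asa": -1.0, "e_uphi": -1.0, "e_upsi": 0.0},
    {"e_3digars": -11.0, "e_asa": 2.0, "e_uphi": 0.0, "e_upsi": 0.0},
]
weights = {"e_3digars": 1.0, "e_asa": 0.5, "e_uphi": 0.5, "e_upsi": 0.5}
base_only = {"e_3digars": 1.0, "e_asa": 0.0, "e_uphi": 0.0, "e_upsi": 0.0}

print(rank_of_native(native, decoys, weights))
print(rank_of_native(native, decoys, base_only))
```

In this toy data the second decoy has a lower base term than the native, so the extra weighted terms decide whether the native is ranked first. A weight optimizer such as a GA would search over `weights` to improve this kind of ranking across training decoy sets.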

MSC:

92D20 Protein sequences, DNA sequences
92E10 Molecular structure (graph-theoretic methods, methods of differential topology, etc.)
92-04 Software, source code, etc. for problems pertaining to biology
