Novel 3D bio-macromolecular bilinear descriptors for protein science: predicting protein structural classes. (English) Zbl 1341.92053

Summary: In the present study, we introduce novel 3D protein descriptors based on the bilinear algebraic form in the \(\mathbb R^n\) space on the coulombic matrix. To calculate these descriptors, macromolecular vectors belonging to the \(\mathbb R^n\) space, whose components represent certain amino acid side-chain properties, were used as weighting schemes. Generalized approaches for calculating the spatial distances between amino acid residues, based on Minkowski metrics, are proposed. Simple- and double-stochastic schemes are defined as approaches to normalize the coulombic matrix. Local-fragment indices for both amino acid types and amino acid groups are presented so that fragments of interest in proteins can be characterized. In addition, geometric and topological cut-offs are defined so that specific interactions among amino acids can be taken into account in global or local indices. To assess the utility of the global and local indices, a classification model for predicting the four major protein structural classes was built with the linear discriminant analysis (LDA) technique. The developed LDA model correctly classifies 92.6% and 92.7% of the proteins in the training and test sets, respectively. The model showed high values of the generalized squared correlation coefficient (GC2) on both the training and test series. The statistical parameters derived from the internal and external validation procedures demonstrate the robustness, stability and high predictive power of the proposed model. The performance of the LDA model demonstrates that the proposed indices not only codify relevant biochemical information related to the structural classes of proteins, but also remain suitably interpretable. It is anticipated that the current method will benefit the prediction of other protein attributes or functions.
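
As a rough illustration of the quantities named in the summary, the following Python sketch computes Minkowski distances between residue positions, builds a matrix from them, normalizes it with simple- and double-stochastic schemes, and evaluates a bilinear form weighted by two property vectors. The summary does not give the exact definitions, so the inverse-distance form of the matrix entries, the Sinkhorn-Knopp style iteration, all function names, the coordinates and the weight values are assumptions made for illustration only, not the authors' formulation.

```python
from itertools import product


def minkowski(a, b, p=2.0):
    """Minkowski distance of order p between two points (p=2 is Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def coulombic_matrix(coords, p=2.0):
    """Assumed form: entry (i, j) is 1/d_ij for i != j, and 0 on the diagonal."""
    n = len(coords)
    m = [[0.0] * n for _ in range(n)]
    for i, j in product(range(n), repeat=2):
        if i != j:
            m[i][j] = 1.0 / minkowski(coords[i], coords[j], p)
    return m


def simple_stochastic(m):
    """Divide each row by its sum so that every row sums to 1."""
    out = []
    for row in m:
        s = sum(row)
        out.append([v / s for v in row])
    return out


def double_stochastic(m, iters=200):
    """Alternate row and column normalization (Sinkhorn-Knopp style)."""
    n = len(m)
    m = [row[:] for row in m]
    for _ in range(iters):
        m = simple_stochastic(m)
        col = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col[j] for j in range(n)] for i in range(n)]
    return m


def bilinear_index(x, m, y):
    """Bilinear form x^T M y for weight vectors x, y and matrix M."""
    n = len(x)
    return sum(x[i] * m[i][j] * y[j] for i in range(n) for j in range(n))


# Hypothetical coordinates of four residues (placeholder values).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 1.0)]
# Placeholder side-chain property vectors used as weighting schemes.
x = [1.8, 3.8, -3.9, -0.4]
y = [0.5, 1.0, 1.2, 0.3]

matrix = double_stochastic(coulombic_matrix(coords, p=2.0))
index = bilinear_index(x, matrix, y)
```

Changing `p` switches the metric (for example `p=1` gives the Manhattan distance), and replacing `double_stochastic` with `simple_stochastic` gives the row-normalized variant. Indices of this kind would then serve as input features for a classifier such as LDA.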

MSC:

92D20 Protein sequences, DNA sequences
92C40 Biochemistry, molecular biology
