zbMATH — the first resource for mathematics

Species specific amino acid sequence-protein local structure relationships: an analysis in the light of a structural alphabet. (English) Zbl 1405.92224
Summary: Protein structure analysis and prediction methods are based on non-redundant data extracted from the available protein structures, regardless of the species from which each protein originates. These datasets therefore represent the global knowledge of protein folds, which constitutes a generic distribution of amino acid sequence-protein structure (AAS-PS) relationships. In this study, we examine whether the AAS-PS relationship has specificities that depend on the species.
For this purpose, we chose three different species: Saccharomyces cerevisiae, Plasmodium falciparum and Arabidopsis thaliana. We analyzed the AAS-PS behavior of the proteins from these three species and compared it to the "expected" distribution of a classical non-redundant databank. With the classical secondary structure description, only slight differences in amino acid preferences were observed. With a more precise description of local protein structures (protein blocks), significant differences emerged.
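The comparison described above can be illustrated with a minimal Python sketch. It assumes each residue is represented as an (amino acid, structural state) pair, where the state comes either from a secondary-structure alphabet (e.g. H/E/C) or from the 16 protein blocks labelled a to p. It scores over- or under-representation of each pair with a Z-score against independence, and measures how far a species' amino acid distribution within one state lies from a reference distribution using the Kullback-Leibler divergence. The function names, the toy data and the choice of these two statistics are illustrative assumptions, not a description of the authors' exact procedure.

```python
import math
from collections import Counter

# Hypothetical helpers (not from the paper): each residue is an (amino_acid, state)
# tuple, e.g. ("L", "m"), where state is a secondary-structure class or a
# protein-block letter (a-p).

def z_scores(pairs):
    """Z-score of each (amino acid, state) pair against independence:
    Z = (observed - expected) / sqrt(expected)."""
    pairs = list(pairs)
    n = len(pairs)
    joint = Counter(pairs)
    aa_counts = Counter(aa for aa, _ in pairs)
    state_counts = Counter(state for _, state in pairs)
    scores = {}
    for aa, n_aa in aa_counts.items():
        for state, n_state in state_counts.items():
            expected = n_aa * n_state / n  # > 0 because both marginals were observed
            scores[(aa, state)] = (joint[(aa, state)] - expected) / math.sqrt(expected)
    return scores

def aa_distribution(pairs, state):
    """Amino acid frequencies among residues assigned to one structural state."""
    counts = Counter(aa for aa, s in pairs if s == state)
    total = sum(counts.values())
    return {aa: c / total for aa, c in counts.items()}

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats; inf if q lacks support where p > 0."""
    total = 0.0
    for key, p_k in p.items():
        if p_k == 0:
            continue
        q_k = q.get(key, 0.0)
        if q_k == 0:
            return math.inf
        total += p_k * math.log(p_k / q_k)
    return total

# Toy data, invented for illustration: one "species" set versus a reference set.
reference = [("A", "m"), ("L", "m"), ("G", "d"), ("S", "d"), ("A", "m"), ("G", "m")]
species = [("A", "m"), ("A", "m"), ("G", "d"), ("G", "m")]

species_z = z_scores(species)
kl_in_m = kl_divergence(aa_distribution(species, "m"), aa_distribution(reference, "m"))
print(species_z)
print(kl_in_m)
```

A positive Z-score marks an amino acid that is over-represented in a given state relative to independence, and a divergence near zero means the species' amino acid preferences in that state resemble the reference set. In practice these quantities would be computed on real per-residue assignments (for example DSSP secondary structures or protein-block assignments) rather than on toy data.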
The AAS-PS relationship of S. cerevisiae is close to the general distribution, while striking differences are observed for A. thaliana, and P. falciparum is the most distant of the three.
This study offers new viewpoints on the AAS-PS relationship. Certain species exhibit distinct preferences for associating particular amino acids with particular local structural elements. AAS-PS relationships are therefore species dependent. These results may give useful insights for improving prediction methods that take species-specific information into account.
92D20 Protein sequences, DNA sequences