zbMATH — the first resource for mathematics

Cooperative “folding transition” in the sequence space facilitates function-driven evolution of protein families. (English) Zbl 1397.92501
Summary: In the protein sequence space, natural proteins form clusters of families that are characterized by their unique native folds, whereas the great majority of random polypeptides are neither clustered nor foldable to unique structures. Since a given polypeptide can be either foldable or unfoldable, a kind of “folding transition” is expected at the boundary of a protein family in the sequence space. By Monte Carlo simulations of a statistical mechanical model of protein sequence alignment that coherently incorporates both short-range and long-range interactions as well as variable-length insertions to reproduce the statistics of the multiple sequence alignment of a given protein family, we demonstrate the existence of such a transition between natural-like sequences and random sequences in the sequence subspaces for 15 domain families of various folds. The transition was found to be highly cooperative and two-state-like. Furthermore, enforcing or suppressing consensus residues on a few of the well-conserved sites enhanced or diminished, respectively, the natural-like pattern formation over the entire sequence. In most families, the key sites included ligand binding sites. These results suggest that some selective pressure on the key residues, such as ligand binding activity, may cooperatively facilitate the emergence of a protein family during evolution. From a more practical aspect, the present results highlight an essential role of long-range effects, which are absent in conventional sequence models, in precisely defining protein families.
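The kind of simulation the summary refers to can be illustrated with a minimal sketch of Metropolis Monte Carlo sampling over sequences. This is not the paper's model: it is a plain Potts-style toy with site fields and a couple of long-range pair couplings, it omits variable-length insertions entirely, and every name and parameter value (`fields`, `couplings`, `pinned`, the temperatures, the choice of state 0 as the "consensus" residue) is an assumption made for illustration.

```python
import math
import random


def energy(seq, fields, couplings):
    """Toy Potts-style energy: E = -sum_i h_i(a_i) - sum_{(i,j)} J_ij(a_i, a_j)."""
    e = -sum(fields[i][a] for i, a in enumerate(seq))
    for (i, j), J in couplings.items():
        e -= J[seq[i]][seq[j]]
    return e


def metropolis(fields, couplings, n_states, temperature, n_steps,
               pinned=None, seed=0):
    """Sample one sequence by single-site Metropolis moves.

    `pinned` maps site index -> residue state held fixed throughout,
    a hypothetical stand-in for "enforcing consensus residues" at key sites.
    """
    rng = random.Random(seed)
    pinned = pinned or {}
    length = len(fields)
    seq = [pinned.get(i, rng.randrange(n_states)) for i in range(length)]
    free_sites = [i for i in range(length) if i not in pinned]
    e = energy(seq, fields, couplings)
    if not free_sites:
        return seq, e
    for _ in range(n_steps):
        i = rng.choice(free_sites)
        old = seq[i]
        seq[i] = rng.randrange(n_states)
        e_new = energy(seq, fields, couplings)
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temperature):
            e = e_new
        else:
            seq[i] = old  # reject: restore the previous residue
    return seq, e


# Made-up toy parameters: 10 sites, 4 residue states, state 0 is "consensus".
L, Q = 10, 4
fields = [[1.0 if a == 0 else 0.0 for a in range(Q)] for _ in range(L)]
pair = [[1.0 if a == b == 0 else 0.0 for b in range(Q)] for a in range(Q)]
couplings = {(0, 9): pair, (2, 7): pair}  # two long-range site pairs

cold_seq, cold_e = metropolis(fields, couplings, Q, temperature=0.05, n_steps=5000)
hot_seq, hot_e = metropolis(fields, couplings, Q, temperature=50.0, n_steps=5000,
                            pinned={0: 0, 9: 0})
print("low T :", cold_seq, cold_e)
print("high T:", hot_seq, hot_e)
```

In this toy, the low-temperature run settles on the all-consensus sequence, while the high-temperature run stays close to random except at the pinned sites. Scanning the temperature between these extremes and recording the energy of the sampled sequences would be one simple way to look for a transition of the sort described above, although a model this small cannot say anything about its cooperativity in real protein families.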
92D15 Problems related to evolution
92D20 Protein sequences, DNA sequences
82C80 Numerical methods of time-dependent statistical mechanics (MSC2010)
92-08 Computational methods for problems pertaining to biology