
Probing the protein space for extending the detection of weak homology folds. (English) Zbl 1406.92456

Summary: Redundancy of prediction methods has been used to explore the occurrence of weak homology protein motifs. A hybrid template-based algorithm has been implemented to predict different layers of protein structure by detecting domain-building sub-structures that share low sequence identity. Physicochemical determinants, secondary structure profiles, and multiple alignments have been analyzed to generate a broad set of structural sub-domains. Then, computationally intensive procedures generated the various three-dimensional folds on the basis of secondary structure predictions, fragment assembly, and detection of structural homologs. The proposed algorithm not only identifies common protein sub-structures but also detects higher-order architectures, such as domain superfamilies and superfolds, by linking the backbone trajectories of supersecondary structures. Rigid transformation protocols yielded a population of detected domain-building models with an average root mean square deviation (RMSD) from the native structures of 2.3 Å and an average template modeling score (TM-score) against the native structures of 0.43. The fold detection algorithm proposed here yields more accurate results than previously proposed methods, predicting structural homology even for proteins sharing less than 20% sequence identity. Our tools are freely available at http://www.acbrc.org/tools.html.
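The summary reports model quality as an average RMSD (2.3 Å) and an average TM-score (0.43) against native structures, and states a sequence-identity threshold (20%). The following is a minimal pure-Python sketch of these three measures, not the authors' code. It assumes the model has already been rigidly superposed onto the native structure (the superposition step itself, e.g. a Kabsch-type fit, is not shown) and that aligned residues are given as paired C-alpha coordinate lists. The function names are hypothetical, the 0.5 Å floor on the TM-score distance scale d0 for short chains is an assumption modelled on common practice, and identity is counted over gap-free alignment columns, which is only one of several conventions in use.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]  # C-alpha position in angstroms


def rmsd(model: Sequence[Coord], native: Sequence[Coord]) -> float:
    """RMSD between two equally long, ALREADY superposed coordinate lists."""
    if len(model) != len(native) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(model, native))
    return math.sqrt(total / len(model))


def tm_score(model: Sequence[Coord], native: Sequence[Coord], native_length: int) -> float:
    """TM-score of aligned residue pairs under a FIXED superposition.

    The real TM-score maximises this sum over all rigid superpositions,
    so the value returned here is a lower bound on it.
    """
    if len(model) != len(native):
        raise ValueError("model and native must list the same aligned pairs")
    if native_length <= 0:
        raise ValueError("native_length must be positive")
    # Length-dependent distance scale d0 (angstroms); floored at 0.5 for
    # short chains, where the formula becomes too small or undefined.
    if native_length > 15:
        d0 = max(1.24 * (native_length - 15) ** (1.0 / 3.0) - 1.8, 0.5)
    else:
        d0 = 0.5
    total = 0.0
    for a, b in zip(model, native):
        d = math.dist(a, b)
        total += 1.0 / (1.0 + (d / d0) ** 2)
    # Normalise by the full native length, not by the number of aligned pairs.
    return total / native_length


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of a pairwise alignment, counted over the columns
    where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    same = sum(1 for x, y in pairs if x == y)
    return 100.0 * same / len(pairs)


if __name__ == "__main__":
    native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    model = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
    print(rmsd(model, native))                  # 1.0
    print(percent_identity("ACD-EF", "ACDGEY"))  # 80.0
```

For example, a three-residue model shifted by 1 Å along one axis relative to its native fragment has an RMSD of 1.0 Å. Because the TM-score divides by the full native length, a model that aligns only part of the chain is penalised even when its aligned part fits closely, which is why the two measures are usually reported together.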

MSC:

92D20 Protein sequences, DNA sequences

References:

[1] Altschul, S. F.; Gish, W.; Miller, W.; Myers, E. W.; Lipman, D. J., Basic local alignment search tool, J. Mol. Biol., 215, 403-410 (1990)
[2] Altschul, S. F.; Madden, T. L.; Schaffer, A. A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D. J., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res., 25, 3389-3402 (1997)
[3] Ben-Naim, A., Levinthal’s question revisited, and answered, J. Biomol. Struct. Dyn., 30, 113-124 (2012)
[4] Bernardes, J. S.; Carbone, A.; Zaverucha, G., A discriminative method for family-based protein remote homology detection that combines inductive logic programming and propositional models, BMC Bioinformatics, 12, 83-95 (2011)
[5] Bujnicki, J. M., Protein-structure prediction by recombination of fragments, ChemBioChem, 7, 19-27 (2006)
[6] Berezovsky, I. N.; Trifonov, E. N., Protein structure and folding: a new start, J. Biomol. Struct. Dyn., 19, 397-403 (2001)
[7] Levinthal, C., How to fold graciously, In: Mössbauer Spectroscopy in Biological Systems Proceedings, 67, 22-24 (1969)
[8] Chothia, C.; Lesk, A. M., The relation between the divergence of sequence and structure in proteins, EMBO J., 5, 823-826 (1986)
[9] Chothia, C., Proteins. One thousand families for the molecular biologist, Nature, 357, 543-544 (1992)
[10] Daga, P. R.; Patel, R. Y.; Doerksen, R. J., Template-based protein modeling: recent methodological advances, Curr. Top. Med. Chem., 10, 84-94 (2010)
[11] Du, P.; Andrec, M.; Levy, R. M., Have we seen all structures corresponding to short protein fragments in the Protein Data Bank? An update, Protein Eng., 16, 407-414 (2003)
[12] Dinkel, H.; Michael, S.; Weatheritt, R. J.; Davey, N. E.; Van Roey, K.; Altenberg, B.; Toedt, G.; Uyar, B.; Seiler, M.; Budd, A.; Jodicke, L.; Dammert, M. A.; Schroeter, C.; Hammer, M.; Schmidt, T.; Jehl, P.; McGuigan, C.; Dymecka, M.; Chica, C.; Luck, K.; Via, A.; Chatr-aryamontri, A.; Haslam, N.; Grebnev, G.; Edwards, R. J.; Steinmetz, M. O.; Meiselbach, H.; Diella, F.; Gibson, T. J., ELM—the database of eukaryotic linear motifs, Nucleic Acids Res., 40, 242-251 (2011)
[13] Ginalski, K.; Pas, J.; Wyrwicz, L. S.; von Grotthuss, M.; Bujnicki, J. M.; Rychlewski, L., ORFeus: detection of distant homology using sequence profiles and predicted secondary structure, Nucleic Acids Res., 31, 3804-3807 (2003)
[14] Guo, J. T.; Ellrott, K.; Xu, Y., A historical perspective of template-based protein structure prediction, Methods Mol. Biol., 413, 3-42 (2008)
[15] Harrison, A.; Pearl, F.; Mott, R.; Thornton, J.; Orengo, C. A., Quantifying the similarities within fold space, J. Mol. Biol., 323, 909-926 (2002)
[16] Hughey, R.; Krogh, A., Hidden Markov models for sequence analysis: extension and analysis of the basic method, Comput. Appl. Biosci., 12, 95-107 (1996)
[17] Hamming, R. W., Error detecting and error correcting codes, Bell Syst. Tech. J., 29, 147-160 (1950) · Zbl 1402.94084
[18] Jones, A. T.; Thirup, S., Using known substructures in protein model building and crystallography, EMBO J., 5, 819-822 (1986)
[19] Jaroszewski, L.; Rychlewski, L.; Li, Z.; Li, W.; Godzik, A., FFAS03: a server for profile-profile sequence alignments, Nucleic Acids Res., 33, 284-288 (2005)
[20] Karplus, K.; Barrett, C.; Hughey, R., Hidden Markov models for detecting remote protein homologies, Bioinformatics, 14, 846-856 (1998)
[21] Kolodny, R.; Koehl, P.; Guibas, L.; Levitt, M., Small libraries of protein fragments model native protein structures accurately, J. Mol. Biol., 323, 297-307 (2002)
[22] Koczyk, G.; Berezovsky, I. N., Domain Hierarchy and closed loops (DHcL): a server for exploring hierarchy of protein domain structure, Nucleic Acids Res., 36, 239-245 (2008)
[23] Kyte, J.; Doolittle, R., A simple method for displaying the hydropathic character of a protein, J. Mol. Biol., 157, 105-132 (1982)
[24] Kelley, L. A.; Sternberg, M. J., Protein structure prediction on the Web: a case study using the Phyre server, Nat. Protoc., 4, 363-371 (2009)
[25] Liu, T.; Tang, G. W.; Capriotti, E., Comparative modeling: the state of the art and protein drug target structure prediction, Comb. Chem. High Throughput Screen., 14, 532-547 (2011)
[26] Lupas, A. N.; Ponting, C. P.; Russell, R. B., On the evolution of protein folds: are similar motifs in different protein folds the result of convergence, insertion, or relics of an ancient peptide world?, J. Struct. Biol., 134, 191-203 (2001)
[27] Lolkema, J. S.; Slotboom, D., Hydropathy profile alignment: a tool to search for structural homologues of membrane proteins, FEMS Microbiol. Rev., 22, 305-322 (1998)
[28] Murzin, A. G.; Brenner, S. E.; Hubbard, T.; Chothia, C., SCOP: a structural classification of proteins database for the investigation of sequences and structures, J. Mol. Biol., 247, 536-540 (1995)
[29] Madera, M., Profile comparer: a program for scoring and aligning profile hidden Markov models, Bioinformatics, 24, 2630-2631 (2008)
[30] Moult, J.; Fidelis, K.; Kryshtafovych, A.; Tramontano, A., Critical assessment of methods of protein structure prediction (CASP)—round IX, Proteins, 79, 1-5 (2011)
[31] Marchler-Bauer, A.; Lu, S.; Anderson, J. B.; Chitsaz, F.; Derbyshire, M. K.; DeWeese-Scott, C.; Fong, J. H.; Geer, L. Y.; Geer, R. C.; Gonzales, N. R.; Gwadz, M.; Hurwitz, D. I.; Jackson, J. D.; Ke, Z.; Lanczycki, C. J.; Lu, F.; Marchler, G. H.; Mullokandov, M.; Omelchenko, M. V.; Robertson, C. L.; Song, J. S.; Thanki, N.; Yamashita, R. A.; Zhang, D.; Zhang, N.; Zheng, C.; Bryant, S. H., CDD: a conserved domain database for the functional annotation of proteins, Nucleic Acids Res., 39, 225-229 (2011)
[32] Needleman, S. B.; Wunsch, C. D., A general method applicable to the search for similarities in the amino acid sequence of two proteins, J. Mol. Biol., 48, 443-453 (1970)
[33] Orengo, C. A.; Michie, A. D.; Jones, S.; Jones, D. T.; Swindells, M. B.; Thornton, J. M., CATH - a hierarchic classification of protein domain structures, Structure, 5, 1093-1108 (1997)
[34] Orengo, C. A.; Jones, D. T.; Thornton, J. M., Protein superfamilies and domain superfolds, Nature, 372, 631-634 (1994)
[35] Pearson, W. R., Rapid and sensitive sequence comparison with FASTP and FASTA, Methods Enzymol., 183, 63-98 (1990)
[36] Poupon, A.; Mornon, J. P., Predicting the protein folding nucleus from a sequence, FEBS Lett., 452, 283-289 (1999)
[37] Reid, A. J.; Yeats, C.; Orengo, C. A., Methods of remote homology detection can be combined to increase coverage by 10
[38] Roy, S.; Ratnaswamy, G.; Boice, J. A.; Fairman, R.; McLendon, G.; Hecht, M. H., A protein designed by binary patterning of polar and nonpolar amino acids displays native-like properties, J. Am. Chem. Soc., 119, 5302-5306 (1997)
[39] Smith, T. F.; Waterman, M. S., Identification of common molecular subsequences, J. Mol. Biol., 147, 195-197 (1981)
[40] Skolnick, J.; Kihara, D.; Zhang, Y., Development and large scale benchmark testing of the PROSPECTOR 3.0 threading algorithm, Proteins, 56, 502-518 (2004)
[41] Söding, J., Protein homology detection by HMM-HMM comparison, Bioinformatics, 21, 951-960 (2005)
[42] Selvaraj, S.; Gromiha, M. M., Role of hydrophobic clusters and long-range contact networks in the folding of (α/β)₈ barrel proteins, Biophys. J., 84, 1919-1925 (2003)
[43] Schwartz, R.; Istrail, S.; King, J., Frequencies of amino acid strings in globular protein sequences indicate suppression of blocks of consecutive hydrophobic residues, Protein Sci., 10, 1023-1031 (2001)
[44] Sadreyev, R. I.; Tang, M.; Kim, B.; Grishin, N. V., COMPASS server for homology detection: improved statistical accuracy, speed and functionality, Nucleic Acids Res., 37, 90-94 (2009)
[45] Todd, A. E.; Orengo, C. A.; Thornton, J. M., Evolution of function in protein superfamilies, from a structural perspective, J. Mol. Biol., 307, 1113-1143 (2001)
[46] Traut, T. W., Do exons code for structural or functional units in proteins?, Proc. Natl. Acad. Sci. USA, 85, 2944-2948 (1988)
[47] Thompson, J. D.; Higgins, D. G.; Gibson, T. J., CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice, Nucleic Acids Res., 22, 4673-4680 (1994)
[48] Vapnik, V. N., An overview of statistical learning theory, IEEE Trans. Neural Networks, 10, 988-999 (1999)
[49] Wiltgen, M., Structural bioinformatics: from the sequence to structure and function, Curr. Bioinformatics, 4, 54-87 (2009)
[50] Zhang, Y., Protein structure prediction: when is it useful?, Curr. Opin. Struct. Biol., 19, 145-155 (2009)
[51] Zhang, Y., Progress and challenges in protein structure prediction, Curr. Opin. Struct. Biol., 18, 342-348 (2008)
[52] Zhang, Y.; Skolnick, J., TM-align: a protein structure alignment algorithm based on TM-score, Nucleic Acids Res., 33, 2302-2309 (2005)