Domain boundary prediction based on profile domain linker propensity index. (English) Zbl 1102.92016

Summary: Successful prediction of protein domain boundaries provides valuable information not only for the computational structure prediction of multi-domain proteins but also for experimental structure determination. In this work, a novel index at the profile level is presented, namely the profile domain linker propensity index (PDLI), which uses the evolutionary information contained in profiles to predict domain linkers. Frequency profiles are calculated directly from the multiple sequence alignments output by PSI-BLAST and converted into binary profiles using a probability threshold. PDLI is then derived from the frequencies of binary profiles in domain linkers relative to their frequencies in domains. From it, a smoothed and normalized numeric profile can be generated for any amino acid sequence, and domain linkers are predicted from this profile. Tests on the Structural Classification of Proteins (SCOP) database and CASP6 targets show that PDLI outperforms other indexes defined at the amino acid level.
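The pipeline summarized above (frequency profile, binary profile, linker-versus-domain propensity, smoothed and normalized score) can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the authors' implementation: the MSA is taken as query-anchored aligned strings with the query first, sequences are not weighted, each binary profile is treated as a discrete symbol, PDLI is modeled as a pseudocount-smoothed log ratio of linker to domain frequencies, and smoothing is a sliding-window mean followed by z-score normalization. The function names and the `threshold`, `window`, `pseudocount`, and `cutoff` values are hypothetical.

```python
"""Illustrative sketch of a profile-level domain linker propensity index (PDLI).

All function names, parameter values, and formulas are assumptions for
illustration; they are not taken from the published method.
"""
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequency_profile(msa):
    """Per-position amino-acid frequencies from a query-anchored MSA.

    msa: list of equal-length aligned strings, query first (assumed format).
    Columns where the query has a gap are dropped; gaps and non-standard
    residues in other rows are ignored; no sequence weighting is applied.
    """
    profile = []
    for j, q in enumerate(msa[0]):
        if q == "-":
            continue  # keep only positions present in the query
        counts = Counter(seq[j] for seq in msa if seq[j] in AMINO_ACIDS)
        total = sum(counts.values())
        profile.append({aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS})
    return profile


def binary_profile(freq_profile, threshold=0.1):
    """Binarize each position: keep amino acids with frequency >= threshold.

    Each position becomes one symbol listing the retained residues in a fixed
    order, e.g. "ILV". threshold=0.1 is a hypothetical default.
    """
    return ["".join(aa for aa in AMINO_ACIDS if pos[aa] >= threshold)
            for pos in freq_profile]


def train_pdli(training_set, pseudocount=1.0):
    """Log ratio of each symbol's frequency in linkers vs. in domains.

    training_set: iterable of (symbols, is_linker_flags) pairs.
    Pseudocounts keep symbols seen in only one class finite (assumption).
    """
    linker, domain = Counter(), Counter()
    for symbols, flags in training_set:
        for sym, is_linker in zip(symbols, flags):
            (linker if is_linker else domain)[sym] += 1
    vocab = set(linker) | set(domain)
    n_link = sum(linker.values()) + pseudocount * len(vocab)
    n_dom = sum(domain.values()) + pseudocount * len(vocab)
    return {s: math.log(((linker[s] + pseudocount) / n_link)
                        / ((domain[s] + pseudocount) / n_dom))
            for s in vocab}


def smoothed_profile(symbols, pdli, window=15):
    """Sliding-window mean of per-position PDLI values, then z-score normalization."""
    raw = [pdli.get(s, 0.0) for s in symbols]  # unseen symbols score 0 (neutral)
    if not raw:
        return []
    half = window // 2
    smooth = []
    for i in range(len(raw)):
        seg = raw[max(0, i - half): i + half + 1]  # window is truncated at the ends
        smooth.append(sum(seg) / len(seg))
    mean = sum(smooth) / len(smooth)
    sd = math.sqrt(sum((x - mean) ** 2 for x in smooth) / len(smooth))
    return [(x - mean) / sd if sd > 0 else 0.0 for x in smooth]


def predict_linkers(scores, cutoff=1.0):
    """0-based positions whose normalized score reaches the cutoff."""
    return [i for i, z in enumerate(scores) if z >= cutoff]


if __name__ == "__main__":
    # Toy training data: symbol "G" occurs only in linkers, "A" only in domains.
    pdli = train_pdli([(["A", "A", "G", "G"], [False, False, True, True])])
    query_symbols = ["A"] * 5 + ["G"] * 3 + ["A"] * 5
    scores = smoothed_profile(query_symbols, pdli, window=3)
    print(predict_linkers(scores, cutoff=1.0))  # prints [5, 6, 7]
```

On the toy data the normalized score peaks over the three linker-like positions, so exactly those indices are returned. With real data, the training pairs would come from proteins with known domain assignments (for example SCOP entries) and the symbols from `binary_profile` applied to each protein's PSI-BLAST alignment. Treating a whole binary profile as one symbol can produce a large vocabulary, which is why the sketch adds pseudocounts.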


92C40 Biochemistry, molecular biology
92-08 Computational methods for problems pertaining to biology






[1] Altschul, S.F.; Madden, T.L.; Schäffer, A.A.; Zhang, J.H.; Zhang, Z.; Miller, W.; Lipman, D.J., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic acids res., 25, 3389-3402, (1997)
[2] Andreeva, A.; Howorth, D.; Brenner, S.E.; Hubbard, T.J.P.; Chothia, C.; Murzin, A.G., SCOP database in 2004: refinements integrate structure and sequence family data, Nucleic acids res., 32, D226-D229, (2004)
[3] Aszodi, A.; Gradwell, M.J.; Taylor, W.R., Global fold determination from a small number of distance restraints, J. mol. biol., 251, 308-326, (1995)
[4] Boeckmann, B.; Bairoch, A.; Apweiler, R.; Blatter, M.C.; Estreicher, A.; Gasteiger, E.; Martin, M.J.; Michoud, K.; O’Donovan, C.; Phan, I.; Pilbout, S.; Schneider, M., The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003, Nucleic acids res., 31, 365-370, (2003)
[5] Busetta, B.; Barrans, Y., The prediction of protein domains, Biochim. biophys. acta, 790, 117-124, (1984)
[6] Chandonia, J.M.; Hon, G.; Walker, N.S.; Conte, L.L.; Koehl, P.; Levitt, M.; Brenner, S.E., The ASTRAL compendium in 2004, Nucleic acids res., 32, 189-192, (2004)
[7] Corpet, F.; Servant, F.; Gouzy, J.; Kahn, D., ProDom and ProDom-CG: tools for protein domain analysis and whole genome comparisons, Nucleic acids res., 28, 267-269, (2000)
[8] Dumontier, M.; Yao, R.; Feldman, H.J.; Hogue, C.W., Armadillo: domain boundary prediction by amino acid composition, J. mol. biol., 350, 1061-1073, (2005)
[9] Galzitskaya, O.V.; Melnik, B.S., Prediction of protein domain boundaries from sequence alone, Protein sci., 12, 696-701, (2003)
[10] George, R.A.; Heringa, J., An analysis of protein domain linkers: their classification and role in protein folding, Protein eng., 15, 871-879, (2002)
[11] George, R.A.; Heringa, J., Protein domain identification and improved sequence similarity searching using PSI-BLAST, Proteins, 48, 672-681, (2002)
[12] George, R.A.; Heringa, J., SnapDRAGON: a method to delineate protein structural domains from sequence data, J. mol. biol., 16, 839-851, (2002)
[13] Gouzy, J.; Corpet, F.; Kahn, D., Whole genome protein domain analysis using a new method for domain clustering, Comput. chem., 23, 333-340, (1999)
[14] Henikoff, S.; Henikoff, J.G., Position-based sequence weights, J. mol. biol., 243, 574-578, (1994)
[15] Holm, L.; Sander, C., Removing near-neighbour redundancy from large protein sequence collections, Bioinformatics, 14, 423-429, (1998)
[16] Jaenicke, R., Folding and association of proteins, Prog. biophys. mol. biol., 49, 117-237, (1987)
[17] Kabsch, W.; Sander, C., Dictionary of secondary structure in proteins: pattern recognition of hydrogen-bonded and geometrical features, Biopolymers, 22, 2577-2637, (1983)
[18] Kikuchi, T.; Nemethy, G.; Scheraga, H.A., Prediction of the location of structural domains in globular proteins, J. protein chem., 7, 427-471, (1988)
[19] Bae, K.; Mallick, B.K.; Elsik, C.G., Prediction of protein inter-domain linker regions by a hidden Markov model, Bioinformatics, 21, 2264-2270, (2005)
[20] Kyte, J.; Doolittle, R.F., A simple method for displaying the hydropathic character of a protein, J. mol. biol., 157, 105-132, (1982)
[21] Liu, J.F.; Rost, B., CHOP proteins into structural domain-like fragments, Proteins struct. funct. bioinformatics, 55, 678-688, (2004)
[22] Madej, T.; Gibrat, J.F.; Bryant, S.H., Threading a database of protein cores, Proteins, 23, 356-369, (1995)
[23] Miyazaki, S.; Kuroda, Y.; Yokoyama, S., Characterization and prediction of linker sequences of multi-domain proteins by a neural network, J. struct. funct. genomics, 2, 37-51, (2002)
[24] Nagarajan, N.; Yona, G., Automatic prediction of protein domains from sequence information using a hybrid learning system, Bioinformatics, 20, 1335-1360, (2004)
[25] Nikitin, F.; Lisacek, F., Investigating protein domain combinations in complete proteomes, Comput. biol. chem., 27, 481-495, (2003)
[26] Pearl, F.; Todd, A.; Sillitoe, I.; Dibley, M.; Redfern, O.; Lewis, T.; Bennett, C.; Marsden, R.; Grant, A.; Lee, D.; Akpor, A.; Maibaum, M.; Harrison, A.; Dallman, T.; Reeves, G.; Diboun, I.; Addou, S.; Lise, S.; Johnston, C.; Sillero, A.; Thornton, J.; Orengo, C., The CATH domain structure database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis, Nucleic acids res., 33, D247-D251, (2005)
[27] Ponting, C.P.; Schultz, J.; Milpetz, F.; Bork, P., SMART: identification and annotation of domains from signalling and extracellular protein sequences, Nucleic acids res., 27, 229-232, (1999)
[28] Saini, H.K.; Fischer, D., Meta-DP: domain prediction meta-server, Bioinformatics, 21, 2917-2920, (2005)
[29] Sim, J.; Kim, S.Y.; Lee, J., PPRODO: prediction of protein domain boundaries using neural networks, Proteins, 59, 627-632, (2005)
[30] Suyama, M.; Ohara, O., DomCut: prediction of inter-domain linker regions in amino acid sequences, Bioinformatics, 19, 673-674, (2003)
[31] Tanaka, T.; Kuroda, Y.; Yokoyama, S., Characteristics and prediction of domain linker sequences in multidomain proteins, J. struct. funct. genomics, 4, 79-85, (2003)
[32] Udwary, D.W.; Merski, M.; Townsend, C.A., A method for prediction of linker regions within large multifunctional proteins, and its application to a type I polyketide synthase, J. mol. biol., 323, 585-598, (2002)
[33] Veretnik, S.; Bourne, P.E.; Alexandrov, N.N.; Shindyalov, I.N., Toward consistent assignment of structural domains in proteins, J. mol. biol., 339, 647-678, (2004)
[34] Wen, Z.N.; Wang, K.L.; Li, M.L.; Nie, F.S.; Yang, Y., Analyzing functional similarity of protein sequences with discrete wavelet transform, Comput. biol. chem., 29, 220-228, (2005) · Zbl 1097.92024
[35] Wheelan, S.J.; Marchler-Bauer, A.; Bryant, S.H., Domain size distributions can predict domain boundaries, Bioinformatics, 16, 613-618, (2000)
[36] Xiao, J.F.; Li, Z.S.; Sun, M.; Zhang, Y.; Sun, C.C., Homology modeling and molecular dynamics study of GSK3/shaggy-like kinase, Comput. biol. chem., 28, 179-188, (2004) · Zbl 1088.92030