Several appropriate background distributions for entropy-based protein sequence conservation measures. (English) Zbl 1403.92201

Summary: The amino acid background distribution is an important factor for entropy-based methods that extract sequence conservation information from protein multiple sequence alignments (MSAs). However, MSAs are usually not large enough to yield a reliable observed background distribution. In this paper, we propose two new estimates of the background distribution. One is an integration of the observed background distribution and the position-specific residue distribution, and the other is a normalized square root of the observed background frequency. To validate these new background distributions, we apply them to the relative entropy model to find catalytic sites and ligand-binding sites in protein MSAs. Experimental results show that they are superior to the observed background distribution in predicting functionally important residues.
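
As an illustration of the second estimate and of the relative entropy score, the following minimal Python sketch may help. It is not the authors' implementation: the function names, the toy alignment, and the small pseudocount (added so that no frequency is exactly zero) are assumptions made only for this example. The first estimate is not sketched, because the summary does not say how the observed background and the position-specific distribution are combined.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def observed_background(msa, pseudocount=1e-6):
    """Amino acid frequencies over all non-gap cells of the alignment.

    The pseudocount is an assumption of this sketch; it keeps every
    frequency strictly positive so that logarithms are defined.
    """
    counts = Counter(ch for seq in msa for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


def sqrt_background(background):
    """Normalized square root of the observed background frequencies."""
    roots = {a: math.sqrt(p) for a, p in background.items()}
    norm = sum(roots.values())
    return {a: r / norm for a, r in roots.items()}


def column_distribution(msa, col, pseudocount=1e-6):
    """Position-specific residue distribution of one alignment column."""
    counts = Counter(seq[col] for seq in msa if seq[col] in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


def relative_entropy(p, q):
    """Relative entropy (Kullback-Leibler divergence) of p from q, in bits."""
    return sum(p[a] * math.log2(p[a] / q[a]) for a in AMINO_ACIDS)


if __name__ == "__main__":
    # Hypothetical toy alignment: three sequences, four columns.
    toy_msa = ["ACDA", "ACEA", "ACDG"]
    observed = observed_background(toy_msa)
    flattened = sqrt_background(observed)
    for col in range(len(toy_msa[0])):
        p = column_distribution(toy_msa, col)
        print(col, relative_entropy(p, observed), relative_entropy(p, flattened))
```

Taking the square root and renormalizing flattens the background: frequent residues lose weight and rare ones gain it, which is one plausible reading of why this estimate is less sensitive to the small size of an MSA.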


MSC: 92D20 Protein sequences, DNA sequences


Software: AL2CO; H2r

