Analyzing functional similarity of protein sequences with discrete wavelet transform. (English) Zbl 1097.92024

Summary: This paper applies the discrete wavelet transform (DWT), combined with various protein substitution models, to detect functional similarity between proteins with low sequence identity. A new metric based on the DWT, the \(S\) function, is proposed to measure pairwise similarity. A segmentation technique, combined with the DWT, is also developed to handle long protein sequences. The results are compared with those obtained using pairwise alignment and PSI-BLAST.
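
The summary does not define the \(S\) function or state which wavelet and substitution models are used, so the following is only a minimal sketch of the general idea and not the paper's method. It assumes a Kyte-Doolittle hydropathy encoding, a Haar wavelet, and a Pearson correlation of the coarse (approximation) coefficients as a stand-in similarity score. All function names (`encode`, `haar_dwt`, `dwt_similarity`, `segments`) are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy scale. ASSUMPTION: chosen only for illustration;
# the paper uses protein substitution models that the summary does not specify.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(seq):
    """Map a protein sequence to a numeric signal (one value per residue)."""
    return [KD[residue] for residue in seq.upper()]


def haar_step(x):
    """One level of the orthonormal Haar DWT; len(x) must be even."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_dwt(x, levels):
    """Multi-level Haar DWT. Returns [approx_L, detail_L, ..., detail_1].

    The signal is zero-padded to a multiple of 2**levels.
    """
    block = 2 ** levels
    approx = list(x) + [0.0] * (-len(x) % block)
    details = []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return [approx] + details[::-1]


def pearson(u, v):
    """Pearson correlation over the common prefix of u and v."""
    n = min(len(u), len(v))
    u, v = u[:n], v[:n]
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0


def dwt_similarity(seq_a, seq_b, levels=2):
    """Stand-in similarity score (NOT the paper's S function):
    correlation of the level-`levels` approximation coefficients."""
    a = haar_dwt(encode(seq_a), levels)[0]
    b = haar_dwt(encode(seq_b), levels)[0]
    return pearson(a, b)


def segments(seq, size, step):
    """Split a long sequence into overlapping windows of length `size`.

    ASSUMPTION: fixed-size windows; the paper's segmentation rule
    is not described in the summary.
    """
    last_start = max(len(seq) - size, 0)
    return [seq[i:i + size] for i in range(0, last_start + 1, step)]
```

Under these assumptions, a long sequence could be compared against a short one by scoring each window, for example `max(dwt_similarity(w, query) for w in segments(long_seq, len(query), 4))`. Whether this resembles the segmentation scheme in the paper cannot be determined from the summary.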


92C40 Biochemistry, molecular biology
65T60 Numerical methods for wavelets

