zbMATH — the first resource for mathematics

On using physico-chemical properties of amino acids in string kernels for protein classification via support vector machines. (English) Zbl 1334.92310
Summary: String kernels are popular tools for analyzing protein sequence data and have been applied successfully to many computational biology problems. Traditional string kernels assume that different substrings are independent. However, substrings can be highly correlated because of their substructure relationships or shared physico-chemical properties. This paper proposes two kinds of weighted spectrum kernels: the correlation spectrum kernel and the AA spectrum kernel. We evaluate their performance by predicting glycan-binding proteins for 12 glycans. The results show that the correlation spectrum kernel and the AA spectrum kernel perform significantly better than the spectrum kernel for nearly all of the 12 glycans. By comparing the predictive power of AA spectrum kernels constructed from different physico-chemical properties, we can also identify the physico-chemical properties that contribute most to glycan-protein binding. The results indicate that the physico-chemical properties of amino acids in proteins play an important role in the mechanism of glycan-protein binding.
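The summary names the kernels but does not give their formulas. The following Python sketch contrasts the standard k-spectrum kernel, which compares two sequences only through identical shared k-mers, with one hypothetical way of relaxing the independence assumption: letting non-identical k-mers contribute according to how similar their residues are on a physico-chemical scale. The Gaussian weighting, the parameter `sigma`, the choice of the Kyte-Doolittle hydropathy scale, and all function names are assumptions made for illustration; this is not the paper's correlation spectrum kernel or AA spectrum kernel.

```python
import math
from collections import Counter

# Kyte-Doolittle hydropathy index: one example physico-chemical property
# (illustrative choice; the AAindex database holds many such scales).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def kmer_counts(seq: str, k: int) -> Counter:
    """Spectrum feature map: how often each length-k substring occurs in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x: str, y: str, k: int = 3) -> int:
    """Standard k-spectrum kernel: inner product of k-mer count vectors.
    Only identical k-mers contribute, i.e. distinct k-mers are treated as independent."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(n * cy[u] for u, n in cx.items() if u in cy)


def kmer_similarity(u: str, v: str, sigma: float = 2.0) -> float:
    """Hypothetical k-mer similarity: Gaussian RBF on position-wise hydropathy values.
    Returns 1.0 for identical k-mers and decays as their property profiles diverge."""
    d2 = sum((HYDROPATHY[a] - HYDROPATHY[b]) ** 2 for a, b in zip(u, v))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def weighted_spectrum_kernel(x: str, y: str, k: int = 3, sigma: float = 2.0) -> float:
    """Hypothetical weighted spectrum kernel: sum over k-mer pairs (u, v) of
    count_u(x) * count_v(y) * W(u, v). W is an RBF Gram matrix over k-mer property
    vectors, hence positive semidefinite, so the result is still a valid kernel."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(nu * nv * kmer_similarity(u, v, sigma)
               for u, nu in cx.items() for v, nv in cy.items())


def gram_matrix(seqs, kernel, **params):
    """Pairwise kernel values, e.g. to feed an SVM that accepts precomputed kernels."""
    return [[kernel(a, b, **params) for b in seqs] for a in seqs]


if __name__ == "__main__":
    x, y = "ACDEFGH", "ACDKLMN"
    print(spectrum_kernel(x, y))           # 1: only the shared 3-mer "ACD" counts
    print(weighted_spectrum_kernel(x, y))  # > 1: similar, non-identical 3-mers add weight
```

The sketch assumes sequences over the 20 standard amino-acid letters; any other symbol raises a `KeyError` in `kmer_similarity`. To train a classifier, a Gram matrix like the one from `gram_matrix` can be passed to an SVM that supports precomputed kernels, such as LIBSVM with `-t 4` or scikit-learn's `SVC(kernel="precomputed")`.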
92D20 Protein sequences, DNA sequences
62H30 Classification and discrimination; cluster analysis (statistical aspects)
[1] Leslie, C; Eskin, E; Noble, W S, The spectrum kernel: A string kernel for SVM protein classification, Proceedings of the Pacific Symposium on Biocomputing, 7, 566-575, (2002)
[2] Leslie, C; Eskin, E; Weston, J; Noble, W S, Mismatch string kernels for discriminative protein classification, Bioinformatics, 20, 467-476, (2003)
[3] Rätsch, G; Sonnenburg, S; Srinivasan, J; Witte, H; Müller, K; Sommer, R; Schölkopf, B, Improving the Caenorhabditis elegans genome annotation using machine learning, PLoS Computational Biology, 3, e20, (2007)
[4] Schweikert, G; Zien, A; Zeller, G; Behr, J; Dieterich, C; Ong, C; Philips, P; De Bona, F; Hartmann, L; Bohlen, A; Krüger, N; Sonnenburg, S; Rätsch, G, mGene: accurate SVM-based gene finding with an application to nematode genomes, Genome Res., 19, 2133-2143, (2009)
[5] Schultheiss, S; Busch, W; Lohmann, J; Kohlbacher, O; Rätsch, G, KIRMES: kernel-based identification of regulatory modules in euchromatic sequences, Bioinformatics, 25, 2126-2133, (2009)
[6] Roth, V; Fischer, B, Improved functional prediction of proteins by learning kernel combinations in multilabel settings, BMC Bioinformatics, 8, S12, (2007)
[7] Ong, C; Zien, A, An automated combination of kernels for predicting protein subcellular localization, 168-179, (2008)
[8] Röttig, M; Rausch, C; Kohlbacher, O, Combining structure and sequence information allows automated prediction of substrate specificities within enzyme families, PLoS Computational Biology, 6, e1000636, (2010)
[9] Someya, S; Kakuta, M; Morita, M; Sumikoshi, K; Cao, W; Ge, Z; Hirose, O; Nakamura, S; Terada, T; Shimizu, K, Prediction of carbohydrate-binding proteins from sequences using support vector machines, Advances in Bioinformatics, 2010, 1, (2010), DOI: 10.1155/2010/289301 · Zbl 1219.92028
[10] Jin, Y T B; Zhang, Y, Support vector machines with genetic fuzzy feature transformation for biomedical data classification, 476-489, (2007)
[11] Vapnik, V N, The Nature of Statistical Learning Theory, Springer, New York, (1995) · Zbl 0833.62008
[12] Noble, W, What is a support vector machine?, Nat. Biotechnol., 24, 1565-1567, (2006) · Zbl 1167.83321
[13] Li, L; Ching, W; Chan, Y; Mamitsuka, H, On network-based kernel methods for protein-protein interactions with applications in protein functions prediction, Journal of Systems Science and Complexity, 23, 917-930, (2010) · Zbl 1209.92018
[14] Argos, J R A; Hargrave, P, Structural prediction of membrane-bound proteins, International Journal of Peptide and Protein Research, 128, 565-575, (1982)
[15] Toussaint, N C; Widmer, C; Kohlbacher, O; Rätsch, G, Exploiting physico-chemical properties in string kernels, BMC Bioinformatics, 11, S7, (2010)
[16] Jiang, H; Ching, W; Zheng, Z, Kernel techniques in support vector machines for classification of biological data, International Journal of Information Technology and Computer Science, 2, 1-8, (2011)
[17] Vapnik, V; Chervonenkis, A, Theory of Pattern Recognition [in Russian], Nauka, Moscow, (1974); German translation: Wapnik, W; Tscherwonenkis, A, Berlin, (1979)
[18] Schölkopf, B; Smola, A J, Learning with Kernels, MIT Press, Cambridge, MA, (2002)
[19] Schölkopf, B; Tsuda, K; Vert, J P, Kernel Methods in Computational Biology, MIT Press, Cambridge, MA, (2004)
[20] Cortes, C; Vapnik, V, Support vector networks, Machine Learning, 20, 273-297, (1995) · Zbl 0831.68098
[21] Kuhn, H W; Tucker, A W, Nonlinear programming, 481-492, (1951), Berkeley · Zbl 0044.05903
[22] Varki, A; Cummings, R; Esko, J; Freeze, H; Hart, G; Etzler, M E, Essentials of Glycobiology, 2nd edition, Cold Spring Harbor Laboratory Press, New York, (2008)
[23] Feizi, T; Fazio, F; Chai, W; Wong, C, Carbohydrate microarrays — A new set of technologies at the frontiers of glycomics, Curr. Opin. Struct. Biol., 13, 637-645, (2003)
[24] Paulson, J C; Blixt, O; Collins, B E, Sweet spots in functional glycomics, Nat. Chem. Biol., 2, 238-248, (2006)
[25] Oyelaran, O; Gildersleeve, J C, Glycan arrays: recent advances and future challenges, Curr. Opin. Chem. Biol., 13, 406-413, (2009)
[26] Kawashima, S; Kanehisa, M, AAindex: amino acid index database, Nucleic Acids Res., 28, 374, (2000)
[27] Kanehisa, M; Goto, S; Hattori, M; Aoki-Kinoshita, K; Itoh, M; Kawashima, S; Katayama, T; Araki, M; Hirakawa, M, From genomics to chemical genomics: new developments in KEGG, Nucleic Acids Res., 34, 354-357, (2006)
[28] Chang, C C; Lin, C J, LIBSVM: A library for support vector machines, http://www.csie.ntu.edu.tw/~cjlin/libsvm
[29] Hisamatsu, K; Tsuda, N; Goda, S; Hatakeyama, T, Characterization of the alpha-helix region in domain 3 of the haemolytic lectin CEL-III: implications for self-oligomerization and haemolytic processes, J. Biochem., 143, 79-86, (2008)
[30] Chandra, N R; Prabu, M M; Suguna, K; Vijayan, M, Structural similarity and functional diversity in proteins containing the legume lectin fold, Protein Engineering, 14, 857-866, (2001)
[31] Hamelryck, T W; Loris, R; Bouckaert, J; Wyns, L, Structural features of the legume lectins, Trends in Glycoscience and Glycotechnology, 10, 349-360, (1998)
[32] Hester, G; Kaku, H; Goldstein, I J; Wright, C S, Structure of mannose-specific snowdrop (Galanthus nivalis) lectin is representative of a new plant lectin family, Nature Structural Biology, 2, 472-479, (1995)
[33] Sharon, N; Lis, H, Lectins, 2nd edition, Springer, Dordrecht, The Netherlands, (2003)
[34] Wright, L M; Van Damme, E J M; Barre, A; et al., Isolation, characterization, molecular cloning and molecular modelling of two lectins of different specificities from bluebell (Scilla campanulata) bulbs, Biochemical Journal, 340, 299-308, (1999)