Protein function classification via support vector machine approach. (English) Zbl 1021.92008

Summary: The support vector machine (SVM) is introduced as a method for classifying proteins into functionally distinguished classes. Studies are conducted on a number of protein classes, including RNA-binding proteins, protein homodimers, proteins responsible for drug absorption, proteins involved in drug distribution and excretion, and drug-metabolizing enzymes. The testing accuracy for these protein classes is found to be in the range of \(84\)–\(96\%\). This suggests that SVM is useful for classifying proteins into functional classes and has potential application in protein function prediction.
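
For orientation, the two quantities the summary relies on can be written in their standard textbook form. This is a sketch under assumptions, not the authors' stated formulation: the summary names neither the kernel \(K\) nor the exact accuracy measure, so the symbols below (feature vector \(x\) for a protein, training vectors \(x_i\) with labels \(y_i \in \{+1,-1\}\) for members and non-members of a class, multipliers \(\alpha_i\), offset \(b\)) are illustrative. A trained binary SVM assigns a protein to a class by the sign of
\[
f(x) = \operatorname{sign}\Bigl(\sum_{i} \alpha_i\, y_i\, K(x_i, x) + b\Bigr),
\]
and, assuming the testing accuracy is the overall fraction of correctly classified test proteins,
\[
Q = \frac{TP + TN}{TP + TN + FP + FN},
\]
where \(TP\), \(TN\), \(FP\) and \(FN\) count true positives, true negatives, false positives and false negatives on the test set.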


92C40 Biochemistry, molecular biology
68T05 Learning and adaptive systems in artificial intelligence
68T10 Pattern recognition, speech recognition


[1] Downward, J., The ins and outs of signalling, Nature, 411, 759 (2001)
[2] Lengeler, J. W., Metabolic networks: a signal-oriented approach to cellular models, Biol. Chem., 381, 911 (2000)
[3] Siomi, H.; Dreyfuss, G., RNA-binding proteins as regulators of gene expression, Curr. Opin. Genet. Dev., 7, 345 (1997)
[4] Draper, D. E., Themes in RNA-protein recognition, J. Mol. Biol., 293, 255 (1999)
[5] Koonin, E. V.; Tatusov, R. L.; Galperin, M. Y., Beyond complete genomes: from sequence to structure and function, Curr. Opin. Struc. Biol., 8, 355 (1998)
[6] Fetrow, J. S.; Skolnick, J., Method for prediction of protein function from sequence using the sequence-to-structure-to-function paradigm with application to glutaredoxins/thioredoxins and T-1 ribonucleases, J. Mol. Biol., 281, 949 (1998)
[7] Vapnik, V. N., The Nature of Statistical Learning Theory (1999), Springer: New York · Zbl 0934.62009
[8] Burges, C. J.C., A tutorial on support vector machines for pattern recognition, Data Min. Knowl. Disc., 2, 121 (1998)
[9] Bock, J. R.; Gough, D. A., Predicting protein-protein interactions from primary structure, Bioinformatics, 17, 455 (2001)
[10] Ding, C. H.Q.; Dubchak, I., Multi-class protein fold recognition using support vector machines and neural networks, Bioinformatics, 17, 349 (2001)
[11] Cai, Y. D.; Liu, X. J.; Xu, X. B.; Chou, K. C., Support vector machines for the classification and prediction of β-turn types, J. Peptide Sci., 8, 297 (2002)
[12] Yuan, Z.; Burrage, K.; Mattick, J. S., Prediction of protein solvent accessibility using support vector machines, Proteins, 48, 566 (2002)
[13] Hua, S. J.; Sun, Z. R., A novel method of protein secondary structure prediction with high segment overlap measure: support vector machine approach, J. Mol. Biol., 308, 397 (2001)
[14] Cai, Y. D.; Liu, X. J.; Xu, X. B.; Chou, K. C., Prediction of protein structural classes by support vector machines, Comput. Chem., 26, 293 (2002)
[15] de Vel, O.; Anderson, A.; Corney, M.; Mohay, G., Mining e-mail content for author identification forensics, SIGMOD Record, 30, 55 (2001)
[16] Kim, K. I.; Jung, K.; Park, S. H.; Kim, H. J., Support vector machine-based text detection in digital video, Pattern Recogn., 34, 527 (2001)
[17] Drucker, H.; Wu, D. H.; Vapnik, V. N., Support vector machine for spam categorization, IEEE Trans. Neural Networks, 10, 1048 (1999)
[18] Tong, S.; Koller, D., Support vector machine active learning with applications to text classification, J. Mach. Learn. Res., 2, 45 (2001) · Zbl 1009.68131
[19] Li, Z. Y.; Tang, S. W.; Yan, S. C., Multi-class SVM classifier based on pairwise coupling, Lect. Notes Comput. Sci., 2388, 321 (2002) · Zbl 1064.68606
[20] Thubthong, N.; Kijsirikul, B., Support vector machines for Thai phoneme recognition, Int. J. Uncertain. Fuzz., 9, 803 (2001) · Zbl 1113.68474
[21] Gordan, M.; Kotropoulos, C.; Pitas, I., A temporal network of support vector machine classifiers for the recognition of visual speech, Lect. Notes Artif. Intell., 2308, 355 (2002) · Zbl 1065.68602
[22] Ben-Yacoub, S.; Abdeljaoued, Y.; Mayoraz, E., Fusion of face and speech data for person identity verification, IEEE Trans. Neural Networks, 10, 1065 (1999)
[23] Wu, C. Y.; Liu, C.; Zhou, J., Eyeglasses verification by support vector machine, Lect. Notes Comput. Sci., 2195, 1126 (2001) · Zbl 1031.68943
[24] Wang, Y. J.; Chua, C. S.; Ho, Y. K., Facial feature detection and face recognition from 2D and 3D images, Pattern Recogn. Lett., 23, 1191 (2002) · Zbl 1016.68101
[25] Hsieh, J. W.; Huang, L. W.; Huang, Y. S., Multiple-person tracking system for content analysis, Lect. Notes Comput. Sci., 2195, 897 (2001) · Zbl 1031.68745
[26] Papageorgiou, C.; Poggio, T., A trainable system for object detection, Int. J. Comput. Vis., 38, 15 (2000) · Zbl 1012.68680
[27] Karlsen, R. E.; Gorsich, D. J.; Gerhart, G. R., Target classification via support vector machines, Opt. Eng., 39, 704 (2000)
[28] Zhao, Q.; Principe, J. C.; Brennan, V. L.; Xu, D. X.; Wang, Z., Synthetic aperture radar automatic target recognition with three strategies of learning and representation, Opt. Eng., 39, 1230 (2000)
[29] Gavrishchaka, V. V.; Ganguli, S. B., Support vector machine as an efficient tool for high-dimensional data processing: application to substorm forecasting, J. Geophys. Res., 106, 29911 (2001)
[30] Liong, S. Y.; Sivapragasam, C., Flood stage forecasting with support vector machines, J. Am. Water Resour. As., 38, 173 (2002)
[31] Guyon, I.; Weston, J.; Barnhill, S.; Vapnik, V., Gene selection for cancer classification using support vector machines, Mach. Learn., 46, 389 (2002) · Zbl 0998.68111
[32] Fritsche, H. A., Tumor markers and pattern recognition analysis: a new diagnostic tool for cancer, J. Clin. Ligand Assay, 25, 11 (2002)
[33] Bao, L.; Sun, Z. R., Identifying genes related to drug anticancer mechanisms using support vector machine, FEBS Lett., 521, 109 (2002)
[34] Ramaswamy, S.; Tamayo, P.; Rifkin, R., Multiclass cancer diagnosis using tumor gene expression signatures, Proc. Natl. Acad. Sci. USA, 98, 15149 (2001)
[35] Chan, K.; Lee, T. W.; Sample, P. A.; Goldbaum, M. H.; Weinreb, R. N.; Sejnowski, T. J., Comparison of machine learning and traditional classifiers in glaucoma diagnosis, IEEE Trans. Biomed. Eng., 49, 963 (2002)
[36] Furey, T. S.; Cristianini, N.; Duffy, N.; Bednarski, D. W.; Schummer, M.; Haussler, D., Support vector machine classification and validation of cancer tissue samples using microarray expression data, Bioinformatics, 16, 906 (2000)
[37] Pavlidis, P.; Weston, J.; Cai, J. S.; Noble, W. S., Learning gene functional classifications from multiple data types, J. Comput. Biol., 9, 401 (2002)
[38] Brown, M. P.S.; Grundy, W. N.; Lin, D.; Cristianini, N.; Sugnet, C. W.; Furey, T. S.; Ares, M.; Haussler, D., Knowledge-based analysis of microarray gene expression data by using support vector machines, Proc. Natl. Acad. Sci. USA, 97, 262 (2000)
[39] Burbidge, R.; Trotter, M.; Buxton, B.; Holden, S., Drug design by machine learning: support vector machines for pharmaceutical data analysis, Comput. Chem., 26, 5 (2001)
[40] Cai, Y. D.; Liu, X. J.; Xu, X. B.; Chou, K. C., Support vector machines for predicting HIV protease cleavage sites in protein, J. Comput. Chem., 23, 267 (2002)
[41] Cristianini, N.; Shawe-Taylor, J., An Introduction to Support Vector Machines (2000), Cambridge University Press: Cambridge
[42] Baldi, P.; Brunak, S.; Chauvin, Y.; Anderson, C. A.F.; Nielsen, H., Assessing the accuracy of prediction algorithms for classification: an overview, Bioinformatics, 16, 412 (2000)
[43] Roulston, J. E., Screening with tumor markers, Mol. Biotechnol., 20, 153 (2002)
[44] Cai, C. Z.; Wang, W. L.; Chen, Y. Z., Support vector machine classification of physical and biological datasets, Int. J. Mod. Phys. C, 14 (5) (2003), in press
[45] Jones, S.; Thornton, J. M., Principles of protein-protein interactions, Proc. Natl. Acad. Sci. USA, 93, 13 (1996)
[46] Sun, L. Z.; Ji, Z. L.; Chen, X.; Wang, J. F.; Chen, Y. Z., Absorption, distribution, metabolism, and excretion-associated protein database, Clin. Pharmacol. Ther., 71, 405 (2002), Available: <http://xin.cz3.nus.edu.sg/group/admeap/admeap.asp>