Classifying G-protein coupled receptors with bagging classification tree. (English) Zbl 1089.92009

Summary: G-protein coupled receptors (GPCRs) play a key role in different biological processes, such as regulation of growth, death and metabolism of cells. They are major therapeutic targets of numerous prescribed drugs. However, the ligand specificity of many receptors is unknown and there is little structural information available. Bioinformatics may offer one approach to bridge the gap between sequence data and functional knowledge of a receptor.
We use a bagging classification tree algorithm to predict the type of a receptor from its amino acid composition. Predictions are made for GPCRs at the sub-family and sub-sub-family levels. In a cross-validation test, we achieved an overall predictive accuracy of 91.1% for GPCR sub-family classification and 82.4% for sub-sub-family classification. These results demonstrate the applicability of this relatively simple method and its potential for improving prediction accuracy.
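The approach described in the summary (amino acid composition as features, bootstrap-aggregated classification trees as the predictor) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the tree depth, the number of trees, and the toy sequences are all assumptions made for illustration, and the trees here split on Gini impurity for brevity rather than the gain-ratio criterion of C4.5 listed under Software. Class labels are assumed to be strings.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def composition(seq):
    """Return the 20-dimensional amino acid composition (fractions) of a sequence."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return [counts[a] / total for a in AMINO_ACIDS]

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, depth=0, max_depth=5):
    """Grow a small binary classification tree (illustrative, not C4.5)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority  # leaf: a class label
    best = None
    for j in range(len(X[0])):
        # Candidate thresholds: every observed value except the largest,
        # so that both sides of the split are non-empty.
        for t in sorted(set(row[j] for row in X))[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:
        return majority  # no feature can separate the samples
    _, j, t, left, right = best
    return (j, t,
            build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth))

def predict_tree(tree, x):
    """Follow splits until a leaf (a non-tuple class label) is reached."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree

def bagging_fit(X, y, n_trees=25, seed=0):
    """Fit one tree per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx]))
    return trees

def bagging_predict(trees, x):
    """Aggregate the individual tree predictions by majority vote."""
    votes = Counter(predict_tree(t, x) for t in trees)
    return votes.most_common(1)[0][0]

# Toy data, invented for illustration only (not real GPCR sequences).
train_seqs = ["AAAAAC", "AAAACA", "AAAAAD", "WWWWWC", "WWWWCW", "WWWWWD"]
train_labels = ["a", "a", "a", "b", "b", "b"]
model = bagging_fit([composition(s) for s in train_seqs], train_labels)
```

A new sequence would then be classified with `bagging_predict(model, composition(seq))`. Estimating accuracy as in the summary would additionally require a cross-validation loop over real, labelled GPCR sequences, which this sketch omits.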

MSC:

92C40 Biochemistry, molecular biology
92-08 Computational methods for problems pertaining to biology

Software:

C4.5
