zbMATH — the first resource for mathematics

A two-stage SVM method to predict membrane protein types by incorporating amino acid classifications and physicochemical properties into a general form of Chou’s PseAAC. (English) Zbl 1412.92242
Summary: Membrane proteins play important roles in many biochemical processes and are also attractive targets of drug discovery for various diseases. The elucidation of membrane protein types provides clues for understanding the structure and function of proteins. Recently we developed a novel system for predicting protein subnuclear localizations. In this paper, we propose a simplified version of our system for predicting membrane protein types directly from primary protein structures, which incorporates amino acid classifications and physicochemical properties into a general form of pseudo-amino acid composition. In this simplified system, we design a two-stage multi-class support vector machine combined with a two-step optimal feature selection process, which proves very effective in our experiments. The performance of the present method is evaluated on two benchmark datasets consisting of five types of membrane proteins. The overall prediction accuracies for the five types are 93.25% and 96.61% via the jackknife test and the independent dataset test, respectively. These results indicate that our method is effective and valuable for predicting membrane protein types. A web server for the proposed method is available at http://www.juemengt.com/jcc/memty_page.ph.
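The jackknife test mentioned in the summary is a leave-one-out evaluation: each protein is predicted by a model built from all the other proteins, and the overall accuracy is the fraction predicted correctly. The following is only a minimal sketch of that evaluation loop. It assumes a plain 20-dimensional amino acid composition as the feature vector and a 1-nearest-neighbour rule as a stand-in classifier; it does not reproduce the paper's pseudo-amino acid composition features, its feature selection, or its two-stage SVM. The function names and the toy sequences are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the 20-D amino acid composition (fractions) of a sequence."""
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def nearest_label(x, train):
    """1-nearest-neighbour stand-in classifier (NOT the paper's SVM)."""
    def dist2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    # Pick the training item closest to x and return its label.
    return min(train, key=lambda item: dist2(x, item[0]))[1]


def jackknife_accuracy(samples):
    """Leave-one-out: predict each sample from all the others."""
    hits = 0
    for i, (x, y) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        hits += nearest_label(x, rest) == y
    return hits / len(samples)


# Toy, made-up sequences and labels, purely for illustration.
toy = [
    ("LLLLAAAALLLL", "type_a"),
    ("LLALLALLALLA", "type_a"),
    ("KKKKEEEEKKKK", "type_b"),
    ("KEKEKEKEKKEE", "type_b"),
]
samples = [(composition(s), label) for s, label in toy]
accuracy = jackknife_accuracy(samples)
```

In a setting closer to the one described, the stand-in classifier would be replaced by a multi-class SVM and the composition vector by the richer feature set, while the leave-one-out loop itself would stay the same.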

MSC:
92D20 Protein sequences, DNA sequences
92-08 Computational methods for problems pertaining to biology
68T05 Learning and adaptive systems in artificial intelligence
References:
[1] Alberts, B.; Bray, D.; Lewis, J.; Raff, M.; Roberts, K.; Watson, J. D., Molecular biology of the cell, (1994), Garland Publishing New York & London
[2] Alejandro, S.; Ernesto, P.; Segovia, L., Protein homology detection and fold inference through multiple alignment entropy profiles, Proteins, 70, 248-256, (2008)
[3] Basu, S.; Pan, A.; Dutta, C.; Das, J., Chaos game representation of proteins, J. Mol. Graph. Model., 15, 279-289, (1997)
[4] Blum, T.; Briesemeister, S.; Kohlbacher, O., Multiloc2: integrating phylogeny and gene ontology terms improves subcellular protein localization prediction, BMC Bioinforma., 10, 274, (2009)
[5] Cai, Y. D.; Ricardo, P. W.; Jen, C. H.; Chou, K. C., Application of SVM to predict membrane protein types, J. Theor. Biol., 226, 373-376, (2004)
[6] Cai, Y. D.; Zhou, G. P.; Chou, K. C., Support vector machines for predicting membrane protein types by using functional domain composition, Biophys. J., 84, 3257-3263, (2003)
[7] Chang, C. C.; Lin, C. J., LIBSVM: a library for support vector machines, (2001), 〈http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf〉
[8] Chen, L.; Zeng, W. M.; Cai, Y. D.; Feng, K. Y.; Chou, K. C., Predicting anatomical therapeutic chemical (ATC) classification of drugs by integrating chemical-chemical interactions and similarities, PLoS ONE, 7, e35254, (2012)
[9] Chen, W.; Feng, P. M.; Lin, H.; Chou, K. C., Irspot-psednc: identify recombination spots with pseudo dinucleotide composition, Nucleic Acids Res., 41, e69, (2013), Open access at http://dx.doi.org/10.1093/nar/gks1450
[10] Chen, Y. K.; Li, K. B., Predicting membrane protein types by incorporating protein topology, domains, signal peptides, and physicochemical properties into the general form of Chou’s pseudo amino acid composition, J. Theor. Biol., 318, 1-12, (2013) · Zbl 1406.92450
[11] Chou, K. C., A novel approach to predicting protein structural classes in a (20-1)-D amino acid composition space, Proteins, 21, 319-344, (1995)
[12] Chou, K. C., Prediction of protein subcellular locations by incorporating quasi-sequence-order effect, Biochem. Biophys. Res. Commun., 278, 477-483, (2000)
[13] Chou, K. C., Prediction of protein subcellular attributes using pseudo-amino acid composition, Proteins: Struct. Funct. Genet., 43, 246-255, (2001)
[14] Chou, K. C., Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes, Bioinformatics, 21, 10-19, (2005)
[15] Chou, K. C., Some remarks on protein attribute prediction and pseudo amino acid composition (50th anniversary year review), J. Theor. Biol., 273, 236-247, (2011) · Zbl 1405.92212
[16] Chou, K. C., Some remarks on predicting multi-label attributes in molecular biosystems, Mol. Biosyst., 9, 1092-1100, (2013)
[17] Chou, K. C.; Cai, Y. D., Using GO-pseaa predictor to identify membrane proteins and their types, Biochem. Biophys. Res. Commun., 327, 845-847, (2005)
[18] Chou, K. C.; Elrod, D. W., Prediction of membrane protein types and subcellular location, Proteins: Struct. Funct. Genet., 34, 137-153, (1999)
[19] Chou, P. Y.; Fasman, G. D., Prediction of protein conformation, Biochemistry, 13, 222-245, (1974)
[20] Chou, K. C.; Shen, H. B., Memtype-2L: a web server for predicting membrane proteins and their types by incorporating evolution information through psepssm, Biochem. Biophys. Res. Commun., 360, 339-345, (2007)
[21] Chou, K. C.; Shen, H. B., Review: recent advances in developing web-servers for predicting protein attributes, Nat. Sci., 2, 63-92, (2009)
[22] Chou, K. C.; Wu, Z. C.; Xiao, X., Iloc-euk: a multi-label classifier for predicting the subcellular localization of singleplex and multiplex eukaryotic proteins, PLoS ONE, 6, e18258, (2011)
[23] Chou, K. C.; Wu, Z. C.; Xiao, X., Iloc-hum: using accumulation-label scale to predict subcellular locations of human proteins with both single and multiple sites, Mol. Biosyst., 8, 629-641, (2012)
[24] Dill, K. A., Theory for the folding and stability of globular proteins, Biochemistry, 24, 1501-1509, (1985)
[25] Dubchak, I.; Muchnik, I.; Holbrook, S. R.; Kim, S. H., Prediction of protein folding class using global description of amino acid sequence, Proc. Natl. Acad. Sci., 92, 8700-8704, (1995)
[26] Feng, Z. P.; Keizer, D. W.; Stevenson, R. A.; Yao, S.; Babon, J. J.; Murphy, V. J.; Anders, R. F.; Norton, R. S., Structure and inter-domain interactions of domain II from the blood-stage malarial protein, apical membrane antigen 1, J. Mol. Biol., 350, 641-656, (2005)
[27] Feng, Z. P.; Zhang, X.; Han, P.; Arora, N.; Anders, R. F.; Norton, R. S., Abundance of intrinsically unstructured proteins in P. falciparum and other apicomplexan parasite proteomes, Mol. Biochem. Parasitol., 150, 256-267, (2006)
[28] Gao, Q. B.; Ye, X. F.; Jin, Z. C.; He, J., Improving discrimination of outer membrane proteins by fusing different forms of pseudo amino acid composition, Anal. Biochem., 398, 52-59, (2010)
[29] Han, G. S.; Yu, Z. G.; Anh, V., Predicting the subcellular location of apoptosis proteins based on recurrence quantification analysis and the Hilbert-Huang transform, Chin. Phys. B, 20, 100504, (2011)
[30] Han, G. S.; Yu, Z. G.; Anh, V.; Krishnajith, A. P.D.; Tian, Y. C., An ensemble method for predicting subnuclear localizations from primary protein structures, PLoS ONE, 8, 2, e57225, (2013)
[31] Hayat, M.; Khan, A., Predicting membrane protein types by fusing composite protein sequence features into pseudo amino acid composition, J. Theor. Biol., 271, 10-17, (2011) · Zbl 1405.92217
[32] Hayat, M.; Khan, A., Memhyb: predicting membrane protein types by hybridizing SAAC and PSSM, J. Theor. Biol., 292, 93-102, (2012) · Zbl 1307.92308
[33] Hayat, M.; Khan, A., Discriminating outer membrane proteins with fuzzy K-nearest neighbor algorithms based on the general form of Chou’s pseaac, Protein Pept. Lett., 19, 411-421, (2012)
[34] Höglund, A.; Dönnes, P.; Blum, T.; Adolph, H. W.; Kohlbacher, O., Multiloc: prediction of protein subcellular localization using N-terminal targeting sequences, sequence motifs and amino acid composition, Bioinformatics, 22, 1158-1165, (2006)
[35] Huang, C.; Yuan, J. Q., A multilabel model based on Chou’s pseudo-amino acid composition for identifying membrane proteins with both single and multiple functional types, J. Membr. Biol., 246, 327-334, (2013)
[36] Huang, N. E.; Shen, Z.; Long, S. R.; Wu, M. C.; Shih, S. H.; Zheng, Q.; Yen, N. C.; Tung, C. C.; Liu, H. H., The empirical mode decomposition and the Hilbert spectrum for nonlinear and nonstationary time series analysis, Proc. R. Soc. A, 454, 903-995, (1998) · Zbl 0945.62093
[37] Huang, T.; Shi, X. H.; Wang, P.; He, Z. S.; Feng, K. Y.; Hu, L. L.; Kong, X. Y.; Li, Y. X.; Cai, Y. D.; Chou, K. C., Analysis and prediction of the metabolic stability of proteins based on their sequential features, subcellular locations and interaction networks, PLoS ONE, 5, e10972, (2010)
[38] Kawashima, S.; Kanehisa, M., Aaindex: amino acid index database, Nucleic Acids Res., 28, 374, (2000)
[39] Lempel, A.; Ziv, J., On the complexity of finite sequences, IEEE Trans. Inf. Theory, 22, 75-81, (1976) · Zbl 0337.94013
[40] Li, Z. R.; Lin, H. H.; Han, L. Y.; Jiang, L.; Chen, X.; Chen, Y. Z., PROFEAT: a web server for computing structural and physicochemical features of proteins and peptides from amino acid sequence, Nucleic Acids Res., 34, W32-W37, (2008)
[41] Lin, H., The modified Mahalanobis discriminant for predicting outer membrane proteins by using Chou’s pseudo amino acid composition, J. Theor. Biol., 252, 350-356, (2008) · Zbl 1398.92076
[42] Lin, S. X.; Lapointe, J., Theoretical and experimental biology in one, J. Biomed. Sci. Eng., 6, 435-442, (2013)
[43] Lin, W. Z.; Fang, J. A.; Xiao, X.; Chou, K. C., Iloc-animal: a multi-label learning classifier for predicting subcellular localization of animal proteins, Mol. Biosyst., 9, 634-644, (2013)
[44] Liu, H.; Wang, M.; Chou, K. C., Low-frequency Fourier spectrum for predicting membrane protein types, Biochem. Biophys. Res. Commun., 336, 737-739, (2005)
[45] Lodish, H.; Baltimore, D.; Berk, A.; Zipursky, S. L.; Matsudaira, P.; Darnell, J., Molecular cell biology, (1995), Scientific American Books New York
[46] Mahdavi, A.; Jahandideh, S., Application of density similarities to predict membrane protein types based on pseudo-amino acid composition, J. Theor. Biol., 276, 132-137, (2011) · Zbl 1405.92218
[47] Murphy, L. R.; Wallqvist, A.; Levy, R. M., Simplified amino acid alphabets for protein fold recognition and implications for folding, Protein Eng., 13, 149-152, (2000)
[48] Nanni, L.; Lumini, A., An ensemble of support vector machines for predicting the membrane protein type directly from the amino acid sequence, Amino Acids, 35, 573-580, (2008)
[49] Peng, H.; Long, F.; Ding, C., Feature selection based on mutual information: criteria of MAX-dependency, MAX-relevance, and MIN-redundancy, IEEE Trans. Pattern Anal. Mach. Intell., 27, 1226-1238, (2005)
[50] Peng, Z. L.; Yang, J. Y.; Chen, X., An improved classification of G-protein-coupled receptors using sequence-derived features, BMC Bioinformatics, 11, 420, (2010)
[51] Platt, J. C.; Cristianini, N.; Shawe-Taylor, J., Large margin DAGs for multiclass classification, Adv. Neural Inf. Process. Syst., 12, 547-553, (2000)
[52] Pu, X.; Guo, J.; Leung, H.; Lin, Y. L., Prediction of membrane protein types from sequences and position-specific scoring matrices, J. Theor. Biol., 247, 259-265, (2007)
[53] Qiu, J. D.; Sun, X. U.; Huang, J. H.; Liang, R. P., Prediction of the types of membrane proteins based on discrete wavelet transform and support vector machines, Protein J., 29, 114-119, (2010)
[54] Rezaei, M. A.; Maleki, P. A.; Karami, Z.; Asadabadi, E. B.; Sherafat, M. A.; Moghaddam, K. A.; Fadaie, M.; Forouzanfar, M., Prediction of membrane protein types by means of wavelet analysis and cascaded neural network, J. Theor. Biol., 255, 817-820, (2008)
[55] Sanders, P. R.; Kats, L. M.; Drew, D. R.; O’Donnell, R. A.; O’Neill, M.; Maier, A. G.; Coppel, R. L.; Crabb, B. S., A set of glycosylphosphatidyl inositol-anchored membrane proteins of Plasmodium falciparum is refractory to genetic deletion, Infect. Immun., 74, 4330-4338, (2006)
[56] Shen, H. B.; Chou, K. C., Using optimized evidence-theoretic K-nearest neighbor classifier and pseudo amino acid composition to predict membrane protein types, Biochem. Biophys. Res. Commun., 334, 288-292, (2005)
[57] Shen, H. B.; Chou, K. C., Using ensemble classifier to identify membrane protein types, Amino Acids, 32, 483-488, (2007)
[58] Shen, H. B.; Yang, J.; Chou, K. C., Fuzzy KNN for predicting membrane protein types from pseudo amino acid composition, J. Theor. Biol., 240, 9-13, (2006)
[59] Shen, J.; Zhang, J.; Luo, X.; Zhu, W.; Yu, K.; Chen, K.; Li, Y.; Jiang, H., Predicting protein-protein interactions based only on sequences information, Proc. Natl. Acad. Sci., 104, 4337-4341, (2007)
[60] Tusnady, G. E.; Dosztanyi, Z.; Simon, I., Transmembrane proteins in the protein databank: identification and classification, Bioinformatics, 20, 2964-2972, (2004)
[61] Vapnik, V. N., The nature of statistical learning theory, (1995), Springer · Zbl 0833.62008
[62] Wang, J. Y.; Li, Y. P.; Wang, Q. Q.; You, X. G.; Man, J. J.; Wang, C.; Gao, X., Proclusensem: predicting membrane protein types by fusing different modes of pseudo amino acid composition, Comput. Biol. Med., 42, 564-574, (2012)
[63] Wang, L.; Yuan, Z.; Chen, X.; Zhou, Z., The prediction of membrane protein types with NPE, IEICE Electron. Express, 7, 397-402, (2010)
[64] Wang, M.; Yang, J.; Liu, G. P.; Xu, Z. J.; Chou, K. C., Weighted-support vector machines for predicting membrane protein types based on pseudo amino acid composition, Protein Eng. Des. Sel., 17, 509-516, (2004)
[65] Wang, M.; Yang, J.; Xu, Z. J.; Chou, K. C., SLLE for predicting membrane protein types, J. Theor. Biol., 232, 7-15, (2005)
[66] Wang, S. Q.; Yang, J.; Chou, K. C., Using stacking generalization to predict membrane protein types based on pseudo amino acid composition, J. Theor. Biol., 242, 941-946, (2006)
[67] Xiao, X.; Min, J. L.; Wang, P.; Chou, K. C., Igpcr-drug: a web server for predicting interaction between GPCRs and drugs in cellular networking, PLoS ONE, 8, e72234, (2013)
[68] Xiao, X.; Wang, P.; Lin, W. Z.; Jia, J. H.; Chou, K. C., Iamp-2L: a two-level multi-label classifier for identifying antimicrobial peptides and their functional types, Anal. Biochem., 436, 168-177, (2013)
[69] Xu, Y.; Ding, J.; Wu, L. Y.; Chou, K. C., Isno-pseaac: predict cysteine S-nitrosylation sites in proteins by incorporating position specific amino acid propensity into pseudo amino acid composition, PLoS ONE, 8, e55844, (2013)
[70] Yang, X. G.; Luo, R. Y.; Feng, Z. P., Using amino acid and peptide composition to predict membrane protein types, Biochem. Biophys. Res. Commun., 353, 164-169, (2007)
[71] Yang, J. Y.; Zhou, Y.; Yu, Z. G.; Anh, V.; Zhou, L. Q., Human Pol II promoter recognition based on primary sequences and free energy of dinucleotides, BMC Bioinformatics, 9, 11, (2008)
[72] Yu, Z. G.; Anh, V.; Lau, K. S., Fractal analysis of measure representation of large proteins based on the detailed HP model, Physica A, 337, 171-184, (2004)
[73] Yu, Z. G.; Anh, V.; Lau, K. S., Chaos game representation of protein sequences based on the detailed HP model and their multifractal and correlation analyses, J. Theor. Biol., 226, 341-348, (2004)
[74] Yu, Z. G.; Anh, V.; Wang, Y.; Mao, D.; Wanliss, J., Modelling and simulation of the horizontal component of the geomagnetic field by fractional stochastic differential equations in conjunction with empirical mode decomposition, J. Geophys. Res., 115, A10219, (2010)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.