zbMATH — the first resource for mathematics

SecretP: identifying bacterial secreted proteins by fusing new features into Chou’s pseudo-amino acid composition. (English) Zbl 1410.92040
Summary: Protein secretion plays an important role in bacterial lifestyles. Secreted proteins are crucial for bacterial pathogenesis: they let bacteria interact with their environments, in particular by delivering pathogenic and symbiotic bacteria into their eukaryotic hosts. Identification of bacterial secreted proteins is therefore an important step in the study of various diseases and the corresponding drugs. In this paper, by fusing several new features into Chou’s pseudo-amino acid composition (PseAAC), two support vector machine (SVM)-based ternary classifiers are developed to predict secreted proteins of Gram-negative and Gram-positive bacteria. For the two types of bacteria, accuracies of 94.03% and 94.36% are obtained in distinguishing classically secreted, non-classically secreted and non-secreted proteins. To compare the practical ability of our method in identifying bacterial secreted proteins with that of six published methods, proteins from Escherichia coli and Bacillus subtilis are collected to construct the test sets for Gram-negative and Gram-positive bacteria, respectively; the prediction results of our method are comparable to those of the existing methods. When applied to two public independent data sets for predicting non-classically secreted proteins (NCSPs), the method also yields satisfactory results for Gram-negative bacterial proteins. The prediction server SecretP can be accessed at http://cic.scu.edu.cn/bioinformatics/secretPV2/index.htm.
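As an illustration of the PseAAC descriptor named in the summary, the following is a minimal Python sketch of the classical (type 1) pseudo-amino acid composition: 20 composition terms followed by lambda sequence-order correlation terms. It is not the authors' feature set; the summary does not say which new features were fused in. The function names `standardize` and `pseaac`, the defaults `lam=5` and `weight=0.05`, and the use of the Kyte-Doolittle hydropathy scale as the single physicochemical property are all assumptions made for this example.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values. ASSUMPTION: used here only as a
# stand-in property; the summary does not name the scales used.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(scale):
    """Rescale a property to zero mean and unit deviation over the 20 residues."""
    mu = mean(scale.values())
    sigma = pstdev(scale.values())
    return {aa: (value - mu) / sigma for aa, value in scale.items()}


def pseaac(seq, lam=5, weight=0.05, scales=(KYTE_DOOLITTLE,)):
    """Return a (20 + lam)-dimensional type 1 PseAAC vector for `seq`."""
    seq = seq.upper()
    if any(aa not in AMINO_ACIDS for aa in seq):
        raise ValueError("sequence contains a non-standard residue")
    if lam >= len(seq):
        raise ValueError("lam must be smaller than the sequence length")

    props = [standardize(scale) for scale in scales]

    # Conventional amino acid composition: 20 normalized frequencies.
    freqs = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]

    # k-th tier correlation factor: mean squared property difference
    # between residues that are k positions apart.
    thetas = []
    for k in range(1, lam + 1):
        total = sum(
            mean((p[a] - p[b]) ** 2 for p in props)
            for a, b in zip(seq, seq[k:])
        )
        thetas.append(total / (len(seq) - k))

    # Joint normalization, so that all components sum to 1.
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


if __name__ == "__main__":
    features = pseaac("MKKLLPTAAAGLLLLAAQPAMA", lam=3)
    print(len(features))
```

A vector of this kind could then be passed to any multi-class SVM implementation to separate the three protein classes; the kernel, parameters and additional features used by SecretP are not given in the summary.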
Reviewer: Reviewer (Berlin)

92C40 Biochemistry, molecular biology
62P10 Applications of statistics to biology and medical sciences; meta analysis
92C37 Cell biology
68T05 Learning and adaptive systems in artificial intelligence
[1] Altschul, S.F.; Madden, T.L.; Schaffer, A.A.; Zhang, J.H.; Zhang, Z.; Miller, W.; Lipman, D.J., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic acids res., 25, 3389-3402, (1997)
[2] Bendtsen, J.D.; Jensen, L.J.; Blom, N.; von Heijne, G.; Brunak, S., Feature-based prediction of non-classical and leaderless protein secretion, Protein eng. des. sel., 17, 349-356, (2004)
[3] Bendtsen, J.D.; Kiemer, L.; Fausboll, A.; Brunak, S., Non-classical protein secretion in bacteria, BMC microbiol., 5, 58-70, (2005)
[4] Bendtsen, J.D.; Nielsen, H.; von Heijne, G.; Brunak, S., Improved prediction of signal peptides: SignalP 3.0, J. mol. biol., 340, 783-795, (2004)
[5] Brockmeier, U.; Caspers, M.; Freudl, R.; Jockwer, A.; Noll, T.; Eggert, T., Systematic screening of all signal peptides from Bacillus subtilis: a powerful strategy in optimizing heterologous protein secretion in Gram-positive bacteria, J. mol. biol., 362, 393-402, (2006)
[6] Bull, H.B.; Breese, K., Surface tension of amino acid solutions: a hydrophobicity scale of the amino acid residues, Arch. biochem. biophys., 161, 665-670, (1974)
[7] Buttner, D.; Bonas, U., Common infection strategies of plant and animal pathogenic bacteria, Curr. opinion plant biol., 6, 312-319, (2003)
[8] Cai, Y.D.; Feng, K.Y.; Li, Y.X.; Chou, K.C., Support vector machine for predicting alpha-turn types, Peptides, 24, 629-630, (2003)
[9] Cai, Y.D.; Lin, S.L.; Chou, K.C., Support vector machines for prediction of protein signal sequences and their cleavage sites, Peptides, 24, 159-161, (2003)
[10] Cai, Y.D.; Liu, X.J.; Xu, X.B.; Chou, K.C., Support vector machines for predicting HIV protease cleavage sites in protein, J. comput. chem., 23, 267-274, (2002)
[11] Cai, Y.D.; Liu, X.J.; Xu, X.B.; Chou, K.C., Support vector machines for the classification and prediction of beta-turn types, J. pept. sci., 8, 297-301, (2002)
[12] Cai, Y.D.; Liu, X.J.; Xu, X.B.; Chou, K.C., Prediction of protein structural classes by support vector machines, Comput. chem., 26, 293-296, (2002)
[13] Cai, Y.D.; Liu, X.J.; Xu, X.B.; Chou, K.C., Support vector machines for predicting the specificity of GalNAc-transferase, Peptides, 23, 205-208, (2002)
[14] Cai, Y.D.; Ricardo, P.W.; Jen, C.H.; Chou, K.C., Application of SVM to predict membrane protein types, J. theor. biol., 226, 373-376, (2004)
[15] Cai, Y.D.; Zhou, G.P.; Chou, K.C., Support vector machines for predicting membrane protein types by using functional domain composition, Biophys. J., 84, 3257-3263, (2003)
[16] Cai, Y.D.; Zhou, G.P.; Jen, C.H.; Lin, S.L.; Chou, K.C., Identify catalytic triads of serine hydrolases by support vector machines, J. theor. biol., 228, 551-557, (2004)
[17] Cambronne, E.D.; Roy, C.R., Recognition and delivery of effector proteins into eukaryotic cells by bacterial secretion systems, Traffic, 7, 929-939, (2006)
[18] Chou, K.C., Prediction of protein cellular attributes using pseudo-amino acid composition, Proteins, 43, 246-255, (2001)
[19] Chou, K.C., Pseudo amino acid composition and its applications in bioinformatics, proteomics and system biology, Curr. proteomics, 6, 262-274, (2009)
[20] Chou, K.C.; Cai, Y.D., Using functional domain composition and support vector machines for prediction of protein subcellular location, J. biol. chem., 277, 45765-45769, (2002)
[21] Chou, K.C.; Shen, H.B., Signal-CF: a subsite-coupled and window-fusing approach for predicting signal peptides, Biochem. biophys. res. commun., 357, 633-640, (2007)
[22] Chou, K.C.; Shen, H.B., Recent progress in protein subcellular location prediction, Anal. biochem., 370, 1-16, (2007)
[23] Chou, K.C.; Shen, H.B., A new method for predicting the subcellular localization of eukaryotic proteins with both single and multiple sites: Euk-mPLoc 2.0, Plos one, 5, e9931, (2010)
[24] Chou, K.C.; Zhang, C.T., Prediction of protein structural classes, Crit. rev. biochem. mol. biol., 30, 275-349, (1995)
[25] Desvaux, M.; Hebraud, M., The protein secretion systems in Listeria: inside out bacterial virulence, FEMS microbiol. rev., 30, 774-805, (2006)
[26] Desvaux, M.; Hebraud, M.; Talon, R.; Henderson, I.R., Secretion and subcellular localizations of bacterial proteins: a semantic awareness issue, Trends microbiol., 17, 139-145, (2009)
[27] Desvaux, M.; Khan, A.; Beatson, S.A.; Scott-Tucker, A.; Henderson, I.R., Protein secretion systems in Fusobacterium nucleatum: genomic identification of type 4 piliation and complete type V pathways brings new insight into mechanisms of pathogenesis, Biochim. biophys. acta, 1713, 92-112, (2005)
[28] Ding, Y.S.; Zhang, T.L.; Chou, K.C., Prediction of protein structure classes with pseudo amino acid composition and fuzzy support vector machine network, Protein pept. lett., 14, 811-815, (2007)
[29] Economou, A.; Christie, P.J.; Fernandez, R.C.; Palmer, T.; Plano, G.V.; Pugsley, A.P., Secretion by numbers: protein traffic in prokaryotes, Mol. microbiol., 62, 308-319, (2006)
[30] Eisenhaber, F.; Imperiale, F.; Argos, P.; Frommel, C., Prediction of secondary structural content of proteins from their amino acid composition alone. 1. new analytic vector decomposition methods, Proteins, 25, 157-168, (1996)
[31] Esmaeili, M.; Mohabatkar, H.; Mohsenzadeh, S., Using the concept of Chou’s pseudo amino acid composition for risk type prediction of human papillomaviruses, J. theor. biol., 263, 203-209, (2010) · Zbl 1406.92455
[32] Gardy, J.L.; Brinkman, F.S.L., Methods for predicting bacterial protein subcellular localization, Nat. rev. microbiol., 4, 741-751, (2006)
[33] Gardy, J.L.; Laird, M.R.; Chen, F.; Rey, S.; Walsh, C.J.; Ester, M.; Brinkman, F.S.L., PSORTb v.2.0: expanded prediction of bacterial protein subcellular localization and insights gained from comparative proteome analysis, Bioinformatics, 21, 617-623, (2005)
[34] Gardy, J.L.; Spencer, C.; Wang, K.; Ester, M.; Tusnady, G.E.; Simon, I.; Hua, S.; deFays, K.; Lambert, C.; Nakai, K.; Brinkman, F.S.L., PSORT-B: improving protein subcellular localization prediction for Gram-negative bacteria, Nucleic acids res., 31, 3613-3617, (2003)
[35] Gasteiger, E.; Gattiker, A.; Hoogland, C.; Ivanyi, I.; Appel, R.D.; Bairoch, A., ExPASy: the proteomics server for in-depth protein knowledge and analysis, Nucleic acids res., 31, 3784-3788, (2003)
[36] Gerlach, R.G.; Hensel, M., Protein secretion systems and adhesins: the molecular armory of Gram-negative pathogens, Int. J. med. microbiol., 297, 401-415, (2007)
[38] Grantham, R., Amino acid difference formula to help explain protein evolution, Science, 185, 862-864, (1974)
[39] Guiral, S.; Mitchell, T.J.; Martin, B.; Claverys, J.P., Competence-programmed predation of noncompetent cells in the human pathogen Streptococcus pneumoniae: genetic requirements, Proc. natl. acad. sci. USA, 102, 8710-8715, (2005)
[40] Guo, Y.Z.; Yu, L.Z.; Wen, Z.N.; Li, M.L., Using support vector machine combined with auto covariance to predict protein-protein interactions from protein sequences, Nucleic acids res., 36, 3025-3030, (2008)
[41] He, Z.S.; Zhang, J.; Shi, X.H.; Hu, L.L.; Kong, X.G.; Cai, Y.D.; Chou, K.C., Predicting drug-target interaction networks based on functional groups and biological features, Plos one, 5, e9603, (2010)
[42] Hirose, I.; Sano, K.; Shioda, I.; Kumano, M.; Nakamura, K.; Yamane, K., Proteome analysis of Bacillus subtilis extracellular proteins: a two-dimensional protein electrophoretic study, Microbiology, 146, 65-75, (2000)
[43] Holland, I.B., Translocation of bacterial proteins—an overview, Biochim. biophys. acta, 1694, 5-16, (2004)
[44] Hopp, T.P.; Woods, K.R., Prediction of protein antigenic determinants from amino acid sequences, Proc. natl. acad. sci. USA, 78, 3824-3828, (1981)
[45] Hua, S.; Sun, Z., Support vector machine approach for protein subcellular localization prediction, Bioinformatics, 17, 721-728, (2001)
[46] Huang, L.J.; Chen, S.X.; Huang, Y.; Luo, W.J.; Jiang, H.H.; Hu, Q.H.; Zhang, P.F.; Yi, H., Proteomics-based identification of secreted protein dihydrodiol dehydrogenase as a novel serum markers of non-small cell lung cancer, Lung cancer, 54, 87-94, (2006)
[47] Huang, T.; Shi, X.H.; Wang, P.; He, Z.S.; Feng, K.Y.; Hu, L.L.; Kong, X.G.; Li, Y.X.; Cai, Y.D.; Chou, K.C., Analysis and prediction of the metabolic stability of proteins based on their sequential features, subcellular locations and interaction networks, Plos one, 5, e10972, (2010)
[48] Journet, L.; Hughes, K.T.; Cornelis, G.R., Type III secretion: a secretory pathway serving both motility and virulence (review), Mol. membr. biol., 22, 41-50, (2005)
[49] Kall, L.; Krogh, A.; Sonnhammer, E.L.L., A combined transmembrane topology and signal peptide prediction method, J. mol. biol., 338, 1027-1036, (2004)
[50] Kall, L.; Krogh, A.; Sonnhammer, E.L.L., An HMM posterior decoder for sequence feature prediction that includes homology information, Bioinformatics, 21, 251-257, (2005)
[51] Kall, L.; Krogh, A.; Sonnhammer, E.L.L., Advantages of combined transmembrane topology and signal peptide prediction—the phobius web server, Nucleic acids res., 35, 429-432, (2007)
[52] Kampenusa, I.; Zikmanis, P., Distinctive attributes for predicted secondary structures at terminal sequences of non-classically secreted proteins from proteobacteria, Cent. eur. J. biol., 3, 320-326, (2008)
[53] Konkel, M.E.; Kim, B.J.; Rivera-Amill, V.; Garvis, S.G., Bacterial secreted proteins are required for the internalization of Campylobacter jejuni into cultured mammalian cells, Mol. microbiol., 32, 691-701, (1999)
[54] Kostakioti, M.; Newman, C.L.; Thanassi, D.G.; Stathopoulos, C., Mechanisms of protein export across the bacterial outer membrane, J. bacteriol., 187, 4306-4314, (2005)
[55] Krogh, A.; Larsson, B.; von Heijne, G.; Sonnhammer, E.L.L., Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes, J. mol. biol., 305, 567-580, (2001)
[56] Lin, H., The modified Mahalanobis discriminant for predicting outer membrane proteins by using Chou’s pseudo amino acid composition, J. theor. biol., 252, 350-356, (2008) · Zbl 1398.92076
[57] Lin, W.Z.; Xiao, X.; Chou, K.C., GPCR-GIA: a web-server for identifying G-protein coupled receptors and their families with grey incidence analysis, Protein eng. des. sel., 22, 699-705, (2009)
[58] Lory, S., Secretion of proteins and assembly of bacterial surface organelles: shared pathways of extracellular protein targeting, Curr. opinion microbiol., 1, 27-35, (1998)
[59] Lu, Z.; Szafron, D.; Greiner, R.; Lu, P.; Wishart, D.S.; Poulin, B.; Anvik, J.; Macdonell, C.; Eisner, R., Predicting subcellular localization of proteins using machine-learned classifiers, Bioinformatics, 20, 547-556, (2004)
[60] Mashburn-Warren, L.M.; Whiteley, M., Special delivery: vesicle trafficking in prokaryotes, Mol. microbiol., 61, 839-846, (2006)
[61] Matthews, B.W., Comparison of the predicted and observed secondary structure of T4 phage lysozyme, Biochim. biophys. acta, 405, 442-451, (1975)
[62] Nair, R.; Rost, B., Mimicking cellular sorting improves prediction of subcellular localization, J. mol. biol., 348, 85-100, (2005)
[63] Nakai, K.; Horton, P., PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization, Trends biochem. sci., 24, 34-36, (1999)
[64] Nakai, K.; Kanehisa, M., Expert system for predicting protein localization sites in Gram-negative bacteria, Proteins, 11, 95-110, (1991)
[65] Nielsen, H.; Brunak, S.; von Heijne, G., Machine learning approaches for the prediction of signal peptides and other protein sorting signals, Protein eng., 12, 3-9, (1999)
[66] Nielsen, H.; Engelbrecht, J.; Brunak, S.; von Heijne, G., A neural network method for identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites, Int. J. neural syst., 8, 581-599, (1997)
[67] Peabody, C.R.; Chung, Y.J.; Yen, M.R.; Vidal-Ingigliardi, D.; Pugsley, A.P.; Saier, M.H., Type II protein secretion and its relationship to bacterial type IV pili and archaeal flagella, Microbiology, 149, 3051-3072, (2003)
[68] Prilusky, J.; Felder, C.E.; Zeev-Ben-Mordehai, T.; Rydberg, E.H.; Man, O.; Beckmann, J.S.; Silman, I.; Sussman, J.L., FoldIndex: a simple tool to predict whether a given protein sequence is intrinsically unfolded, Bioinformatics, 21, 3435-3438, (2005)
[69] Rakonjac, J.; Russel, M., Bacterial systems for assembly, secretion and targeted translocation of proteins and protein/DNA complexes, ASBMB aust. biochem., 34, 7-10, (2003)
[70] Shen, H.B.; Chou, K.C., Signal-3L: a 3-layer approach for predicting signal peptides, Biochem. biophys. res. commun., 363, 297-303, (2007)
[71] Smialowski, P.; Martin-Galiano, A.J.; Mikolajka, A.; Girschick, T.; Holak, T.A.; Frishman, D., Protein solubility: sequence based prediction and experimental verification, Bioinformatics, 23, 2536-2542, (2007)
[72] Stephens, C.; Shapiro, L., Bacterial protein secretion—a target for new antibiotics?, Chem. biol., 4, 637-641, (1997)
[73] Tanford, C., Contribution of hydrophobic interactions to the stability of the globular conformation of proteins, J. am. chem. soc., 84, 4240-4247, (1962)
[74] Tjalsma, H.; Bolhuis, A.; Jongbloed, J.D.H.; Bron, S.; van Dijl, J.M., Signal peptide-dependent protein transport in Bacillus subtilis: a genome-based survey of the secretome, Microbiol. mol. biol. rev., 64, 515-547, (2000)
[75] Tseng, T.T.; Tyler, B.M.; Setubal, J.C., Protein secretion systems in bacterial-host associations, and their description in the gene ontology, BMC microbiol., 9, Suppl. 1, S2, (2009)
[76] Vapnik, V., Statistical learning theory, (1998), Wiley New York · Zbl 0935.62007
[77] Wold, S.; Jonsson, J.; Sjostrom, M.; Sandberg, M.; Rannar, S., DNA and peptide sequences and chemical processes multivariately modelled by principal component analysis and partial least-squares projections to latent structures, Anal. chim. acta, 277, 239-253, (1993)
[78] Xiao, X.; Lin, W.Z., Application of protein grey incidence degree measure to predict protein quaternary structural types, Amino acids, 37, 741-749, (2009)
[79] Xiao, X.; Lin, W.Z.; Chou, K.C., Using grey dynamic modeling and pseudo amino acid composition to predict protein structural classes, J. comput. chem., 29, 2018-2024, (2008)
[80] Xiao, X.; Shao, S.H.; Ding, Y.S.; Huang, Z.D.; Chou, K.C., Using cellular automata images and pseudo amino acid composition to predict protein subcellular location, Amino acids, 30, 49-54, (2006)
[81] Xiao, X.; Shao, S.H.; Ding, Y.S.; Huang, Z.D.; Huang, Y.S.; Chou, K.C., Using complexity measure factor to predict protein subcellular location, Amino acids, 28, 57-61, (2005)
[82] Xiao, X.; Shao, S.H.; Huang, Z.D.; Chou, K.C., Using pseudo amino acid composition to predict protein structural classes: approached with complexity measure factor, J. comput. chem., 27, 478-482, (2006)
[83] Xiao, X.; Wang, P.; Chou, K.C., Predicting protein structural classes with pseudo amino acid composition: an approach using geometric moments of cellular automaton image, J. theor. biol., 254, 691-696, (2008) · Zbl 1400.92416
[84] Xiao, X.; Wang, P.; Chou, K.C., Predicting protein quaternary structural attribute by hybridizing functional domain composition and pseudo amino acid composition, J. appl. crystallogr., 42, 169-173, (2009)
[85] Xiao, X.; Wang, P.; Chou, K.C., GPCR-CA: a cellular automaton image approach for predicting G-protein-coupled receptor functional classes, J. comput. chem., 30, 1414-1423, (2009)
[86] Yu, C.S.; Chen, Y.C.; Lu, C.H.; Hwang, J.K., Prediction of protein subcellular localization, Proteins, 64, 643-651, (2006)
[87] Yu, C.S.; Lin, C.J.; Hwang, J.K., Predicting subcellular localization of proteins for Gram-negative bacteria by support vector machines based on n-peptide compositions, Protein sci., 13, 1402-1406, (2004)
[88] Yu, L.Z.; Guo, Y.Z.; Zhang, Z.; Li, Y.Z.; Li, M.L.; Li, G.B.; Xiong, W.J.; Zeng, Y.H., SecretP: a new method for predicting mammalian secreted proteins, Peptides, 31, 574-578, (2010)
[89] Zeng, Y.H.; Guo, Y.Z.; Xiao, R.Q.; Yang, L.; Yu, L.Z.; Li, M.L., Using the augmented Chou’s pseudo amino acid composition for predicting protein submitochondria locations based on auto covariance approach, J. theor. biol., 259, 366-372, (2009) · Zbl 1402.92193
[90] Zhou, G.P.; Doctor, K., Subcellular location prediction of apoptosis proteins, Proteins, 50, 44-48, (2003)
[91] Zhou, X.B.; Chen, C.; Li, Z.C.; Zou, X.Y., Using Chou’s amphiphilic pseudo-amino acid composition and support vector machine for prediction of enzyme subfamily classes, J. theor. biol., 248, 546-551, (2007)
[92] Zimmerman, J.M.; Eliezer, N.; Simha, R., The characterization of amino acid sequences in proteins by statistical methods, J. theor. biol., 21, 170-201, (1968)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.