
zbMATH — the first resource for mathematics

Predicting mycobacterial proteins subcellular locations by incorporating pseudo-average chemical shift into the general form of Chou’s pseudo amino acid composition. (English) Zbl 1397.92186
Summary: Mycobacterium tuberculosis (MTB) is a pathogenic bacterial species in the genus Mycobacterium and the causative agent of most cases of tuberculosis [H. M. Berman et al., “The protein data bank”, Nucleic Acids Res. 28, 235–242 (2000)]. Knowing where a mycobacterial protein is located may help to reveal its normal function, so automated prediction of mycobacterial protein subcellular localization is an important tool for genome annotation and drug discovery. In this work, a benchmark data set of 638 non-redundant mycobacterial proteins is constructed. An approach for predicting mycobacterial subcellular localization is then proposed that combines amino acid composition, dipeptide composition, reduced physicochemical properties, evolutionary information and the pseudo-average chemical shift. Using the increment of diversity combined with a support vector machine, the overall prediction accuracy is 87.77% for mycobacterial subcellular localizations and 85.03% for the three membrane protein types in integral membranes. The pseudo-average chemical shift performs particularly well as a feature. To further check the performance of our method, we also predicted the data set constructed by Rashid et al. and obtained an accuracy of 98.12%. This indicates that our approach performs better than other existing methods in the literature.
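The summary names the ingredients of the method but not their formulas. The Python sketch below illustrates three of them under stated assumptions and is not the authors' implementation. It covers amino acid and dipeptide composition, a sequence-order correlation term in the style of Chou's type-1 pseudo amino acid composition [19] for a single per-residue property (the slot where a chemical-shift scale would go), and the increment of diversity in the usual form D = N ln N − Σ n_i ln n_i, ID(X, Y) = D(X+Y) − D(X) − D(Y) (cf. [49]). All function names, the property table `prop` and the weight `w` are hypothetical. The paper's exact pseudo-average chemical shift definition, shift values, reduced physicochemical groups and evolutionary (PSSM) features are not given here and are not reproduced. Treating per-class ID values as SVM inputs is assumed here as one plausible reading of "increment of diversity combined with support vector machine".

```python
import math
from itertools import product

# Hypothetical feature helpers; names and defaults are illustrative, not the paper's.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, fixed order
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs
DIPEPTIDE_INDEX = {d: i for i, d in enumerate(DIPEPTIDES)}


def aa_counts(seq):
    """Occurrence counts of the 20 standard residues; other letters are ignored."""
    return [seq.count(a) for a in AMINO_ACIDS]


def aa_composition(seq):
    """20-D amino acid composition (normalized residue frequencies)."""
    counts = aa_counts(seq)
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * 20


def dipeptide_composition(seq):
    """400-D dipeptide composition (frequencies of adjacent residue pairs)."""
    counts = [0] * len(DIPEPTIDES)
    for i in range(len(seq) - 1):
        j = DIPEPTIDE_INDEX.get(seq[i:i + 2])
        if j is not None:  # skip pairs containing non-standard letters
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(DIPEPTIDES)


def correlation_factors(seq, prop, lam):
    """theta_k = mean of (H(R_i) - H(R_{i+k}))^2 along the sequence, k = 1..lam.

    `prop` maps residue -> property value H (assumed already standardized);
    a hypothetical per-residue chemical-shift scale would be plugged in here.
    """
    values = [prop[r] for r in seq if r in prop]
    thetas = []
    for k in range(1, lam + 1):
        n_pairs = len(values) - k
        if n_pairs <= 0:
            raise ValueError("sequence too short for lag %d" % k)
        thetas.append(sum((values[i] - values[i + k]) ** 2
                          for i in range(n_pairs)) / n_pairs)
    return thetas


def pseudo_aac(seq, prop, lam, w=0.05):
    """(20 + lam)-D vector in the style of Chou's type-1 pseudo amino acid composition."""
    freqs = aa_composition(seq)
    thetas = correlation_factors(seq, prop, lam)
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


def diversity(counts):
    """Diversity measure D = N ln N - sum_i n_i ln n_i (empty categories add 0)."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return n * math.log(n) - sum(c * math.log(c) for c in counts if c > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); it is 0 when X and Y are proportional."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


def id_features(seq, class_counts):
    """One ID value per class: query residue counts vs. that class's pooled counts."""
    query = aa_counts(seq)
    return {label: increment_of_diversity(query, counts)
            for label, counts in class_counts.items()}
```

Assigning a query to the class with the smallest ID gives a simple ID-only classifier. In the hybrid setting assumed above, the ID values (possibly together with the other feature vectors) would instead be passed to an SVM, for example one trained with LIBSVM [14].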

MSC:
92C40 Biochemistry, molecular biology
92C37 Cell biology
68T05 Learning and adaptive systems in artificial intelligence
62P10 Applications of statistics to biology and medical sciences; meta analysis
References:
[1] Andrade, M.A.; O’Donoghue, S.I.; Rost, B., Adaption of protein surface to subcellular location, J. mol. biol., 517-525, (1998)
[2] Berman, H.M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T.N.; Weissig, H.; Shindyalov, I.N.; Bourne, P.E., The protein data bank, Nucleic acids res., 28, 235-242, (2000)
[3] Bi, J.; Yang, H.; Yan, H.; Song, R.; Fan, J., Knowledge-based virtual screening of HLA-A*0201-restricted CD8+ T-cell epitope peptides from herpes simplex virus genome, J. theor. biol., 281, 133-139, (2011) · Zbl 1397.92181
[4] Cai, Y.D.; Zhou, G.P.; Chou, K.C., Support vector machines for predicting membrane protein types by using functional domain composition, Biophys. J., 84, 3257-3263, (2003)
[5] Cai, Y.D.; Lin, S.L.; Chou, K.C., Support vector machines for prediction of protein signal sequences and their cleavage sites, Peptides, 24, 159-161, (2003)
[6] Cai, Y.D.; Liu, X.J.; Xu, X.B.; Chou, K.C., Support vector machines for the classification and prediction of beta-turn types, J. pept. sci., 8, 297-301, (2002)
[7] Cai, Y.D.; Liu, X.J.; Xu, X.B.; Chou, K.C., Support vector machines for predicting HIV protease cleavage sites in protein, J. comput. chem., 23, 267-274, (2002)
[8] Cai, Y.D.; Liu, X.J.; Xu, X.B.; Chou, K.C., Support vector machines for predicting the specificity of galnac-transferase, Peptides, 23, 205-208, (2002)
[9] Cai, Y.D.; Liu, X.J.; Xu, X.B.; Chou, K.C., Prediction of protein structural classes by support vector machines, Comput. chem., 26, 293-296, (2002)
[10] Cai, Y.D.; Feng, K.Y.; Li, Y.X.; Chou, K.C., Support vector machine for predicting alpha-turn types, Peptides, 24, 629-630, (2003)
[11] Cai, Y.D.; Ricardo, P.W.; Jen, C.H.; Chou, K.C., Application of SVM to predict membrane protein types, J. theor. biol., 226, 373-376, (2004)
[12] Cai, Y.D.; Zhou, G.P.; Jen, C.H.; Lin, S.L.; Chou, K.C., Identify catalytic triads of serine hydrolases by support vector machines, J. theor. biol., 228, 551-557, (2004)
[13] Cegielski, J.P.; Chin, D.P.; Espinal, M.A.; Frieden, T.R.; Rodriquez Cruz, R.; Talbot, E.A.; Weil, D.E.; Zaleskis, R.; Raviglione, M.C., The global tuberculosis situation. progress and problems in the 20th century, prospects for the 21st century, Infect. dis. clin. north am., 16, 1-58, (2002)
[14] Chang, C.C.; Lin, C.J., LIBSVM: a library for support vector machines, ACM trans. intell. syst. technol., 2, 27:1-27:27, (2011)
[15] Chen, C.; Chen, L.; Zou, X.; Cai, P., Prediction of protein secondary structure content by using the concept of Chou’s pseudo amino acid composition and support vector machine, Protein pept. lett., 16, 27-31, (2009)
[16] Chen, Y.L.; Li, Q.Z., Prediction of apoptosis protein subcellular location using improved hybrid approach and pseudo-amino acid composition, J. theor. biol., 248, 377-381, (2007)
[17] Chothia, C.; Lesk, A.M., The relation between the divergence of sequence and structure in proteins, Embo j., 5, 823-826, (1986)
[18] Chou, K.C., Using pair-coupled amino acid composition to predict protein secondary structure content, J. protein chem., 18, 473-480, (1999)
[19] Chou, K.C., Prediction of protein cellular attributes using pseudo-amino acid composition, Proteins, 43, 246-255, (2001)
[20] Chou, K.C., Some remarks on protein attribute prediction and pseudo amino acid composition, J. theor. biol., 273, 236-247, (2011) · Zbl 1405.92212
[21] Chou, K.C.; Zhang, C.T., Prediction of protein structural classes, Crit. rev. biochem. mol. biol., 30, 275-349, (1995)
[22] Chou, K.C.; Elrod, D.W., Protein subcellular location prediction, Protein eng., 12, 107-118, (1999)
[23] Chou, K.C.; Cai, Y.D., Using functional domain composition and support vector machines for prediction of protein subcellular location, J. biol. chem., 277, 45765-45769, (2002)
[24] Chou, K.C.; Shen, H.B., Recent progress in protein subcellular location prediction, Anal. biochem., 370, 1-16, (2007)
[25] Chou, K.C.; Shen, H.B., Memtype-2L: a web server for predicting membrane proteins and their types by incorporating evolution information through pse-PSSM, Biochem. biophys. res. commun., 360, 339-345, (2007)
[26] Chou, K.C.; Shen, H.B., Cell-ploc: a package of web servers for predicting subcellular localization of proteins in various organisms, Nat. protoc., 3, 153-162, (2008)
[27] Chou, K.C.; Shen, H.B., Review: recent advances in developing web-servers for predicting protein attributes, Nat. sci., 2, 63-92, (2009), openly accessible at http://www.scirp.org/journal/NS/
[28] Chou, K.C.; Shen, H.B., A new method for predicting the subcellular localization of eukaryotic proteins with both single and multiple sites: euk-mploc 2.0, Plos one, 5, e9931, (2010)
[29] Chou, K.C.; Shen, H.B., Cell-ploc 2.0: an improved package of web servers for predicting subcellular localization of proteins in various organisms, Nat. sci., 2, 1090-1103, (2010)
[30] Chou, K.C.; Wu, Z.C.; Xiao, X., Iloc-euk: a multi-label classifier for predicting the subcellular localization of singleplex and multiplex eukaryotic proteins, Plos one, 6, e18258, (2011)
[31] Chou, K.C.; Wu, Z.C.; Xiao, X., iLoc-Hum: using accumulation-label scale to predict subcellular locations of human proteins with both single and multiple sites, Mol. biosyst., (2011), DOI 10.1039/C1MB05420a
[32] Dickerson, R.E.; Timkovich, R.; Almassy, R.J., The cytochrome fold and the evolution of bacterial energy metabolism, J. mol. biol., 100, 473-491, (1976)
[33] Ding, H.; Luo, L.; Lin, H., Prediction of cell wall lytic enzymes using Chou’s amphiphilic pseudo amino acid composition, Protein pept. lett., 16, 351-355, (2009)
[34] Esmaeili, M.; Mohabatkar, H.; Mohsenzadeh, S., Using the concept of Chou’s pseudo amino acid composition for risk type prediction of human papillomaviruses, J. theor. biol., 263, 203-209, (2010) · Zbl 1406.92455
[35] Fan, G.L.; Li, Q.Z., Predicting protein submitochondria locations by combining different descriptors into the general form of Chou’s pseudo amino acid composition, Amino acids, (2011), DOI 10.1007/s00726-011-1143-4
[36] Feng, Z.P., An overview on predicting the subcellular location of a protein, In silico biol., 2, 291-303, (2002)
[37] Frieden, T.R.; Sterling, T.R.; Munsiff, S.S.; Watt, C.J.; Dye, C., Tuberculosis, Lancet, 362, 887-899, (2003)
[38] Gao, Q.B.; Wang, Z.Z.; Yan, C.; Du, Y.H., Prediction of protein subcellular location using a combined feature of sequence, FEBS lett., 579, 3444-3448, (2005)
[39] Georgiou, D.N.; Karakasidis, T.E.; Nieto, J.J.; Torres, A., Use of fuzzy clustering technique and matrices to classify amino acids and its impact to Chou’s pseudo amino acid composition, J. theor. biol., 257, 17-26, (2009) · Zbl 1400.92393
[40] Gu, Q.; Ding, Y.S.; Zhang, T.L., Prediction of G-protein-coupled receptor classes in low homology using Chou’s pseudo amino acid composition with approximate entropy and hydrophobicity patterns, Protein pept. lett., 17, 559-567, (2010)
[41] Hayat, M.; Khan, A., Predicting membrane protein types by fusing composite protein sequence features into pseudo amino acid composition, J. theor. biol., 271, 10-17, (2011) · Zbl 1405.92217
[42] Idicula-Thomas, S.; Kulkarni, A.J.; Kulkarni, B.D.; Jayaraman, V.K.; Balaji, P.V., A support vector machine-based method for predicting the propensity of a protein to be soluble or to form inclusion body on overexpression in Escherichia coli, Bioinformatics, 22, 278-284, (2006)
[43] Jiang, X.; Wei, R.; Zhang, T.L.; Gu, Q., Using the concept of Chou’s pseudo amino acid composition to predict apoptosis proteins subcellular location: an approach by approximate entropy, Protein pept. lett., 15, 392-396, (2008)
[44] Jones, D.T., Protein secondary structure prediction based on position-specific scoring matrices, J. mol. biol., 292, 195-202, (1999)
[45] Kandaswamy, K.K.; Chou, K.C.; Martinetz, T.; Moller, S.; Suganthan, P.N.; Sridharan, S.; Pugalenthi, G., AFP-pred: a random forest approach for predicting antifreeze proteins from sequence-derived properties, J. theor. biol., 270, 56-62, (2011)
[46] Kaur, H.; Raghava, G.P., Prediction of alpha-turns in proteins using PSI-BLAST profiles and secondary structure information, Proteins, 55, 83-90, (2004)
[47] Li, F.M.; Li, Q.Z., Using pseudo amino acid composition to predict protein subnuclear location with improved hybrid approach, Amino acids, 34, 119-125, (2008)
[48] Li, F.M.; Li, Q.Z., Predicting protein subcellular location using Chou’s pseudo amino acid composition and improved hybrid approach, Protein pept. lett., 15, 612-616, (2008)
[49] Li, Q.Z.; Lu, Z.Q., The prediction of the structural class of protein: application of the measure of diversity, J. theor. biol., 213, 493-502, (2001)
[50] Li, W.; Jaroszewski, L.; Godzik, A., Clustering of highly homologous sequences to reduce the size of large protein databases, Bioinformatics, 17, 282-283, (2001)
[51] Lin, H., The modified Mahalanobis discriminant for predicting outer membrane proteins by using Chou’s pseudo amino acid composition, J. theor. biol., 252, 350-356, (2008) · Zbl 1398.92076
[52] Lin, H.; Ding, H., Predicting ion channels and their types by the dipeptide mode of pseudo amino acid composition, J. theor. biol., 269, 64-69, (2011) · Zbl 1307.92080
[53] Lin, H.; Ding, H.; Guo, F.B.; Huang, J., Prediction of subcellular location of mycobacterial protein using feature selection techniques, Mol. diversity, 14, 667-671, (2010)
[54] Lin, H.; Ding, H.; Guo, F.B.; Zhang, A.Y.; Huang, J., Predicting subcellular localization of mycobacterial proteins by using Chou’s pseudo amino acid composition, Protein pept. lett., 15, 739-744, (2008)
[55] Liu, T.; Zheng, X.; Wang, C.; Wang, J., Prediction of subcellular location of apoptosis proteins using pseudo amino acid composition: an approach from auto covariance transformation, Protein pept. lett., 17, 1263-1269, (2010)
[56] Luginbuhl, P.; Szyperski, T.; Wuthrich, K., Statistical basis for the use of 13Cα chemical shifts in protein structure determination, J. magn. reson. B, 109, 229-233, (1995)
[57] Matthews, B.W., Comparison of the predicted and observed secondary structure of T4 phage lysozyme, Biochim. biophys. acta, 405, 442-451, (1975)
[58] Mielke, S.P.; Krishnan, V.V., Protein structural class identification directly from NMR spectra using averaged chemical shifts, Bioinformatics, 19, 2054-2064, (2003)
[59] Mohabatkar, H., Prediction of cyclin proteins using Chou’s pseudo amino acid composition, Protein pept. lett., 17, 1207-1214, (2010)
[60] Mohabatkar, H.; Mohammad Beigi, M.; Esmaeili, A., Prediction of GABA(A) receptor proteins using the concept of Chou’s pseudo-amino acid composition and support vector machine, J. theor. biol., 281, 18-23, (2011) · Zbl 1397.92215
[61] Nakai, K., Protein sorting signals and prediction of subcellular localization, Adv. protein chem., 54, 277-344, (2000)
[62] Nakashima, H.; Nishikawa, K., Discrimination of intracellular and extracellular proteins using amino acid composition and residue-pair frequencies, J. mol. biol., 238, 54-61, (1994)
[63] Pastore, A.; Lesk, A.M., Comparison of the structures of globins and phycocyanins: evidence for evolutionary relationship, Proteins, 8, 133-155, (1990)
[64] Pollastri, G.; McLysaght, A., Porter: a new, accurate server for protein secondary structure prediction, Bioinformatics, 21, 1719-1720, (2005)
[65] Pollastri, G.; Martin, A.J.; Mooney, C.; Vullo, A., Accurate prediction of protein secondary structure and solvent accessibility by consensus combiners of sequence and structure information, BMC bioinf., 8, 201, (2007)
[66] Qiu, J.D.; Huang, J.H.; Shi, S.P.; Liang, R.P., Using the concept of Chou’s pseudo amino acid composition to predict enzyme family classes: an approach with support vector machine based on discrete wavelet transform, Protein pept. lett., 17, 715-722, (2010)
[67] Rashid, M.; Saha, S.; Raghava, G.P., Support vector machine-based method for predicting subcellular localization of mycobacterial proteins using evolutionary information and motifs, BMC bioinf., 8, 337, (2007)
[68] Schaffer, A.A.; Aravind, L.; Madden, T.L.; Shavirin, S.; Spouge, J.L.; Wolf, Y.I.; Koonin, E.V.; Altschul, S.F., Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements, Nucleic acids res., 29, 2994-3005, (2001)
[69] Scharfe, C.; Zaccaria, P.; Hoertnagel, K.; Jaksch, M.; Klopstock, T.; Dembowski, M.; Lill, R.; Prokisch, H.; Gerbitz, K.D.; Neupert, W.; Mewes, H.W.; Meitinger, T., MITOP, the mitochondrial proteome database: 2000 update, Nucleic acids res., 28, 155-158, (2000)
[70] Sibley, A.B.; Cosman, M.; Krishnan, V.V., An empirical correlation between secondary structure content and averaged chemical shifts in proteins, Biophys. j., 84, 1223-1227, (2003)
[71] Singh, V.; Somvanshi, P., Toward the virtual screening of potential drugs in the homology modeled NAD+ dependent DNA ligase from mycobacterium tuberculosis, Protein pept. lett., 17, 269-276, (2010)
[72] Spera, S.; Bax, A., Empirical correlation between protein backbone conformation and Cα and Cβ 13C nuclear magnetic resonance chemical shifts, J. am. chem. soc., 113, 5490-5492, (1991)
[73] Vapnik, V., Statistical learning theory, (1998), Wiley-Interscience, New York · Zbl 0935.62007
[74] Wishart, D.S.; Sykes, B.D.; Richards, F.M., Relationship between nuclear magnetic resonance chemical shift and protein secondary structure, J. mol. biol., 222, 311-333, (1991)
[75] Wu, C.H.; Apweiler, R.; Bairoch, A.; Natale, D.A.; Barker, W.C.; Boeckmann, B.; Ferro, S.; Gasteiger, E.; Huang, H.; Lopez, R.; Magrane, M.; Martin, M.J.; Mazumder, R.; O’Donovan, C.; Redaschi, N.; Suzek, B., The universal protein resource (uniprot): an expanding universe of protein information, Nucleic acids res., 34, D187-D191, (2006)
[76] Wu, Z.C.; Xiao, X.; Chou, K.C., Iloc-plant: a multi-label classifier for predicting the subcellular localization of plant proteins with both single and multiple sites, Mol. biosyst., 7, 3287-3297, (2011)
[77] Wu, Z.C.; Xiao, X.; Chou, K.C., Iloc-gpos: a multi-layer classifier for predicting the subcellular localization of singleplex and multiplex Gram-positive bacterial proteins, Protein pept. lett., 19, 4-14, (2012)
[78] Xiao, X.; Wu, Z.C.; Chou, K.C., A multi-label classifier for predicting the subcellular localization of Gram-negative bacterial proteins with both single and multiple sites, Plos one, 6, e20592, (2011)
[79] Xiao, X.; Wu, Z.C.; Chou, K.C., Iloc-virus: a multi-label learning classifier for identifying the subcellular localization of virus proteins with both single and multiple sites, J. theor. biol., 284, 42-51, (2011) · Zbl 1397.92238
[80] Yu, L.; Guo, Y.; Li, Y.; Li, G.; Li, M.; Luo, J.; Xiong, W.; Qin, W., Secretp: identifying bacterial secreted proteins by fusing new features into Chou’s pseudo-amino acid composition, J. theor. biol., 267, 1-6, (2010) · Zbl 1410.92040
[81] Zakeri, P.; Moshiri, B.; Sadeghi, M., Prediction of protein submitochondria locations based on data fusion of various features of sequences, J. theor. biol., 269, 208-216, (2011) · Zbl 1307.92094
[82] Zeng, Y.H.; Guo, Y.Z.; Xiao, R.Q.; Yang, L.; Yu, L.Z.; Li, M.L., Using the augmented Chou’s pseudo amino acid composition for predicting protein submitochondria locations based on auto covariance approach, J. theor. biol., 259, 366-372, (2009) · Zbl 1402.92193
[83] Zhang, G.Y.; Fang, B.S., Predicting the cofactors of oxidoreductases based on amino acid composition distribution and Chou’s amphiphilic pseudo-amino acid composition, J. theor. biol., 253, 310-315, (2008)
[84] Zhang, G.Y.; Li, H.C.; Gao, J.Q.; Fang, B.S., Predicting lipase types by improved Chou’s pseudo-amino acid composition, Protein pept. lett., 15, 1132-1137, (2008)
[85] Zhao, Y.; Alipanahi, B.; Li, S.C.; Li, M., Protein secondary structure prediction using NMR chemical shift data, J. bioinf. comput. biol., 8, 867-884, (2010)
[86] Zhou, G.P.; Assa-Munt, N., Some insights into protein structural class prediction, Proteins, 44, 57-59, (2001)
[87] Zhou, G.P.; Doctor, K., Subcellular location prediction of apoptosis proteins, Proteins, 50, 44-48, (2003)
[88] Zhou, X.B.; Chen, C.; Li, Z.C.; Zou, X.Y., Using Chou’s amphiphilic pseudo-amino acid composition and support vector machine for prediction of enzyme subfamily classes, J. theor. biol., 248, 546-551, (2007)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.