Predicting apoptosis protein subcellular localization by integrating auto-cross correlation and PSSM into Chou’s PseAAC. (English) Zbl 1406.92230

Summary: Predicting the subcellular localization of apoptosis proteins remains a challenging task, and existing methods are mainly based on protein primary sequences. In this study, we propose a novel model called MACC-PSSM that integrates Moran autocorrelation and cross correlation with the position-specific scoring matrix (PSSM). A 3600-dimensional feature vector is first constructed to predict apoptosis protein subcellular localization. Then 210 features are selected using principal component analysis (PCA) on the ZW225 dataset, and a support vector machine is adopted as the classifier. To evaluate the performance of the proposed method, jackknife cross-validation tests are performed on two widely used benchmark datasets, ZW225 and CL317. Our model achieves competitive prediction accuracies; in particular, the overall prediction accuracies on ZW225 and CL317 reach 84.9% and 90.5%, respectively. Comparison of our results with other methods demonstrates that the MACC-PSSM model is a potential candidate for the accurate prediction of apoptosis protein subcellular localization.
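
The feature construction described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the normalization, and the maximum lag of 9 are assumptions made here for illustration. The lag is chosen only because 20 PSSM columns give 20 autocorrelation terms and 380 ordered cross-correlation pairs per lag, so 400 x 9 = 3600 matches the stated dimension. The PSSM itself is assumed to be available as a list of rows with 20 scores each; PCA, the support vector machine, and the jackknife test are not shown.

```python
def column_means(pssm):
    """Mean score of every PSSM column."""
    n_rows, n_cols = len(pssm), len(pssm[0])
    return [sum(row[j] for row in pssm) / n_rows for j in range(n_cols)]


def lagged_covariance(pssm, means, j1, j2, lag):
    """Average product of centered scores in column j1 (row i)
    and column j2 (row i + lag)."""
    n_rows = len(pssm)
    total = sum(
        (pssm[i][j1] - means[j1]) * (pssm[i + lag][j2] - means[j2])
        for i in range(n_rows - lag)
    )
    return total / (n_rows - lag)


def macc_features(pssm, max_lag=9):
    """Hypothetical MACC-style descriptor: Moran autocorrelation per column
    followed by cross correlation per ordered column pair, for lags
    1..max_lag. max_lag=9 is an assumption, not taken from the paper."""
    n_rows, n_cols = len(pssm), len(pssm[0])
    if n_rows <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    means = column_means(pssm)
    variances = [
        sum((row[j] - means[j]) ** 2 for row in pssm) / n_rows
        for j in range(n_cols)
    ]
    features = []
    # Moran autocorrelation: lagged covariance of a column with itself,
    # divided by that column's variance (0.0 for a constant column).
    for lag in range(1, max_lag + 1):
        for j in range(n_cols):
            cov = lagged_covariance(pssm, means, j, j, lag)
            features.append(cov / variances[j] if variances[j] > 0 else 0.0)
    # Cross correlation: lagged covariance between two different columns
    # (left unnormalized here, which is also an assumption).
    for lag in range(1, max_lag + 1):
        for j1 in range(n_cols):
            for j2 in range(n_cols):
                if j1 != j2:
                    features.append(
                        lagged_covariance(pssm, means, j1, j2, lag)
                    )
    return features
```

Under these assumptions, a protein with at least 10 residues and a 20-column PSSM yields a vector of length 3600, which could then be passed to a dimensionality reduction step such as PCA before classification.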

MSC:

92C40 Biochemistry, molecular biology
92C37 Cell biology
92D20 Protein sequences, DNA sequences
68T05 Learning and adaptive systems in artificial intelligence
62P10 Applications of statistics to biology and medical sciences; meta analysis
