zbMATH — the first resource for mathematics

Predicting Golgi-resident protein types using pseudo amino acid compositions: approaches with positional specific physicochemical properties. (English) Zbl 1343.92154
Summary: Knowing the type of a Golgi-resident protein is an important step in understanding its molecular functions as well as its role in biological processes. In this paper, we developed a novel computational method to predict Golgi-resident protein types using positional specific physicochemical properties and feature selection based on analysis of variance. Our method achieved 86.9% prediction accuracy in leave-one-out cross-validation with only 59 features. Our method has the potential to be applied to predicting a wide range of protein attributes.
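The two generic techniques named in the summary, analysis-of-variance feature ranking and leave-one-out cross-validation, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the toy data, the class labels "A" and "B", and the nearest-centroid classifier are all assumptions made for illustration; the summary does not say which classifier was used or whether features were selected inside each fold, as is done here.

```python
from statistics import mean

def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across the class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    # Mean square between classes and mean square within classes.
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values()) / (k - 1)
    within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g) / (n - k)
    return between / within if within > 0 else float("inf")

def top_features(X, y, k):
    """Indices of the k features with the largest F statistic."""
    scores = [anova_f([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

def nearest_centroid(X, y, x):
    """Stand-in classifier (an assumption): label of the closest class centroid."""
    best_label, best_dist = None, float("inf")
    for label in set(y):
        rows = [r for r, t in zip(X, y) if t == label]
        centroid = [mean(col) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loo_accuracy(X, y, k):
    """Leave-one-out accuracy; features are re-selected on each training fold."""
    hits = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        keep = top_features(X_tr, y_tr, k)
        project = lambda row: [row[j] for j in keep]
        pred = nearest_centroid([project(r) for r in X_tr], y_tr, project(X[i]))
        hits += pred == y[i]
    return hits / len(X)

# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[1.0, 5.0], [1.2, 3.0], [0.9, 4.0], [3.0, 4.5], [3.1, 3.2], [2.9, 5.1]]
y = ["A", "A", "A", "B", "B", "B"]
print(top_features(X, y, 1), loo_accuracy(X, y, 1))
```

Re-selecting features on each training fold keeps the held-out sample from influencing the ranking. This is a design choice of the sketch, not a detail taken from the paper.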

MSC:
92C40 Biochemistry, molecular biology
92E10 Molecular structure (graph-theoretic methods, methods of differential topology, etc.)