Predicting protein sub-Golgi locations by combining functional domain enrichment scores with pseudo-amino acid compositions. (English) Zbl 1414.92206

Summary: The Golgi apparatus is an important subcellular organelle that participates in the secretory pathway. The role of the Golgi apparatus in cellular processes is related to Golgi-resident proteins. Knowing the sub-Golgi locations of Golgi-resident proteins helps in understanding their molecular functions. In this work, we propose a computational method to predict the sub-Golgi locations of Golgi-resident proteins. We consider three sub-Golgi locations: the cis-Golgi network (CGN), the Golgi stack and the trans-Golgi network (TGN). By combining pseudo-amino acid compositions (Type-II PseAAC) with the functional domain enrichment score (FunDES), our method not only achieves better performance than existing methods but is also capable of recognizing proteins located in the Golgi stack, a location that has never been considered in other state-of-the-art works.
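The Type-II (amphiphilic) PseAAC named in the summary extends the 20-dimensional amino acid composition with 2λ sequence-order correlation factors computed from standardized hydrophobicity and hydrophilicity values, following Chou's formulation [11]. The Python sketch below illustrates that construction under stated assumptions. The property scales (Kyte-Doolittle hydropathy and Hopp-Woods hydrophilicity) are swapped in here; Chou's original used Tanford hydrophobicity. The values λ = 5 and w = 0.05 are illustrative rather than the settings reported by the authors, and the names `type2_pseaac` and `standardize` are hypothetical. The FunDES component is not reproduced, since this summary does not give its definition.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Illustrative property scales swapped in for this sketch (not necessarily the
# authors' choice); values as commonly tabulated, verify against AAindex before use.
KYTE_DOOLITTLE = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
                  "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
                  "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
                  "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}
HOPP_WOODS = {"A": -0.5, "R": 3.0, "N": 0.2, "D": 3.0, "C": -1.0,
              "Q": 0.2, "E": 3.0, "G": 0.0, "H": -0.5, "I": -1.8,
              "L": -1.8, "K": 3.0, "M": -1.3, "F": -2.5, "P": 0.0,
              "S": 0.3, "T": -0.4, "W": -3.4, "Y": -2.3, "V": -1.5}


def standardize(scale):
    """Rescale a property to zero mean and unit (population) std over the 20 residues."""
    values = np.array([scale[a] for a in AMINO_ACIDS], dtype=float)
    z = (values - values.mean()) / values.std()
    return dict(zip(AMINO_ACIDS, z))


def type2_pseaac(seq, lam=5, w=0.05,
                 hydrophobicity=KYTE_DOOLITTLE, hydrophilicity=HOPP_WOODS):
    """Return a (20 + 2*lam)-dimensional amphiphilic (Type-II) PseAAC vector."""
    seq = seq.upper()
    if any(a not in AMINO_ACIDS for a in seq):
        raise ValueError("sequence contains non-standard residues")
    length = len(seq)
    if length <= lam:
        raise ValueError("sequence must be longer than lam")
    h1 = standardize(hydrophobicity)
    h2 = standardize(hydrophilicity)

    # Normalized occurrence frequencies f_u of the 20 amino acids (sum to 1).
    freqs = np.array([seq.count(a) for a in AMINO_ACIDS], dtype=float) / length

    # Correlation factors for each gap j: tau_{2j-1} from hydrophobicity,
    # tau_{2j} from hydrophilicity, averaged over the L - j residue pairs.
    taus = []
    for j in range(1, lam + 1):
        n_pairs = length - j
        taus.append(sum(h1[seq[i]] * h1[seq[i + j]] for i in range(n_pairs)) / n_pairs)
        taus.append(sum(h2[seq[i]] * h2[seq[i + j]] for i in range(n_pairs)) / n_pairs)
    taus = np.array(taus)

    # Composition and weighted factors share one normalizing denominator.
    denom = freqs.sum() + w * taus.sum()
    return np.concatenate([freqs, w * taus]) / denom
```

For example, `type2_pseaac("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")` returns a 30-dimensional vector whose entries sum to 1, because the 20 composition terms and the 2λ weighted factors are divided by the same denominator. Setting `w=0` reduces the vector to plain amino acid composition padded with zeros.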


92D20 Protein sequences, DNA sequences
92C37 Cell biology


[1] Ahmad, J.; Javed, F.; Hayat, M., Intelligent computational model for classification of sub-Golgi protein using oversampling and fisher feature selection methods, Artif. Intell. Med., 78, 14-22, (2017)
[2] Ahmad, K.; Waris, M.; Hayat, M., Prediction of protein submitochondrial locations by incorporating dipeptide composition into Chou’s general pseudo amino acid composition, J. Membr. Biol., 249, 293-304, (2016)
[3] Chang, C.-C.; Lin, C.-J., LIBSVM: a library for support vector machines, ACM Trans. Intell. Syst. Technol., 2, Article 27, 1-27, (2011)
[4] Chen, W.; Feng, P.-M.; Lin, H.; Chou, K.-C., iRSpot-PseDNC: identify recombination spots with pseudo dinucleotide composition, Nucleic Acids Res., 41, e68, (2013)
[5] Chen, W.; Lei, T.-Y.; Jin, D.-C.; Lin, H.; Chou, K.-C., PseKNC: a flexible web server for generating pseudo K-tuple nucleotide composition, Anal. Biochem., 456, 53-60, (2014)
[6] Chen, W.; Lin, H.; Chou, K.-C., Pseudo nucleotide composition or PseKNC: an effective formulation for analyzing genomic sequences, Mol. Biosyst., 11, 2620-2634, (2015)
[7] Chen, W.; Zhang, X.; Brooker, J.; Lin, H.; Zhang, L.; Chou, K.-C., PseKNC-general: a cross-platform package for generating various modes of pseudo nucleotide compositions, Bioinforma. Oxf. Engl., 31, 119-120, (2015)
[8] Chen, Z.; Zhao, P.; Li, F.; Leier, A.; Marquez-Lago, T. T.; Wang, Y.; Webb, G. I.; Smith, A. I.; Daly, R. J.; Chou, K.-C.; Song, J., iFeature: a Python package and web server for features extraction and selection from protein and peptide sequences, Bioinforma. Oxf. Engl., 34, 2499-2502, (2018)
[9] Cheng, X.; Xiao, X.; Chou, K.-C., pLoc-mHum: predict subcellular localization of multi-location human proteins via general PseAAC to winnow out the crucial GO information, Bioinforma. Oxf. Engl., 34, 1448-1456, (2018)
[10] Chou, K. C., Prediction of protein cellular attributes using pseudo-amino acid composition, Proteins, 43, 246-255, (2001)
[11] Chou, K.-C., Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes, Bioinforma. Oxf. Engl., 21, 10-19, (2005)
[12] Chou, K.-C., Some remarks on protein attribute prediction and pseudo amino acid composition, J. Theor. Biol., 273, 236-247, (2011) · Zbl 1405.92212
[13] Chou, K.-C., Some remarks on predicting multi-label attributes in molecular biosystems, Mol. Biosyst., 9, 1092-1100, (2013)
[14] Chou, K.-C., Impacts of bioinformatics to medicinal chemistry, Med. Chem. Shariqah United Arab Emir., 11, 218-234, (2015)
[15] Chou, K.-C.; Cai, Y.-D., Using functional domain composition and support vector machines for prediction of protein subcellular location, J. Biol. Chem., 277, 45765-45769, (2002)
[16] Chou, W.-C.; Yin, Y.; Xu, Y., GolgiP: prediction of Golgi-resident proteins in plants, Bioinforma. Oxf. Engl., 26, 2464-2465, (2010)
[17] Ding, H.; Guo, S.-H.; Deng, E.-Z.; Yuan, L.-F.; Guo, F.-B.; Huang, J.; Rao, N.; Chen, W.; Lin, H., Prediction of Golgi-resident protein types by using feature selection technique, Chemom. Intell. Lab. Syst., 124, 9-13, (2013)
[18] Ding, H.; Liu, L.; Guo, F.-B.; Huang, J.; Lin, H., Identify Golgi protein types with modified Mahalanobis discriminant algorithm and pseudo amino acid composition, Protein Pept. Lett., 18, 58-63, (2011)
[19] Du, P.; Cao, S.; Li, Y., SubChlo: predicting protein subchloroplast locations with pseudo-amino acid composition and the evidence-theoretic K-nearest neighbor (ET-KNN) algorithm, J. Theor. Biol., 261, 330-335, (2009) · Zbl 1403.92063
[20] Du, P.; Gu, S.; Jiao, Y., PseAAC-General: fast building various modes of general form of Chou’s pseudo-amino acid composition for large-scale protein datasets, Int. J. Mol. Sci., 15, 3495-3506, (2014)
[21] Du, P.; Li, T.; Wang, X.; Xu, C., SubChlo-GO: predicting protein subchloroplast locations with weighted gene ontology scores, Curr. Bioinforma., 8, 193-199, (2013)
[22] Du, P.; Tian, Y.; Yan, Y., Subcellular localization prediction for human internal and organelle membrane proteins with projected gene ontology scores, J. Theor. Biol., 313, 61-67, (2012)
[23] Du, P.; Wang, X.; Xu, C.; Gao, Y., PseAAC-Builder: a cross-platform stand-alone program for generating various special Chou’s pseudo-amino acid compositions, Anal. Biochem., 425, 117-119, (2012)
[24] Du, P.-F., A brief review on software implementations and algorithm enhancements of Chou’s pseudo-amino acid compositions, Curr. Proteomics, 13, 105-112, (2016)
[25] Du, P.-F., Predicting protein submitochondrial locations: the 10th anniversary, Curr. Genomics, 18, 316-321, (2017)
[26] Du, P.-F.; Zhao, W.; Miao, Y.-Y.; Wei, L.-Y.; Wang, L., UltraPse: a universal and extensible software platform for representing biological sequences, Int. J. Mol. Sci., 18, (2017)
[27] Fu, L.; Niu, B.; Zhu, Z.; Wu, S.; Li, W., CD-HIT: accelerated for clustering the next-generation sequencing data, Bioinforma. Oxf. Engl., 28, 3150-3152, (2012)
[28] Guo, S.-H.; Deng, E.-Z.; Xu, L.-Q.; Ding, H.; Lin, H.; Chen, W.; Chou, K.-C., iNuc-PseKNC: a sequence-based predictor for predicting nucleosome positioning in genomes with pseudo k-tuple nucleotide composition, Bioinforma. Oxf. Engl., 30, 1522-1529, (2014)
[29] Han, G. S.; Yu, Z. G.; Anh, V.; Krishnajith, A. P.D.; Tian, Y.-C., An ensemble method for predicting subnuclear localizations from primary protein structures, PLOS One, 8, e57225, (2013)
[30] Jia, J.; Liu, Z.; Xiao, X.; Liu, B.; Chou, K.-C., iCar-PseCp: identify carbonylation sites in proteins by Monte Carlo sampling and incorporating sequence coupled effects into general PseAAC, Oncotarget, 7, 34558-34570, (2016)
[31] Jiao, Y.; Du, P., Performance measures in evaluating machine learning based bioinformatics predictors for classifications, Quant. Biol., 4, 320-330, (2016)
[32] Jiao, Y.-S.; Du, P.-F., Predicting Golgi-resident protein types using pseudo amino acid compositions: approaches with positional specific physicochemical properties, J. Theor. Biol., 391, 35-42, (2016) · Zbl 1343.92154
[33] Jiao, Y.-S.; Du, P.-F., Prediction of Golgi-resident protein types using general form of Chou’s pseudo-amino acid compositions: approaches with minimal redundancy maximal relevance feature selection, J. Theor. Biol., 402, 38-44, (2016) · Zbl 1343.92378
[34] Jiao, Y.-S.; Du, P.-F., Predicting protein submitochondrial locations by incorporating the positional-specific physicochemical properties into Chou’s general pseudo-amino acid compositions, J. Theor. Biol., 416, 81-87, (2017)
[35] Ju, Z.; Cao, J.-Z.; Gu, H., Predicting lysine phosphoglycerylation with fuzzy SVM by incorporating k-spaced amino acid pairs into Chou’s general PseAAC, J. Theor. Biol., 397, 145-150, (2016)
[36] Kawashima, S.; Kanehisa, M., AAindex: amino acid index database, Nucleic Acids Res., 28, 374, (2000)
[37] Liu, B.; Liu, F.; Fang, L.; Wang, X.; Chou, K.-C., repDNA: a Python package to generate various modes of feature vectors for DNA sequences by incorporating user-defined physicochemical properties and sequence-order effects, Bioinforma. Oxf. Engl., 31, 1307-1309, (2015)
[38] Liu, B.; Liu, F.; Wang, X.; Chen, J.; Fang, L.; Chou, K.-C., Pse-in-One: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences, Nucleic Acids Res., 43, W65-W71, (2015)
[39] Liu, B.; Liu, F.; Fang, L.; Wang, X.; Chou, K.-C., repRNA: a web server for generating various feature vectors of RNA sequences, Mol. Genet. Genomics MGG, 291, 473-481, (2016)
[40] Marchler-Bauer, A.; Derbyshire, M. K.; Gonzales, N. R.; Lu, S.; Chitsaz, F.; Geer, L. Y.; Geer, R. C.; He, J.; Gwadz, M.; Hurwitz, D. I.; Lanczycki, C. J.; Lu, F.; Marchler, G. H.; Song, J. S.; Thanki, N.; Wang, Z.; Yamashita, R. A.; Zhang, D.; Zheng, C.; Bryant, S. H., CDD: NCBI’s conserved domain database, Nucleic Acids Res., 43, D222-D226, (2015)
[41] Qiu, W.-R.; Sun, B.-Q.; Xiao, X.; Xu, Z.-C.; Chou, K.-C., iPTM-mLys: identifying multiple lysine PTM sites and their different types, Bioinforma. Oxf. Engl., 32, 3116-3123, (2016)
[42] Qiu, W.-R.; Xiao, X.; Lin, W.-Z.; Chou, K.-C., iMethyl-PseAAC: identification of protein methylation sites via a pseudo amino acid composition approach, BioMed Res. Int., 2014, Article 947416, (2014)
[43] Rahman, M. Saifur; Rahman, M. K.; Kaykobad, M.; Rahman, M. Sohel, isGPT: an optimized model to identify sub-Golgi protein types using SVM and random forest based feature selection, Artif. Intell. Med., 84, 90-100, (2018)
[44] The UniProt Consortium, UniProt: the universal protein knowledgebase, Nucleic Acids Res., 45, D158-D169, (2017)
[45] van Dijk, A. D.J.; Bosch, D.; ter Braak, C. J.F.; van der Krol, A. R.; van Ham, R. C.H. J., Predicting sub-Golgi localization of type II membrane proteins, Bioinforma. Oxf. Engl., 24, 1779-1786, (2008)
[46] Wang, S.; Liu, S., Protein sub-nuclear localization based on effective fusion representations and dimension reduction algorithm LDA, Int. J. Mol. Sci., 16, 30343-30361, (2015)
[47] Wang, S.; Yue, Y., Protein subnuclear localization based on a new effective representation and intelligent kernel linear discriminant analysis by dichotomous greedy genetic algorithm, PLOS One, 13, Article e0195636, (2018)
[48] Wang, X.; Zhang, W.; Zhang, Q.; Li, G.-Z., MultiP-SChlo: multi-label protein subchloroplast localization prediction with Chou’s pseudo amino acid composition and a novel multi-label classifier, Bioinforma. Oxf. Engl., 31, 2639-2645, (2015)
[49] Yang, R.; Zhang, C.; Gao, R.; Zhang, L., A novel feature extraction method with feature selection to identify Golgi-resident protein types from imbalanced data, Int. J. Mol. Sci., 17, 218, (2016)
[50] Zhao, W.; Wang, L.; Zhang, T.-X.; Zhao, Z.-N.; Du, P.-F., A brief review on software tools in generating Chou’s pseudo-factor representations for all types of biological sequences, Protein Pept. Lett., 25, 822-829, (2018)