Prediction and functional analysis of prokaryote lysine acetylation site by incorporating six types of features into Chou’s general PseAAC. (English) Zbl 1406.92172

Summary: Lysine acetylation is one of the most important types of protein post-translational modification (PTM) and is widely involved in cellular regulatory processes. To fully understand the regulatory mechanism of acetylation, identifying acetylation sites is the first and most important step. However, experimental identification of protein acetylation sites is often time-consuming and expensive, so predicting PTM sites by computational methods has become popular in recent years. Here, we developed a novel method, ProAcePred 2.0, to predict species-specific prokaryote lysine acetylation sites. We employed an efficient position-specific analysis strategy, the information gain method, to construct the position-specific window of acetylation peptides, then incorporated different types of features and adopted the elastic net algorithm to optimize the feature vectors for model learning. On the training datasets, the prediction model achieved area under the receiver operating characteristic curve (AUC) values of 0.78, 0.752, 0.783, 0.718, 0.839 and 0.826 for Escherichia coli, Corynebacterium glutamicum, Mycobacterium tuberculosis, Bacillus subtilis, S. typhimurium and Geobacillus kaustophilus, respectively. On independent test datasets, our method was highly competitive with other methods for the majority of species. In addition, functional analyses demonstrated that different organisms were preferentially involved in different biological processes and pathways. The detailed analyses in this paper could help us understand more of the acetylation mechanism and provide guidance for related experimental validation. A user-friendly online web service for ProAcePred 2.0 is freely available at http://computbiol.ncu.edu.cn/PAPred.
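
The information gain step mentioned in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy peptides and the labels below are all hypothetical, and the summary does not say how ProAcePred 2.0 turns the resulting scores into a window. The sketch only shows the textbook calculation, in which each position of a fixed-width peptide is scored by how much knowing its residue reduces the Shannon entropy of the acetylated/non-acetylated label.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain_per_position(peptides, labels):
    """Information gain of each window position with respect to the label.

    peptides: equal-length strings centred on a candidate lysine (assumed layout).
    labels:   one class label per peptide (e.g. 1 = acetylated, 0 = not).
    """
    base = entropy(labels)
    width = len(peptides[0])
    gains = []
    for pos in range(width):
        # Group the labels by the residue observed at this position.
        groups = {}
        for peptide, label in zip(peptides, labels):
            groups.setdefault(peptide[pos], []).append(label)
        # Conditional entropy H(label | residue at pos).
        conditional = sum(
            len(group) / len(labels) * entropy(group) for group in groups.values()
        )
        gains.append(base - conditional)
    return gains


# Hypothetical toy data: three-residue windows around a central lysine (K).
toy_peptides = ["AKG", "AKG", "GKA", "GKA"]
toy_labels = [1, 1, 0, 0]
toy_gains = information_gain_per_position(toy_peptides, toy_labels)
```

In this toy set the flanking positions separate the two classes perfectly, while the central lysine is identical in every peptide and so carries no information. Positions with high gain would be the natural candidates to keep in a position-specific window.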


92C40 Biochemistry, molecular biology
92D20 Protein sequences, DNA sequences

