zbMATH — the first resource for mathematics

Fu-SulfPred: identification of protein S-sulfenylation sites by fusing forests via Chou’s general PseAAC. (English) Zbl 1406.92221
Summary: Protein S-sulfenylation is an essential post-translational modification (PTM) that provides critical information for understanding the molecular mechanisms of cell signal transduction, stress response and the regulation of cellular functions. Recent advances in computational methods have contributed to the detection of protein S-sulfenylation sites. However, when such methods are applied, their performance in identifying protein S-sulfenylation sites can be degraded by class imbalance in the training datasets. In this study, we designed the Fu-SulfPred model, which uses a stratified structure of three kinds of decision trees to identify possible protein S-sulfenylation sites by reconstructing the training datasets and applying a sample rescaling technique. Experimental results showed that the Matthews correlation coefficient values of the Fu-SulfPred model were 0.5437, 0.3736 and 0.6809 on three independent test datasets, respectively, all of which exceeded the corresponding values of the S-SulfPred model. The Fu-SulfPred model provides a promising scheme for the identification of protein S-sulfenylation sites and other post-translational modifications.
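
The summary reports results as Matthews correlation coefficients, a measure that stays informative when one class greatly outnumbers the other. The following is a minimal sketch of how that coefficient is computed from confusion-matrix counts. The function name `matthews_corrcoef` and the example counts are illustrative assumptions, not taken from the paper.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when a row or column of the matrix is empty.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts for an imbalanced test set (NOT results from the paper):
# 50 true modification sites among 1010 candidate cysteines.
tp, tn, fp, fn = 40, 900, 60, 10
accuracy = (tp + tn) / (tp + tn + fp + fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)
print(f"accuracy={accuracy:.3f}  MCC={mcc:.3f}")
```

With counts like these, accuracy is dominated by the many true negatives, while the coefficient also penalises the false positives. This is presumably why a correlation coefficient is the quantity compared between predictors here.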

MSC:
92C40 Biochemistry, molecular biology
92D20 Protein sequences, DNA sequences
92-08 Computational methods for problems pertaining to biology
68T05 Learning and adaptive systems in artificial intelligence
92-04 Software, source code, etc. for problems pertaining to biology
References:
[1] Arif, M.; Hayat, M.; Jan, Z., Imem-2lsaac: a two-level model for discrimination of membrane proteins and their types by extending the notion of saac into chou’s pseudo amino acid composition, J. Theor. Biol., 442, 11-21, (2018) · Zbl 1397.92180
[2] Breiman, L., Random forests, Mach. Learn., 45, 1, 5-32, (2001)
[3] Breiman, L. I.; Friedman, J. H.; Olshen, R. A.; Stone, C. J., Classification and regression trees (cart), Encycl. Ecol., 40, 3, 582-588, (1984)
[4] Bui, V. M.; Lu, C. T.; Ho, T. T.; Lee, T. Y., Mdd:csoh: exploiting maximal dependence decomposition to identify s-sulfenylation sites with substrate motifs, Bioinformatics, 32, 2, 165-172, (2016)
[5] Bui, V. M.; Weng, S. L.; Lu, C. T.; Chang, T. H.; Weng, T. Y.; Lee, T. Y., Sohsite: incorporating evolutionary information and physicochemical properties to identify protein s-sulfenylation sites, BMC Genomics, 17, 1, 59-70, (2016)
[6] Chen, W.; Feng, P.; Ding, H.; Lin, H.; Chou, K. C., Irna-methyl: identifying n(6)-methyladenosine sites using pseudo nucleotide composition, Anal. Biochem., 490, 26-33, (2015)
[7] Chen, W.; Feng, P.; Yang, H.; Ding, H.; Lin, H.; Chou, K. C., Irna-3typea: identifying three types of modification at RNAs adenosine sites, Mol. Ther. Nucleic Acids, 11, 468-474, (2018)
[8] Chen, W.; Feng, P. M.; Lin, H.; Chou, K. C., Irspot-psednc: identify recombination spots with pseudo dinucleotide composition, Nucleic Acids Res., 41, 6, e68, (2013)
[9] Chen, W.; Tang, H.; Ye, J.; Lin, H.; Chou, K. C., Irna-pseu: identifying rna pseudouridine sites, Mol. Ther. Nucleic Acids, 5, 7, e332, (2016)
[10] Cheng, X.; Lin, W. Z.; Xiao, X.; Chou, K. C., Ploc_bal-manimal: predict subcellular localization of animal proteins by balancing training dataset and PseAAC, Bioinformatics, (2018)
[11] Cheng, X.; Xiao, X.; Chou, K. C., Ploc-meuk: predict subcellular localization of multi-label eukaryotic proteins by extracting the key go information into general PseAAC, Genomics, 110, 1, 50-58, (2017)
[12] Cheng, X.; Xiao, X.; Chou, K. C., Ploc-mgneg: predict subcellular localization of gram-negative bacterial proteins by deep gene ontology learning via general PseAAC, Genomics, 110, 231-239, (2017)
[13] Chou, K. C., Prediction of signal peptides using scaled window, Peptides, 22, 12, 1973-1979, (2001)
[14] Chou, K. C., Using subsite coupling to predict signal peptides, Protein Eng., 14, 2, 75-79, (2001)
[15] Chou, K. C., Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes, Bioinformatics, 21, 1, 10-19, (2005)
[16] Chou, K. C., Pseudo amino acid composition and its applications in bioinformatics, proteomics and system biology, Curr Proteomics, 6, 4, 262-274, (2009)
[17] Chou, K. C., Prediction of protein cellular attributes using pseudo-amino acid composition, Proteins Struct. Funct. Bioinform., 43, 3, 246-255, (2010)
[18] Chou, K. C., Some remarks on protein attribute prediction and pseudo amino acid composition, J. Theor. Biol., 273, 1, 236-247, (2011) · Zbl 1405.92212
[19] Chou, K. C., Impacts of bioinformatics to medicinal chemistry, Med. Chem. (Los Angeles), 11, 3, 218-234, (2015)
[20] Chou, K. C., An unprecedented revolution in medicinal chemistry driven by the progress of biological science, Curr. Top. Med. Chem., 17, 21, 2337-2358, (2017)
[21] Chou, K. C.; Shen, H. B., Review: recent advances in developing web-servers for predicting protein attributes, Nat. Sci. (Irvine), 1, 2, 63-92, (2009)
[22] Feng, P.; Ding, H.; Yang, H.; Chen, W.; Lin, H.; Chou, K. C., Irna-psecoll: identifying the occurrence sites of different rna modifications by incorporating collective effects of nucleotides into pseknc, Mol. Ther. Nucleic Acids, 7, C, 155-163, (2017)
[23] Feng, P.; Yang, H.; Ding, H.; Lin, H.; Chen, W.; Chou, K. C., Idna6ma-pseknc: identifying dna n 6 -methyladenosine sites by incorporating nucleotide physicochemical properties into pseknc, Genomics, (2018)
[24] Hasan, M. M.; Guo, D.; Kurata, H., Computational identification of protein s-sulfenylation sites by incorporating the multiple sequence features information, Mol. Biosyst., 13, 12, 2545-2550, (2017)
[25] Hu, Q.; Che, X.; Zhang, L.; Zhang, D.; Guo, M.; Yu, D., Rank entropy based decision trees for monotonic classification, IEEE Trans. Knowl. Data Eng., 24, 11, 2052-2064, (2012)
[26] Jia, C.; Yang, Q.; Zou, Q., Nucpospred: predicting species-specific genomic nucleosome positioning via four different modes of general pseknc, J. Theor. Biol., 450, 15-21, (2018) · Zbl 1397.92010
[27] Jia, C.; Zuo, Y., S-Sulfpred: a sensitive predictor to capture s-sulfenylation sites based on a resampling one-sided selection undersampling-synthetic minority oversampling technique, J. Theor. Biol., 422, 84-89, (2017)
[28] Jia, C.; Zuo, Y.; Zou, Q.; Hancock, J., O-Glcnacpred-ii: an integrated classification algorithm for identifying o-glcnacylation sites based on fuzzy undersampling and a k-means pca oversampling technique, Bioinformatics, 34, 12, 2029-2036, (2018)
[29] Jia, J.; Liu, Z.; Xiao, X.; Liu, B.; Chou, K. C., Ippi-esml: an ensemble classifier for identifying the interactions of proteins by incorporating their physicochemical properties and wavelet transforms into PseAAC, J. Theor. Biol., 377, 47-56, (2015)
[30] Jia, J.; Liu, Z.; Xiao, X.; Liu, B.; Chou, K. C., Icar-psecp: identify carbonylation sites in proteins by monte carlo sampling and incorporating sequence coupled effects into general PseAAC, Oncotarget, 7, 23, 34558-34570, (2016)
[31] Jia, J.; Liu, Z.; Xiao, X.; Liu, B.; Chou, K. C., Identification of protein-protein binding sites by incorporating the physicochemical properties and stationary wavelet transforms into pseudo amino acid composition, J. Biomol. Struct. Dyn., 34, 9, 1946-1961, (2016)
[32] Jia, J.; Liu, Z.; Xiao, X.; Liu, B.; Chou, K. C., Ippbs-opt: a sequence-based ensemble classifier for identifying protein-protein binding sites by optimizing imbalanced training datasets, Molecules, 21, 1, E95, (2016)
[33] Jia, J.; Liu, Z.; Xiao, X.; Liu, B.; Chou, K. C., Isuc-pseopt: identifying lysine succinylation sites in proteins by incorporating sequence-coupling effects into pseudo components and optimizing imbalanced training dataset, Anal. Biochem., 497, 48-56, (2016)
[34] Jia, J.; Liu, Z.; Xiao, X.; Liu, B.; Chou, K. C., Psuc-lys: predict lysine succinylation sites in proteins with PseAAC and ensemble random forest approach, J. Theor. Biol., 394, 223-230, (2016) · Zbl 1343.92153
[35] Jia, J.; Zhang, L.; Liu, Z.; Xiao, X.; Chou, K. C., Psumo-cd: predicting sumoylation sites in proteins with covariance discriminant algorithm by incorporating sequence-coupled effects into general PseAAC, Bioinformatics, 32, 20, 3133-3141, (2016)
[36] Ju, Z.; Wang, S. Y., Prediction of citrullination sites by incorporating k-spaced amino acid pairs into Chou’s general pseudo amino acid composition, Gene, 664, 78-83, (2018)
[37] Khan, Y. D.; Rasool, N.; Hussain, W.; Khan, S. A.; Chou, K. C., Iphost-PseAAC: identify phosphothreonine sites by incorporating sequence statistical moments into PseAAC, Anal. Biochem., 550, 109-116, (2018)
[38] Lin, H.; Deng, E. Z.; Ding, H.; Chen, W.; Chou, K. C., Ipro54-pseknc: a sequence-based predictor for identifying sigma-54 promoters in prokaryote with pseudo k-tuple nucleotide composition, Nucleic Acids Res., 42, 21, 12961-12972, (2014)
[39] Liu, B.; Li, K.; Huang, D. S.; Chou, K. C., Ienhancer-el: identifying enhancers and their strength with ensemble learning approach, Bioinformatics, (2018)
[40] Liu, B.; Liu, F.; Wang, X.; Chen, J.; Fang, L.; Chou, K. C., Pse-in-one: a web server for generating various modes of pseudo components of dna, rna, and protein sequences, Nucleic Acids Res., 43, Web Server issue, W65-W71, (2015)
[41] Liu, B.; Wang, S.; Long, R.; Chou, K. C., Irspot-el: identify recombination spots with an ensemble learning approach, Bioinformatics, 33, 1, 35-41, (2017)
[42] Liu, B.; Weng, F.; Huang, D. S.; Chou, K. C., Iro-3wpseknc: identify dna replication origins by three-window-based pseknc, Bioinformatics, (2018)
[43] Liu, B.; Wu, H.; Chou, K. C., Pse-in-one 2.0: an improved package of web servers for generating various modes of pseudo components of dna, rna, and protein sequences, Nat. Sci. (Irvine), 09, 4, 67-91, (2017)
[44] Liu, B.; Yang, F.; Chou, K. C., 2L-pirna: a two-layer ensemble classifier for identifying piwi-interacting rnas and their function, Mol. Ther. Nucleic Acids, 7, C, 267-277, (2017)
[45] Liu, B.; Yang, F.; Huang, D. S.; Chou, K. C., Ipromoter-2l: a two-layer predictor for identifying promoters and their types by multi-window-based pseknc, Bioinformatics, 34, 1, 33-40, (2017)
[46] Liu, Z.; Xiao, X.; Qiu, W. R.; Chou, K. C., Idna-methyl: identifying dna methylation sites via pseudo trinucleotide composition, Anal. Biochem., 474, 69-77, (2015)
[47] Liu, Z.; Xiao, X.; Yu, D. J.; Jia, J.; Qiu, W. R.; Chou, K. C., Prnam-pc: predicting n(6)-methyladenosine sites in rna sequences via physical-chemical properties, Anal. Biochem., 497, 60-67, (2016)
[48] Maruf, M. A.A.; Shatabda, S., Irspot-sf: prediction of recombination hotspots by incorporating sequence based features into Chou’s pseudo components, Genomics, (2018)
[49] Meher, P. K.; Sahu, T. K.; Saini, V.; Rao, A. R., Predicting antimicrobial peptides with improved accuracy by incorporating the compositional, physico-chemical and structural features into Chou’s general PseAAC, Sci. Rep., 7, 42362, (2017)
[50] Mei, J.; Ji, Z., Prediction of hiv-1 and hiv-2 proteins by using Chou’s pseudo amino acid compositions and different classifiers, Sci. Rep., 8, (2018)
[51] Qian, Y.; Martell, J.; Pace, N. J.; Ballard, T. E.; Johnson, D. S.; Weerapana, E., An isotopically tagged azobenzene-based cleavable linker for quantitative proteomics, Chembiochem., 14, 12, 1410-1414, (2013)
[52] Qiu, W. R.; Jiang, S. Y.; Sun, B. Q.; Xiao, X.; Cheng, X.; Chou, K. C., Irna-2methyl: identify rna2′-o-methylation sites by incorporating sequence-coupled effects into general pseknc and ensemble classifier, Med. Chem. (Los Angeles), 13, 8, 734-743, (2017)
[53] Qiu, W. R.; Jiang, S. Y.; Xu, Z. C.; Xiao, X.; Chou, K. C., Irnam5c-psednc: identifying rna 5-methylcytosine sites by incorporating physical-chemical properties into pseudo dinucleotide composition, Oncotarget, 8, 25, 41178-41188, (2017)
[54] Qiu, W. R.; Sun, B. Q.; Xiao, X.; Xu, D.; Chou, K. C., Iphos-pseevo: identifying human phosphorylated proteins by incorporating evolutionary information into general PseAAC via grey system theory, Mol. Inf., 36, 5-6, (2017)
[55] Qiu, W. R.; Sun, B. Q.; Xiao, X.; Xu, Z. C.; Chou, K. C., Ihyd-psecp: identify hydroxyproline and hydroxylysine in proteins by incorporating sequence-coupled effects into general PseAAC, Oncotarget, 7, 28, 44310-44321, (2016)
[56] Qiu, W. R.; Sun, B. Q.; Xiao, X.; Xu, Z. C.; Chou, K. C., Iptm-mlys: identifying multiple lysine ptm sites and their different types, Bioinformatics, 32, 20, 3116-3123, (2016)
[57] Qiu, W. R.; Sun, B. Q.; Xiao, X.; Xu, Z. C.; Jia, J. H.; Chou, K. C., Ikcr-pseens: identify lysine crotonylation sites in histone proteins with pseudo components and ensemble classifier, Genomics, 110, 239-246, (2017)
[58] Qiu, W. R.; Xiao, X.; Chou, K. C., Irspot-tncPseAAC: identify recombination spots with trinucleotide composition and pseudo amino acid components, Int. J. Mol. Sci., 15, 2, 1746-1766, (2014)
[59] Qiu, W. R.; Xiao, X.; Lin, W. Z.; Chou, K. C., Imethyl-PseAAC: identification of protein methylation sites via a pseudo amino acid composition approach, Biomed. Res. Int., 2014, 12, 947416, (2014)
[60] Qiu, W. R.; Xiao, X.; Lin, W. Z.; Chou, K. C., Iubiq-lys: prediction of lysine ubiquitination sites in proteins by extracting sequence evolution information via a gray system model, J. Biomol. Struct. Dyn., 33, 8, 1731-1742, (2015)
[61] Qiu, W. R.; Xiao, X.; Xu, Z. C.; Chou, K. C., Iphos-pseen: identifying phosphorylation sites in proteins by fusing different pseudo components into an ensemble classifier, Oncotarget, 7, 32, 51270-51283, (2016)
[62] Quinlan, J. R., C4.5: Programs for machine learning, (1992), Morgan Kaufmann Publishers Inc
[63] Sakka, M.; Tzortzis, G.; Mantzaris, M. D.; Bekas, N.; Kellici, T. F.; Likas, A.; Galaris, D.; Gerothanassis, I. P.; Tzakos, A. G., Press: protein s-sulfenylation server, Bioinformatics, 32, 17, 2710-2712, (2016)
[64] Shi, H., Best-First Decision Tree Learning, (2007), University of Waikato
[65] Song, J.; Wang, Y.; Li, F.; Akutsu, T.; Rawlings, N. D.; Webb, G. I.; Chou, K. C., Iprot-sub: a comprehensive package for accurately mapping and predicting protease-specific substrates and cleavage sites, Phys. Rev. E, 97, 4, (2018)
[66] Su, Z. D.; Huang, Y.; Zhang, Z. Y.; Zhao, Y. W.; Wang, D.; Chen, W.; Chou, K. C.; Lin, H., Iloc-lncrna: predict the subcellular location of lncrnas by incorporating octamer composition into general pseknc, Bioinformatics, (2018)
[67] Szychowski, J.; Mahdavi, A.; Hodas, J. J.L.; Bagert, J. D.; Ngo, J. T.; Landgraf, P.; Dieterich, D. C.; Schuman, E. M.; Tirrell, D. A., Cleavable biotin probes for labeling of biomolecules via azide-alkyne cycloaddition, J. Am. Chem. Soc., 132, 51, 18351-18360, (2010)
[68] Tahir, M.; Hayat, M.; Kabir, M., Sequence based predictor for discrimination of enhancer and their types by applying general form of Chou’s trinucleotide composition, Comput. Methods Programs Biomed., 146, July 2017, 69-75, (2017)
[69] Wang, C.; Weerapana, E.; Blewett, M. M.; Cravatt, B. F., A chemoproteomic platform to quantitatively map targets of lipid-derived electrophiles, Nat. Methods, 11, 1, 79-85, (2014)
[70] Wang, X.; Yan, R.; Li, J.; Song, J., Sohpred: a new bioinformatics tool for the characterization and prediction of human s-sulfenylation sites, Mol. Biosyst., 12, 9, 2849-2857, (2016)
[71] Weerapana, E.; Wang, C.; Simon, G. M.; Richter, F.; Khare, S.; Dillon, M. B.D.; Bachovchin, D. A.; Mowen, K.; Baker, D.; Cravatt, B. F., Quantitative reactivity profiling predicts functional cysteines in proteomes, Nature, 468, 7325, 790-795, (2010)
[72] Witten, I. H.; Frank, E.; Hall, M. A., Data mining: Practical machine learning tools and techniques, (2005), Morgan Kaufmann · Zbl 1076.68555
[73] Xiao, X.; Cheng, X.; Chen, G.; Mao, Q.; Chou, K. C., Ploc-mgpos: predict subcellular localization of gram-positive bacterial proteins by quasi-balancing training dataset and PseAAC, Genomics, (2018)
[74] Xiao, X.; Ye, H. X.; Liu, Z.; Jia, J. H.; Chou, K. C., Iros-gpseknc: predicting replication origin sites in dna by incorporating dinucleotide position-specific propensity into general pseudo nucleotide composition, Oncotarget, 7, 23, 34180-34189, (2016)
[75] Xu, Y.; Chou, K. C., Recent progress in predicting posttranslational modification sites in proteins, Curr. Top. Med. Chem., 16, 6, 591-603, (2016)
[76] Xu, Y.; Ding, J.; Wu, L. Y., Isulf-cys: prediction of s-sulfenylation sites in proteins with physicochemical properties of amino acids, PloS One, 11, 4, e0154237, (2016)
[77] Xu, Y.; Ding, J.; Wu, L. Y.; Chou, K. C., Isno-PseAAC: predict cysteine s-nitrosylation sites in proteins by incorporating position specific amino acid propensity into pseudo amino acid composition, PloS One, 8, 2, e55844, (2013)
[78] Xu, Y.; Wang, Z.; Li, C.; Chou, K. C., Ipreny-PseAAC: identify c-terminal cysteine prenylation sites in proteins by incorporating two tiers of sequence couplings into PseAAC, Med. Chem. (Los Angeles), 13, 6, 544-551, (2017)
[79] Xu, Y.; Wen, X.; Shao, X. J.; Deng, N. Y.; Chou, K. C., Ihyd-PseAAC: predicting hydroxyproline and hydroxylysine in proteins by incorporating dipeptide position-specific propensity into pseudo amino acid composition, Int. J. Mol. Sci., 15, 5, 7594-7610, (2014)
[80] Xu, Y.; Wen, X.; Wen, L. S.; Wu, L. Y.; Deng, N. Y.; Chou, K. C., Initro-tyr: prediction of nitrotyrosine sites in proteins with general pseudo amino acid composition, PLoS One, 9, 8, e105018, (2014)
[81] Xu, Y.; Yang, Y.; Ding, J.; Li, C., Iglu-lys: a predictor for lysine glutarylation through amino acid pair order features, IEEE Trans. Nanobioscience, PP, 99, (2018)
[82] Yang, H.; Qiu, W. R.; Liu, G.; Guo, F. B.; Chen, W.; Chou, K. C.; Lin, H., Irspot-pse6nc: identifying recombination spots in Saccharomyces cerevisiae by incorporating hexamer composition into general pseknc, Int. J. Biol. Sci., 14, 8, 883-891, (2018)
[83] Yang, J.; Gupta, V.; Carroll, K. S.; Liebler, D. C., Site-specific mapping and quantification of protein s-sulfenylation in cells, Nat. Commun., 5, 4776, (2014)
[84] Yu, B.; Li, S.; Qiu, W. Y.; Chen, C.; Chen, R. X.; Wang, L.; Wang, M. H.; Zhang, Y., Accurate prediction of subcellular location of apoptosis proteins combining Chou’s PseAAC and psepssm based on wavelet denoising, Oncotarget, 8, 64, 107640-107665, (2017)
[85] Zheng, T.; Jiang, H.; Wu, P., Single-stranded dna as a cleavable linker for bioorthogonal click chemistry-based proteomics, Bioconjug. Chem., 24, 6, 859-864, (2013)
[86] Zhong, J.; Sun, Y.; Peng, W.; Xie, M.; Yang, J.; Tang, X., Xgbfemf: an xgboost-based framework for essential protein prediction, IEEE Trans. Nanobioscience, PP, 99, (2018)
[87] Zuo, Y.; Jia, C.; Li, T.; Chen, Y., Identification of cancerlectins by split bi-profile bayes feature extraction, Curr. Proteomics, 15, 196-200, (2018)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.