zbMATH — the first resource for mathematics

NucPosPred: predicting species-specific genomic nucleosome positioning via four different modes of general PseKNC. (English) Zbl 1397.92010
Summary: The nucleosome is the basic structural unit of chromatin in eukaryotic cells and plays essential roles in regulating many biological processes, such as DNA transcription, replication and repair, and RNA splicing. Because nucleosomes are so important, the factors that determine their positioning within genomes merit investigation. High-resolution nucleosome-positioning maps are now available for organisms including Saccharomyces cerevisiae, Drosophila melanogaster and Caenorhabditis elegans, which makes it possible to identify nucleosome positioning with computational tools. Here, we describe a novel predictor called NucPosPred, which was specifically designed for large-scale identification of nucleosome positioning in the C. elegans and D. melanogaster genomes. For each species, NucPosPred was optimized separately over four types of DNA sequence feature extraction and two classification algorithms (gradient-boosting decision tree and support vector machine). NucPosPred achieved an overall accuracy of 92.29% for C. elegans and 88.26% for D. melanogaster, outperforming previous methods and demonstrating the potential of species-specific prediction of nucleosome positioning. For the convenience of experimental scientists, a web server for NucPosPred is available at
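The summary does not spell out the four feature-extraction modes. As a rough illustration only, the sketch below computes the normalized k-tuple nucleotide composition, the sequence-order-free core that PseKNC extends with pseudo components (sequence-order correlation terms). It is not the authors' implementation: the function name `kmer_composition` and the choice to skip windows containing ambiguous bases such as N are assumptions made for this example.

```python
from itertools import product


def kmer_composition(seq: str, k: int) -> list[float]:
    """Return normalized frequencies of all 4**k k-tuples (ACGT, lexicographic order).

    Hypothetical helper for illustration; windows containing any character
    other than A, C, G, T are skipped (an assumption, not from the paper).
    """
    seq = seq.upper()
    # Enumerate every possible k-tuple: AA..A, AA..C, ..., TT..T
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    # Slide a window of width k along the sequence with step 1
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:
            counts[index[kmer]] += 1
            total += 1
    # Normalize counts to frequencies; all zeros if no valid window exists
    return [c / total for c in counts] if total else [0.0] * len(kmers)


if __name__ == "__main__":
    # Dinucleotide composition (k = 2) gives a 16-dimensional feature vector
    features = kmer_composition("ACGTAC", 2)
    print(len(features), features[1])  # index 1 corresponds to "AC"
```

For "ACGTAC" with k = 2 the five windows are AC, CG, GT, TA, AC, so the "AC" entry is 2/5. A vector like this (typically concatenated with pseudo components) would then be fed to a classifier such as a gradient-boosting decision tree or an SVM.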

92-08 Computational methods for problems pertaining to biology
92C40 Biochemistry, molecular biology