zbMATH — the first resource for mathematics

Identifying 5-methylcytosine sites in RNA sequence using composite encoding feature into Chou’s PseKNC. (English) Zbl 1397.92232
Summary: This study presents an accurate and efficient computational method for identifying 5-methylcytosine sites in RNA. The 5-methylcytosine (m\(^5\)C) modification plays a vital role in a number of biological processes, so understanding its biological functions and mechanisms requires that m\(^5\)C sites in RNA be recognized precisely. Laboratory techniques for identifying m\(^5\)C sites in RNA are available, but they require a lot of time and resources. This study develops a new computational method for extracting features from RNA sequences. First, the RNA sequence is encoded as a composite feature vector, and the minimum-redundancy-maximum-relevance algorithm is used to select discriminative features. Second, classification is performed by a support vector machine and evaluated with the jackknife cross-validation test. The proposed method efficiently distinguishes m\(^5\)C sites from non-m\(^5\)C sites, achieving an accuracy of 93.33% with a sensitivity of 90.0% and a specificity of 96.66% on benchmark datasets. These results show significant identification performance compared with existing computational techniques. The study extends knowledge about the occurrence sites of RNA modification, which paves the way for a better understanding of its biological uses and mechanisms.
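The pipeline in the summary (encode, classify, evaluate by jackknife) can be illustrated with a minimal sketch. It is not the authors' implementation. The composite vector is assumed here to be concatenated mono- and dinucleotide frequencies, a nearest-centroid rule stands in for the support vector machine, the feature-selection step is omitted, and the sequences are made-up toy data. All function names are hypothetical.

```python
from itertools import product
from math import dist

BASES = "ACGU"

def kmer_composition(seq, k):
    """Normalized frequency of every k-mer over the RNA alphabet."""
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = len(seq) - k + 1
    for i in range(windows):
        counts[seq[i:i + k]] += 1
    return [counts[m] / windows for m in kmers]

def composite_encoding(seq, ks=(1, 2)):
    """Concatenate k-mer compositions (an assumed form of the composite vector)."""
    vec = []
    for k in ks:
        vec.extend(kmer_composition(seq, k))
    return vec

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def predict(train_x, train_y, x):
    """Nearest-centroid rule, a simple stand-in for the SVM named in the summary."""
    pos = centroid([v for v, y in zip(train_x, train_y) if y == 1])
    neg = centroid([v for v, y in zip(train_x, train_y) if y == 0])
    return 1 if dist(x, pos) <= dist(x, neg) else 0

def jackknife(xs, ys):
    """Leave-one-out test: each sample is predicted from all the other samples."""
    tp = tn = fp = fn = 0
    for i in range(len(xs)):
        pred = predict(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], xs[i])
        if ys[i] == 1:
            tp += pred == 1
            fn += pred == 0
        else:
            tn += pred == 0
            fp += pred == 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(xs),
    }

# Toy sequences invented for illustration only (1 = site, 0 = non-site).
positives = ["CCCCGCCCC", "CCGCCCCUC", "CCCCCCACC"]
negatives = ["AAAAUAAAA", "AAUAAAAGA", "AAAAAACAA"]
xs = [composite_encoding(s) for s in positives + negatives]
ys = [1] * len(positives) + [0] * len(negatives)
result = jackknife(xs, ys)
print(result)
```

When positives and negatives are equal in number, accuracy is the mean of sensitivity and specificity, which is consistent with the reported figures: (90.0 + 96.66) / 2 = 93.33.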

92C40 Biochemistry, molecular biology