
iMethyl-STTNC: identification of N\(^6\)-methyladenosine sites by extending the idea of SAAC into Chou’s PseAAC to formulate RNA sequences. (English) Zbl 1406.92448
Summary: N\(^6\)-methyladenosine (m\(^6\)A) is a vital post-transcriptional modification that adds another layer of epigenetic regulation at the RNA level. It chemically modifies mRNA and thereby affects protein expression. RNA sequences contain many GAC motifs, and determining which of these motifs are methylated and which are not is essential. However, with the large number of RNA sequences generated in the post-genomic era, characterizing these sequences accurately and quickly has become a challenging task. In view of this, an intelligent computational model, “iMethyl-STTNC”, is proposed for identifying methyladenosine sites in RNA; it is designed to capture, accurately and rapidly, the motifs that distinguish the desired classes. In the proposed study, four feature extraction techniques, namely pseudo-dinucleotide composition, pseudo-trinucleotide composition, split-trinucleotide composition, and split-tetranucleotide composition (STTNC), are used to derive numerical descriptors. Three classification algorithms, probabilistic neural network, support vector machine (SVM), and K-nearest neighbor, are adopted for prediction. After the prediction model was evaluated on each feature space, SVM with the STTNC feature space achieved the highest accuracy: 69.84% on dataset 1 and 91.84% on dataset 2. These results indicate that the proposed predictor achieves encouraging performance compared with existing approaches. Finally, we expect that the developed model may be useful for in-depth analysis of genomes and for drug development.
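The summary names split-tetranucleotide composition (STTNC) as the best-performing descriptor but does not say how a sequence is split. The following is a minimal sketch of one plausible reading, assuming the SAAC-style scheme of computing composition separately on a leading segment, a middle segment, and a trailing segment. The segment lengths (15 nt at each end), the function names, and the example sequence are hypothetical illustrations, not details taken from the paper.

```python
from itertools import product

# RNA alphabet; any T in the input is converted to U before counting.
NUCLEOTIDES = "ACGU"


def kmer_composition(seq: str, k: int) -> list[float]:
    """Return the normalized frequency of each of the 4**k k-mers in `seq`.

    k-mers are read with a sliding window of step 1 and listed in
    lexicographic order over ACGU (AAAA, AAAC, ..., UUUU for k = 4).
    Windows containing any other symbol (e.g. N) are skipped.
    """
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
            total += 1
    # A segment shorter than k (or made only of ambiguous symbols) yields all zeros.
    return [c / total if total else 0.0 for c in counts]


def split_composition(seq: str, k: int = 4, head: int = 15, tail: int = 15) -> list[float]:
    """Assumed SAAC-style split composition: k-mer composition of head, middle, tail.

    `head` and `tail` are hypothetical segment lengths. The feature vector has
    3 * 4**k entries (768 for tetranucleotides, k = 4).
    """
    seq = seq.upper().replace("T", "U")
    if head < 0 or tail < 0 or head + tail > len(seq):
        raise ValueError("head + tail must not exceed the sequence length")
    segments = (seq[:head], seq[head:len(seq) - tail], seq[len(seq) - tail:])
    features: list[float] = []
    for segment in segments:
        features.extend(kmer_composition(segment, k))
    return features


if __name__ == "__main__":
    # Hypothetical 41-nt window whose central A (index 20) belongs to a GAC motif.
    window = "AUGCUAGCUAGGCUAUCGAGACUUAGCUAGCUAGGCAUCGA"
    vector = split_composition(window, k=4)
    print(len(vector))  # 768 = 3 segments x 256 tetranucleotides
```

Calling `split_composition(window, k=3)` gives the split-trinucleotide variant (192 features under the same assumptions). The resulting vectors could then be passed to any of the three classifiers named in the summary; the classification stage is not shown here.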

92D20 Protein sequences, DNA sequences
92C40 Biochemistry, molecular biology
68T05 Learning and adaptive systems in artificial intelligence