iLM-2L: a two-level predictor for identifying protein lysine methylation sites and their methylation degrees by incorporating K-gap amino acid pairs into Chou’s general PseAAC. (English) Zbl 1343.92157

Summary: As one of the most critical post-translational modifications, lysine methylation plays a key role in regulating various protein functions. To understand the molecular mechanism of lysine methylation, it is important to identify lysine methylation sites and their methylation degrees accurately. As the traditional experimental methods are time-consuming and labor-intensive, several computational methods have been developed for the identification of methylation sites. However, the prediction accuracy of existing computational methods is still unsatisfactory. Moreover, they focus only on predicting whether a query lysine residue is a methylation site, without considering its methylation degree(s). In this paper, a novel two-level predictor named iLM-2L is proposed to predict lysine methylation sites and their methylation degrees using the composition of k-spaced amino acid pairs feature coding scheme and the support vector machine algorithm. The 1st level identifies whether a query lysine residue is a methylation site, and the 2nd level identifies which methylation degree(s) the query lysine residue belongs to if it has been predicted as a methyllysine site in the 1st level. In the jackknife test, iLM-2L achieves a promising performance, with a sensitivity of 76.46%, a specificity of 91.90%, an accuracy of 85.31% and a Matthews correlation coefficient of 69.94% for the 1st level, as well as a precision of 84.81%, an accuracy of 79.35%, a recall of 80.83%, an Absolute-True of 73.89% and a Hamming-loss of 15.63% for the 2nd level. As illustrated by an independent test, iLM-2L significantly outperforms other existing lysine methylation site predictors. A MATLAB software package for iLM-2L can be freely downloaded from https://github.com/juzhe1120/Matlab_Software/blob/master/iLM-2L_Matlab_Software.rar.
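
The feature coding scheme named in the summary, the composition of k-spaced amino acid pairs, can be illustrated with a minimal Python sketch. It is not taken from the iLM-2L package: the function name `cksaap`, the default `k_max=4`, and the choice to skip pairs containing non-standard residues are assumptions made here for illustration only. For each gap k, the sketch counts every ordered residue pair separated by k positions in a sequence fragment and divides by the number of such positions, giving 400 values per gap.

```python
from itertools import product

# The 20 standard amino acids and all 400 ordered pairs (assumed ordering).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(fragment, k_max=4):
    """Composition of k-spaced amino acid pairs for k = 0..k_max.

    Returns a flat list of 400 * (k_max + 1) frequencies. Pairs that
    contain a character outside AMINO_ACIDS (for example a padding
    symbol) are skipped; this handling is an assumption of the sketch.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        # Number of positions i for which residue i + k + 1 exists.
        n_positions = max(len(fragment) - k - 1, 0)
        for i in range(n_positions):
            pair = fragment[i] + fragment[i + k + 1]
            if pair in counts:
                counts[pair] += 1
        denom = max(n_positions, 1)  # avoid division by zero on short input
        features.extend(counts[p] / denom for p in PAIRS)
    return features


if __name__ == "__main__":
    vec = cksaap("AKA", k_max=1)
    print(len(vec))
    print(vec[PAIRS.index("AK")], vec[400 + PAIRS.index("AA")])
```

A vector of this kind, computed for a fragment centered on the query lysine, could then be passed to a support vector machine; the fragment length and the range of k actually used by iLM-2L are not stated in the summary.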


92C40 Biochemistry, molecular biology
92-08 Computational methods for problems pertaining to biology

