BlaPred: predicting and classifying \(\beta\)-lactamase using a 3-tier prediction system via Chou’s general PseAAC. (English) Zbl 1406.92215

Summary: Antibiotics of the \(\beta\)-lactam class account for nearly half of global antibiotic use. The \(\beta\)-lactamase enzyme is a major element of the bacterial arsenal for escaping the lethal effect of \(\beta\)-lactam antibiotics. Different variants of \(\beta\)-lactamases have evolved to counter the different types of \(\beta\)-lactam antibiotics, and extensive research has been done to isolate and characterize these variants. Unfortunately, identification and classification of \(\beta\)-lactamase enzymes rely purely on experiments, which are both time- and resource-consuming. Thus, there is a need for fast and accurate computational methods to identify and classify new \(\beta\)-lactamase enzymes from the avalanche of sequence data generated in the post-genomic era. Based on these considerations, we have developed a support vector machine-based three-tier prediction system, BlaPred, to predict and classify (as per the Ambler classification) \(\beta\)-lactamases solely from their protein sequences. The input features used were amino acid composition and the classic and amphiphilic pseudo amino acid compositions. The results show that the models based on classic pseudo amino acid composition performed better than the other models. Following a leave-one-out cross-validation procedure, the accuracy in discriminating \(\beta\)-lactamases from non-\(\beta\)-lactamases was 93.57% (tier-I); the accuracies for predicting class A, B, C and D \(\beta\)-lactamases were 93.27%, 95.52%, 96.86% and 97.31%, respectively (tier-II); and the accuracies for predicting subclasses B1, B2 and B3 were 84.78%, 95.65% and 89.13%, respectively (tier-III).
The comparative results on an independent dataset suggest that our method works efficiently to distinguish \(\beta\)-lactamases from non-\(\beta\)-lactamases, with an overall accuracy of 93.09%, and is further able to classify \(\beta\)-lactamase sequences into their respective Ambler classes and subclasses with accuracies higher than 92% and 87%, respectively. Comparative performance of BlaPred on an independent benchmark dataset also shows a significant improvement over other existing methods. Finally, BlaPred is available as a webserver, as well as standalone software, which can be accessed at http://proteininformatics.org/mkumar/blapred.
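
For readers unfamiliar with the input features named in the summary, the following is a minimal sketch of amino acid composition and a classic (type 1) pseudo amino acid composition vector. It is not the BlaPred implementation. The function names `aac` and `pseaac`, the defaults `lam=3` and `w=0.05`, and the use of the Kyte-Doolittle hydropathy index as the only property scale are assumptions made for illustration; the classic formulation usually combines several physicochemical properties, and the summary does not state which parameter values the authors used.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index. Assumption: a single stand-in scale;
# the classic PseAAC formulation typically averages several properties.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _standardize(scale):
    """Rescale property values to zero mean and unit standard deviation."""
    mu = mean(scale.values())
    sd = pstdev(scale.values())
    return {aa: (value - mu) / sd for aa, value in scale.items()}


H = _standardize(KYTE_DOOLITTLE)


def _clean(seq):
    """Keep only the 20 standard residues, upper-cased."""
    return [c for c in seq.upper() if c in H]


def aac(seq):
    """20-dimensional amino acid composition (residue fractions)."""
    residues = _clean(seq)
    if not residues:
        raise ValueError("sequence contains no standard residues")
    n = len(residues)
    return [residues.count(a) / n for a in AMINO_ACIDS]


def pseaac(seq, lam=3, w=0.05):
    """(20 + lam)-dimensional classic PseAAC vector for one property scale."""
    residues = _clean(seq)
    n = len(residues)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    freqs = [residues.count(a) / n for a in AMINO_ACIDS]
    # theta_k: mean squared property difference between residues k apart,
    # which carries the sequence-order information that plain AAC lacks.
    thetas = [
        mean((H[residues[i]] - H[residues[i + k]]) ** 2 for i in range(n - k))
        for k in range(1, lam + 1)
    ]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]
```

Under these assumptions, `pseaac(sequence)` returns a 23-component vector whose entries sum to 1; vectors of this kind are what a support vector machine classifier would take as input at each tier.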


92C40 Biochemistry, molecular biology
68T05 Learning and adaptive systems in artificial intelligence

