
Bi-PSSM: position specific scoring matrix based intelligent computational model for identification of mycobacterial membrane proteins. (English) Zbl 1394.92002

Summary: Mycobacterium is a genus of pathogenic bacteria that includes the causative agents of tuberculosis (TB) and leprosy. These diseases are serious and cause the deaths of millions of people worldwide every year. Characterizing the structure of the membrane proteins of this bacterium therefore plays a vital role in drug discovery, because without knowledge of Mycobacterium’s membrane proteins and their types, scientists are unable to treat this pathogen. An accurate and competitive computational model is thus needed to characterize the uncharacterized membrane protein structures of Mycobacterium. A series of attempts has been made in this direction. Split amino acid composition, unbiased dipeptide composition (Unb-DPC), over-represented tripeptide composition, and composition & translation are a few of the encoding techniques used by different researchers in recent publications. Although these models have achieved considerable results, there is still a gap, which this study aims to fill. In this study, an evolutionary feature extraction technique, the position specific scoring matrix (PSSM), is applied to extract evolutionary information from protein sequences. As a result, the learning algorithms achieved 99.6% accuracy. The experimental results demonstrate that the proposed computational model may lead to a powerful tool for developing anti-mycobacterial drugs and may play a promising role in proteomics and bioinformatics.
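The summary does not spell out how PSSM profiles are turned into fixed-length features. As an illustration only, the sketch below assumes that the "Bi" in the title refers to a bigram encoding, which the summary does not confirm. It takes a PSSM given as one row of scores per residue (20 columns in a real profile), optionally squashes the scores with a logistic function, and sums the products of scores at consecutive positions to give a 20 x 20 = 400-dimensional vector. The function names normalize_pssm and bigram_pssm are hypothetical and are not taken from the paper.

```python
import math


def normalize_pssm(pssm):
    """Map raw PSSM scores into (0, 1) with a logistic function (an assumed choice)."""
    return [[1.0 / (1.0 + math.exp(-score)) for score in row] for row in pssm]


def bigram_pssm(pssm):
    """Flattened bigram features: B[i][j] = sum over k of P[k][i] * P[k+1][j].

    pssm is a list of rows, one per sequence position; every row has the
    same number of columns n (20 for a standard amino acid PSSM).
    The result is a flat list of n * n values in row-major order.
    """
    n = len(pssm[0])
    feats = [[0.0] * n for _ in range(n)]
    # Walk over consecutive position pairs (k, k+1) along the sequence.
    for cur, nxt in zip(pssm, pssm[1:]):
        for i in range(n):
            for j in range(n):
                feats[i][j] += cur[i] * nxt[j]
    return [value for row in feats for value in row]


if __name__ == "__main__":
    # Toy 3-residue, 2-column matrix standing in for a real L x 20 PSSM.
    toy = [[1, 0], [0, 1], [1, 1]]
    print(bigram_pssm(toy))
```

With a real profile, the resulting 400 values would be the input vector for whichever learning algorithm is used. The classifier and its settings are not described in the summary, so none is assumed here.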

MSC:

92-08 Computational methods for problems pertaining to biology
92C40 Biochemistry, molecular biology
