
zbMATH — the first resource for mathematics

Predicting membrane protein types by incorporating a novel feature set into Chou’s general PseAAC. (English) Zbl 1406.92470
Summary: Membrane proteins are a vital class of proteins that serve as channels, receptors and energy transducers in a cell. They perform various important functions, which are mainly associated with their types, and they are attractive drug-discovery targets for various diseases. Predicting membrane protein types is therefore a crucial and challenging research area in bioinformatics and proteomics. Because of the vast number of uncharacterized protein sequences in databases, conventional biophysical techniques are extremely tedious, costly and error-prone. It is therefore highly desirable to build a robust, reliable and efficient technique to predict membrane protein types. In this work, a novel feature set, ‘exchange group based protein sequence representation’ (EGBPSR), is proposed for the classification of membrane proteins, together with two new feature extraction strategies known as ‘exchange group local pattern’ (EGLP) and ‘amino acid interval pattern’ (AIP). Imbalanced and large datasets are often handled well by decision tree classifiers. Since an imbalanced dataset is used, the performance of various decision tree classifiers, such as decision tree (DT) and classification and regression tree (CART), and of ensemble methods, such as AdaBoost, random undersampling (RUS) boost, rotation forest and random forest, is analyzed. The overall accuracy achieved in predicting membrane protein types is 96.45%.
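
The summary does not define EGBPSR, EGLP or AIP in detail, so the following is only a minimal sketch of the general idea of an exchange-group representation, not the authors' method. It assumes the six Dayhoff-style exchange groups often used in the literature; the group labels (`e1` to `e6`) and the function names `to_exchange_groups` and `group_composition` are hypothetical and chosen for illustration.

```python
from collections import Counter

# ASSUMPTION: six Dayhoff-style exchange groups. The grouping actually used
# in the paper is not stated in the summary and may differ.
EXCHANGE_GROUPS = {
    "e1": "HRK",
    "e2": "DENQ",
    "e3": "C",
    "e4": "STPAG",
    "e5": "MILV",
    "e6": "FYW",
}

# Reverse lookup: amino acid letter -> exchange group label.
AA_TO_GROUP = {
    aa: group for group, members in EXCHANGE_GROUPS.items() for aa in members
}


def to_exchange_groups(sequence):
    """Map each residue to its group label, skipping unknown symbols."""
    return [AA_TO_GROUP[aa] for aa in sequence.upper() if aa in AA_TO_GROUP]


def group_composition(sequence):
    """Return the fraction of residues falling in each group (e1..e6)."""
    groups = to_exchange_groups(sequence)
    counts = Counter(groups)
    total = len(groups)
    return [counts[g] / total if total else 0.0 for g in EXCHANGE_GROUPS]
```

A six-dimensional vector such as `group_composition("MKTLLVAG")` could then be passed, alongside other features, to a decision tree or ensemble classifier. The local-pattern and interval-pattern features named in the summary would need the definitions given in the paper itself.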

MSC:
92D20 Protein sequences, DNA sequences
92C37 Cell biology
92C40 Biochemistry, molecular biology
References:
[1] Ali, F.; Hayat, M., Classification of membrane protein types using voting feature interval in combination with Chou’s pseudo amino acid composition, J. Theor. Biol., 384, 78-83, (2015) · Zbl 1343.92006
[2] Berardi, M. J.; Shih, W. M.; Harrison, S. C., Mitochondrial uncoupling protein 2 structure determined by NMR molecular fragment searching, Nature, 476, 109-113, (2011)
[3] Brownlee J., 2015. https://machinelearningmastery.com.
[4] Butt, A. H.; Khan, S. A.; Rasool, N.; Khan, Y. D., A prediction model for membrane proteins using moments based features, Biomed. Res. Int., (2016)
[5] Cai, Y. D.; Chou, K. C., Predicting membrane protein type by functional domain composition and pseudo amino acid composition, J. Theor. Biol., 238, 395-400, (2006)
[6] Cai, Y. D.; Ricardo, P. W.; Jen, C. H.; Chou, K. C., Application of SVM to predict membrane protein types, J. Theor. Biol., 226, 4, 373-376, (2004)
[7] Cai, Y. D.; Zhou, G. P.; Chou, K. C., Support vector machines for predicting membrane protein types by using functional domain composition, Biophys. J., 84, 5, 3257-3263, (2003)
[8] Cedano, J.; Aloy, P.; Pérez-Pons, J. A.; Querol, E., Relation between amino acid composition and cellular location of proteins, J. Mol. Biol., 266, 594-600, (1997)
[9] Chen, W.; Ding, H.; Feng, P., Iacp: a sequence-based tool for identifying anti-cancer peptides, Oncotarget, 7, 16895-16909, (2016)
[10] Chen, W.; Feng, P. M.; Lin, H., Irspot-psednc: identify recombination spots with pseudo di nucleotide composition, Nucleic Acids Res., 41, e68, (2013)
[11] Chen, W.; Lei, T. Y.; Jin, D. C., Pseknc: a flexible web-server for generating pseudo K-tuple nucleotide composition, Anal. Biochem., 456, 53-60, (2014)
[12] Chen, W.; Lin, H., Pseudo nucleotide composition or pseknc: an effective formulation for analyzing genomic sequences, Mol. Biosyst., 11, 2620-2634, (2015)
[13] Chen, Y. K.; Li, K. B., Predicting membrane protein types by incorporating protein topology, domains, signal peptides, and physicochemical properties into the general form of Chou’s pseudo amino acid composition, J. Theor. Biol., 318, 1-12, (2013)
[14] Chen, W.; Feng, P.; Yang, H., Irna-3typea: identifying 3-types of modification at RNA’s adenosine sites, Mol. Ther. Nucleic Acid, (2018)
[15] Cheng, X.; Xiao, X., Ploc-mplant: predict subcellular localization of multi-location plant proteins via incorporating the optimal GO information into general pseaac, Mol. Biosyst., 13, 1722-1727, (2017)
[16] Cheng, X.; Xiao, X., Ploc-mvirus: predict subcellular localization of multi-location virus proteins via incorporating the optimal GO information into general pseaac, Gene (Erratum: ibid., 2018, Vol.644, 156-156), 628, 315-321, (2017)
[17] Cheng, X.; Xiao, X., Ploc-mgneg: predict subcellular localization of Gram-negative bacterial proteins by deep gene ontology learning via general pseaac, Genomics, (2017)
[18] Cheng, X.; Xiao, X., Ploc-mhum: predict subcellular localization of multi-location human proteins via general pseaac to winnow out the crucial GO information, Bioinformatics, (2017)
[19] Cheng, X.; Xiao, X., Ploc-meuk: predict subcellular localization of multi-label eukaryotic proteins by extracting the key GO information into general pseaac, Genomics, 110, 50-58, (2018)
[20] Cheng, X.; Zhao, S. G.; Lin, W. Z., Ploc-manimal: predict subcellular localization of animal proteins with both single and multiple sites, Bioinformatics, 33, 3524-3531, (2017)
[21] Cheng, X.; Zhao, S. G.; Xiao, X., Iatc-mhyb: a hybrid multi-label classifier for predicting the classification of anatomical therapeutic chemicals, Oncotarget, (2017)
[22] Cheng, X.; Zhao, S. G.; Xiao, X., Iatc-misf: a multi-label classifier for predicting the classes of anatomical therapeutic chemicals, Bioinformatics, 33, 341-346, (2017)
[23] Chou, K. C., Prediction of protein cellular attributes using pseudo amino acid composition, Proteins Struct. Funct. Genet., 43, 246-255, (2001)
[24] Chou, K. C., Insights from modelling the tertiary structure of BACE2, J. Proteom. Res., 3, 1069-1072, (2004)
[25] Chou, K. C., Insights from modelling three-dimensional structures of the human potassium and sodium channels, J. Proteom. Res., 3, 856-861, (2004)
[26] Chou, K. C., Structural bioinformatics and its impact to biomedical science, Curr. Med. Chem., 11, 2105-2134, (2004)
[27] Chou, K. C., Insights from modeling the 3D structure of DNA-CBF3b complex, J. Proteom. Res., 4, 1657-1660, (2005)
[28] Chou, K. C., Some remarks on protein attribute prediction and pseudo amino acid composition, J. Theor. Biol., 273, 236-247, (2011) · Zbl 1405.92212
[29] Chou, K. C., Some remarks on predicting multi-label attributes in molecular biosystems, Mol. Biosyst., 9, 1092-1100, (2013)
[30] Chou, K. C., Impacts of bioinformatics to medicinal chemistry, Med. Chem., 11, 218-234, (2015)
[31] Chou, K. C., An unprecedented revolution in medicinal chemistry driven by the progress of biological science, Curr. Top. Med. Chem., (2017)
[32] Chou, K. C.; Cai, Y. D., Prediction of membrane protein types by incorporating amphipathic effects, J. Chem. Inf. Model., 45, 2, 407-413, (2005)
[33] Chou, K. C.; Cai, Y. D., Using GO-pseaa predictor to identify membrane proteins and their types, Biochem. Biophys. Res. Commun., 327, 3, 845-847, (2005)
[34] Chou, K. C.; Elrod, D. W., Prediction of membrane protein types and sub cellular locations, Proteins Struct. Funct. Bioinf., 34, 1, 137-153, (1999)
[35] Chou, K. C.; Shen, H. B., Memtype-2L: a web server for predicting membrane proteins and their types by incorporating evolution information through pse-PSSM, Biochem. Biophys. Res. Commun., 360, 2, 339-345, (2007)
[36] Dev, J.; Park, D.; Fu, Q.; Chen, J.; Ha, H. J.; Ghantous, F.; Herrmann, T.; Chang, W.; Liu, Z.; Frey, G.; Seaman, M. S., Structural basis for membrane anchoring of HIV-1 envelope spike, Science, 353, 172-175, (2016)
[37] Feng, Z. P.; Zhang, C. T., Prediction of membrane protein types based on the hydrophobic index of amino acids, J. Protein Chem., 19, 269-275, (2000)
[38] Feng, P.; Ding, H.; Yang, H.; Chen, W., Irna-psecoll: identifying the occurrence sites of different RNA modifications by incorporating collective effects of nucleotides into pseknc, Mol. Ther. Nucleic Acids, 7, 155-163, (2017)
[39] Feng, P.; Yang, H.; Ding, H.; Lin, H., Idna6ma-pseknc: identifying DNA N6-methyladenosine sites by incorporating nucleotide physicochemical properties into pseknc, Genomics, (2018)
[40] Gao, Q. B.; Ye, X. F.; Jin, Z. C.; He, J., Improving discrimination of outer membrane proteins by fusing different forms of pseudo amino acid composition, J. Anal. Biochem., 398, 52-59, (2010)
[41] Golmohammadi, K. S.K.; Crowley, L.; Reformat, M., Classification of cell membrane proteins, Front. Converg. Biosci. Inf. Technol., 153-158, (2007)
[42] Golmohammadi, S. K.; Kurgan, L.; Crowley, B.; Reformat, M., Amino acid sequence based method for prediction of cell membrane protein types, Int. J. Hybrid Inf. Technol., 1, 1, 95-109, (2008)
[43] Han, G. S.; Yu, Z. G.; Anh, V., A two-stage SVM method to predict membrane protein types by incorporating amino acid classifications and physicochemical properties into a general form of Chou’s pseaac, J. Theor. Biol., 344, 31-39, (2014)
[44] Hayat, M.; Khan, A., Predicting membrane protein types by fusing composite protein sequence features into pseudo amino acid composition, J. Theor. Biol., 262, 10-17, (2011) · Zbl 1405.92217
[45] Hayat, M.; Khan, A., Mem-phybrid: hybrid features-based prediction system for classifying membrane protein types, Anal. Biochem., 424, 35-44, (2012)
[46] Hayat, M.; Khan, A., Memhyb: predicting membrane protein types by hybridizing SAAC and PSSM, J. Theor. Biol., 292, 93-102, (2012) · Zbl 1307.92308
[47] Hayat, M.; Khan, A.; Yeasin, M., Prediction of membrane proteins using split amino acid and ensemble classification, Amino Acids, 42, 6, 2447-2460, (2012)
[48] Howe, W. J., Prediction of the tertiary structure of the beta-secretase zymogen, Biochem. Biophys. Res. Commun., 292, 702-708, (2002)
[49] Huang, C.; Yuan, J. Q., A multi label model based on Chou’s pseudo-amino acid composition for identifying membrane proteins with both single and multiple functional types, J. Membr. Biol., 246, 4, 327-334, (2013)
[50] Jia, J.; Liu, Z.; Xiao, X., Isuc-pseopt: identifying lysine succinylation sites in proteins by incorporating sequence-coupling effects into pseudo components and optimizing imbalanced training dataset, Anal. Biochem., 497, 48-56, (2016)
[51] Jia, J.; Liu, Z.; Xiao, X., Ippbs-opt: a sequence-based ensemble classifier for identifying protein-protein binding sites by optimizing imbalanced training dataset, Molecules, 21, 95, (2016)
[52] Jia, J.; Liu, Z.; Xiao, X.; Liu, B., Psuc-lys: predict lysine succinylation sites in proteins with pseaac and ensemble random forest approach, J. Theor. Biol., 394, 223-230, (2016) · Zbl 1343.92153
[53] Jia, J.; Zhang, L.; Liu, Z., Psumo-CD: predicting sumoylation sites in proteins with covariance discriminant algorithm by incorporating sequence-coupled effects into general pseaac, Bioinformatics, 32, 3133-3141, (2016)
[54] Jia, P.; Qian, Z.; Feng, K.; Lu, W.; Li, Y., Prediction of membrane protein types in a hybrid space, J. Proteom. Res., 7, 1131-1137, (2008)
[55] Lin, H.; Deng, E. Z.; Ding, H., Ipro54-pseknc: a sequence-based predictor for identifying sigma-54 promoters in prokaryote with pseudo k-tuple nucleotide composition, Nucleic Acids Res, 42, 12961-12972, (2014)
[56] Liu, B.; Wu, H., Pse-in-one 2.0: an improved package of web servers for generating various modes of pseudo components of DNA, RNA, and protein sequences, Nat. Sci., 9, 67-91, (2017)
[57] Liu, B.; Yang, F., 2L-pirna: A two-layer ensemble classifier for identifying piwi-interacting RNAs and their function, Mol. Ther. Nucleic Acids, 7, 267-277, (2017)
[58] Liu, B.; Wang, S.; Long, R., Irspot-EL: identify recombination spots with an ensemble learning approach, Bioinformatics, 33, 35-41, (2017)
[59] Liu, B.; Liu, F.; Wang, X., Pse-in-one: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences, Nucleic Acids Res, 43, W65-W71, (2015)
[60] Liu, B.; Fang, L.; Long, R., Ienhancer-2L: a two-layer predictor for identifying enhancers and their strength by pseudo k-tuple nucleotide composition, Bioinformatics, 32, 362-369, (2016)
[61] Liu, B.; Long, R., Idhs-EL: identifying dnase I hypersensitive sites by fusing three different modes of pseudo nucleotide composition into an ensemble learning framework, Bioinformatics, 32, 2411-2418, (2016)
[62] Liu, H.; Wang, J. M.; Xue, L.; Chou, K. C., Using Fourier spectrum analysis and pseudo amino acid composition for prediction of membrane protein types, Protein J., 24, 6, 385-389, (2005)
[63] Liu, H.; Wang, M.; Chou, K. C., Low-frequency Fourier spectrum for predicting membrane protein types, Biochem. Biophys. Res. Commun., 336, 3, 737-739, (2005)
[64] Liu, Z.; Xiao, X.; Qiu, W. R., Idna-methyl: identifying DNA methylation sites via pseudo tri nucleotide composition, Anal. Biochem., 474, 69-77, (2015)
[65] Liu, Z.; Xiao, X.; Yu, D. J., Prnam-PC: predicting N-methyladenosine sites in RNA sequences via physical-chemical properties, Anal. Biochem., 497, 60-67, (2016)
[66] Liu, B.; Yang, F.; Huang, D. S., Ipromoter-2L: a two-layer predictor for identifying promoters and their types by multi-window-based pseknc, Bioinformatics, 34, 33-40, (2018)
[67] Mahdavi, A.; Jahandideh, S., Application of density similarities to predict membrane protein types based on pseudo-amino acid composition, J. Theor. Biol., 276, 132-137, (2011) · Zbl 1405.92218
[68] Nanni, L.; Lumini, A., An ensemble of support vector machines for predicting the membrane protein type directly from the amino acid sequence, Amino Acids, 35, 3, 573-580, (2008)
[69] OuYang, B.; Xie, S.; Berardi, M. J.; Zhao, X. M.; Dev, J.; Yu, W., Unusual architecture of the p7 channel from hepatitis C virus, Nature, 498, 521-525, (2013)
[70] Oxenoid, K.; Dong, Y. S.; Cao, C.; Cui, T.; Sancak, Y.; Markhard, A. L.; Grabarek, Z.; Kong, L.; Liu, Z.; Ouyang, B.; Cong, Y., Architecture of the mitochondrial calcium uniporter, Nature, 533, 269-273, (2016)
[71] Pu, X.; Guo, J.; Leung, H.; Lin, Y., Prediction of membrane protein types from sequences and position-specific scoring matrices, J. Theor. Biol., 247, 2, 259-265, (2007)
[72] Qiu, J. D.; Sun, X. U.; Huang, J. H.; Liang, R. P., Prediction of the types of membrane proteins based on discrete wavelet transform and support vector machines, J. Protein, 29, 114-119, (2010)
[73] Qiu, W. R.; Sun, B. Q.; Xiao, X., Iptm-mlys: identifying multiple lysine PTM sites and their different types, Bioinformatics, 32, 3116-3123, (2016)
[74] Rezaei, M. A.; Maleki, P. A.; Karami, Z.; Asadabadi, E. B.; Sherafat, M. A.; Moghaddam, K. A.; Fadaie, M.; Forouzanfar, M., Prediction of membrane protein types by means of wavelet analysis and cascaded neural network, J. Theor. Biol., 255, 817-820, (2008) · Zbl 06959978
[75] Sankari, E. S.; Manimegalai, D., Predicting membrane protein types using various decision tree classifiers based on various modes of general pseaac for imbalanced dataset, J. Theor. Biol., 435, 208-217, (2017)
[76] Schnell, J. R.; Chou, J. J., Structure and mechanism of the M2 proton channel of influenza A virus, Nature, 451, 591-595, (2008)
[77] Shen, H. B., Recent advances in developing web-servers for predicting protein attributes, Nat. Sci., 1, 63-92, (2009)
[78] Shen, H.; Chou, K. C., Using optimized evidence-theoretic K-nearest neighbour classifier and pseudo-amino acid composition to predict membrane protein types, Biochem. Biophys. Res. Commun., 334, 1, 288-292, (2005)
[79] Shen, H. B.; Yang, J.; Chou, K. C., Fuzzy KNN for predicting membrane protein types from pseudo-amino acid composition, J. Theor. Biol., 240, 1, 9-13, (2006)
[80] Shen, H. S.; Chou, K. C., Using ensemble classifier to identify membrane protein types, Amino Acids, 32, 483-488, (2007)
[81] Song, J.; Li, F.; Takemoto, K.; Haffari, G., Prevail, an integrative approach for inferring catalytic residues using sequence, structural and network features in a machine learning framework, J. Theor. Biol., 443, 125-137, (2018)
[82] Song, J.; Wang, Y.; Li, F.; Akutsu, T.; Rawlings, N. D., Iprot-sub: a comprehensive package for accurately mapping and predicting protease-specific substrates and cleavage sites, Brief. Bioinform., (2018)
[83] Wan, S.; Mak, M. W.; Kung, S. Y., Mem-men: predicting multi-functional types of membrane proteins by interpretable elastic nets, IEEE/ACM Trans. Comput. Biol. Bioinf., (2015)
[84] Wan, S.; Mak, M. W.; Kung, S. Y., Benchmark data for identifying multi-functional types of membrane proteins, Data Brief, 8, 105-107, (2016)
[85] Wan, S.; Mak, M. W.; Kung, S. Y., Mem-ADSVM: a two-layer multi-label predictor for identifying multi-functional types of membrane proteins, J. Theor. Biol, 398, 32-42, (2016)
[86] Wang, J.; Li, Y.; Wang, Q.; You, X.; Man, J., Proclusensem: predicting membrane protein types by fusing different modes of pseudo amino acid composition, Comput. Biol. Med., 42, 564-574, (2012)
[87] Wang, L.; Yuan, Z.; Chen, X.; Zhou, Z., The prediction of membrane protein types with NPE, IEICE Electron. Express, 6, 397-402, (2010)
[88] Wang, M.; Yang, J.; Liu, G. P.; Xu, Z. J.; Chou, K. C., Weighted-support vector machines for predicting membrane protein types based on pseudo-amino acid composition, Protein Eng. Des. Sel., 17, 6, 509-516, (2004)
[89] Wang, M.; Yang, J.; Xu, Z. J.; Chou, K. C., SLLE for predicting membrane protein types, J. Theor. Biol., 232, 1, 7-15, (2005)
[90] Wang, S. Q.; Yang, J.; Chou, K. C., Using stacked generalization to predict membrane protein types based on pseudo-amino acid composition, J. Theor. Biol., 242, 4, 941-946, (2006)
[91] Wang, T.; Xia, T.; Hu, X. M., Geometry preserving projections algorithm for predicting membrane protein types, J. Theor. Biol., 262, 2, 208-213, (2010) · Zbl 1403.92225
[92] Wang, T.; Yang, J.; Shen, H. B.; Chou, K. C., Predicting membrane protein types by the LLDA algorithm, Protein Pept. Lett., 15, 915-921, (2008)
[93] Wu, Z. C.; Xiao, X., Iloc-hum: using accumulation-label scale to predict subcellular locations of human proteins with both single and multiple sites, Mol. Biosyst., 8, 629-641, (2012)
[94] Xiao, X.; Min, J. L.; Lin, W. Z., Idrug-target: predicting the interactions between drug compounds and target proteins in cellular networking via the benchmark dataset optimization approach, J. Biomol. Struct. Dyn., 33, 2221-2233, (2015)
[95] Xiao, X.; Wang, P.; Lin, W. Z., Iamp-2L: a two-level multi-label classifier for identifying antimicrobial peptides and their functional types, Anal. Biochem., 436, 168-177, (2013)
[96] Xiao, X.; Zou, H. L.; Lin, W. Z., Imem-seq: a multi-label learning classifier for predicting membrane proteins types, J. Membr. Biol., 248, 4, 745-752, (2015)
[97] Yang, X. G.; Luo, R. Y.; Feng, Z. P., Using amino acid and peptide composition to predict membrane protein types, Biochem. Biophys. Res. Commun., 353, 1, 164-169, (2007)
[98] Yang, H.; Qiu, W. R.; Liu, G.; Guo, F. B., Irspot-pse6NC: identifying recombination spots in saccharomyces cerevisiae by incorporating hexamer composition into general pseknc, Int. J. Biol. Sci., (2018)
[99] Zaki, N.; El-Hajj, W., Predicting membrane protein type using inter-domain linker knowledge, (Proceedings of the BIOCOMP, (2010)), 209-214
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.