zbMATH — the first resource for mathematics

Classification of membrane protein types using voting feature interval in combination with Chou’s pseudo amino acid composition. (English) Zbl 1343.92006
Summary: Membrane proteins are a major constituent of the cell and perform numerous crucial functions in it. These functions depend largely on the type of the membrane protein. Membrane protein types were initially classified by traditional methods, which gave reasonable results. However, because of the rapid growth in the number of protein sequences in databases, classification by conventional methods has become very difficult or sometimes impossible, since it is laborious and time-consuming. A new, powerful discriminating model is therefore needed to classify membrane protein types with high precision. In this work, a promising classification model with effective discriminating power for membrane protein types is developed. In our model, salient features of protein sequences are extracted via pseudo amino acid composition. Five classification algorithms were evaluated; among them, voting feature interval achieved the best performance on all three datasets. Under 10-fold cross-validation, the accuracy of the proposed model is 93.9% on dataset S1, 89.33% on S2 and 86.9% on S3. These success rates indicate that the proposed model outperforms the existing models reported in the literature so far, and it may play a substantial role in drug design and the pharmaceutical industry.
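The feature-extraction step named in the summary can be illustrated with a minimal Python sketch of a pseudo amino acid composition vector. This is a simplified illustration, not the authors' implementation: the summary does not state which physicochemical properties, tier count or weight were used, so the single Kyte-Doolittle hydropathy scale, the function name `pseaac`, and the defaults `lam=2` and `w=0.05` are all assumptions made for the example.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order for the output vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values (assumed here as the only property scale;
# the paper's actual property set is not given in the summary).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def _standardize(scale):
    """Rescale a property table to zero mean and unit variance over the 20 residues."""
    values = [scale[a] for a in AMINO_ACIDS]
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return {a: (scale[a] - mean) / sd for a in AMINO_ACIDS}


def pseaac(seq, lam=2, w=0.05, scale=KD):
    """Return a (20 + lam)-dimensional pseudo amino acid composition vector.

    The first 20 entries are normalized residue frequencies; the last `lam`
    entries are weighted sequence-order correlation factors of ranks 1..lam.
    """
    seq = seq.upper()
    if any(ch not in scale for ch in seq):
        raise ValueError("sequence contains a non-standard residue")
    n = len(seq)
    if lam >= n:
        raise ValueError("lam must be smaller than the sequence length")

    h = _standardize(scale)
    counts = Counter(seq)
    freqs = [counts[a] / n for a in AMINO_ACIDS]

    # theta_k: mean squared property difference between residues k positions apart.
    thetas = [
        sum((h[seq[i]] - h[seq[i + k]]) ** 2 for i in range(n - k)) / (n - k)
        for k in range(1, lam + 1)
    ]

    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]
```

Calling `pseaac("MKTLLLTLVVV", lam=3)` would yield a 23-dimensional vector whose entries sum to 1. Vectors of this kind could then be passed to any classifier, such as the voting feature interval method mentioned in the summary.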

MSC:
92B15 General biostatistics
92C40 Biochemistry, molecular biology