Predicting membrane protein types by fusing composite protein sequence features into pseudo amino acid composition. (English) Zbl 1405.92217

Summary: Membrane proteins are a vital type of protein, serving as channels, receptors, and energy transducers in the cell. Prediction of membrane protein types is an important research area in bioinformatics. Knowledge of membrane protein types provides valuable information for predicting the types of newly identified membrane proteins. However, classifying membrane protein types can be both time-consuming and error-prone because of the inherent similarity between the types. In this paper, a membrane protein type prediction system based on neural networks is proposed. Composite protein sequence representation (CPSR) is used to extract the features of a protein sequence. It comprises seven feature sets: amino acid composition, sequence length, 2-gram exchange group frequency, hydrophobic group, electronic group, sum of hydrophobicity, and \(R\)-group. Principal component analysis is then employed to reduce the dimensionality of the feature vector. The probabilistic neural network (PNN), the generalized regression neural network (GRNN), and the support vector machine (SVM) are used as classifiers. A high success rate of 86.01% is obtained with SVM in the jackknife test. In the independent dataset test, PNN yields the highest accuracy, 95.73%. These classifiers also show improved performance under other measures such as sensitivity, specificity, Matthews correlation coefficient, and F-measure. The experimental results show that the prediction performance of the proposed scheme for classifying membrane protein types is the best reported so far. This improvement may largely be credited to the learning capabilities of neural networks and to the composite feature extraction strategy, which exploits seven different properties of protein sequences. The proposed Mem-Predictor can be accessed at
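The summary names the CPSR feature sets without defining them. As a rough illustration only, the Python sketch below computes four of the seven (amino acid composition, sequence length, 2-gram exchange group frequency, and sum of hydrophobicity). The six Dayhoff-style exchange groups and the Kyte-Doolittle hydropathy values are assumptions standing in for whatever grouping and scale the paper actually uses, and the function name `composite_features` is hypothetical.

```python
from itertools import product

# Standard 20 amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: Dayhoff-style six exchange groups, as commonly used in this
# literature; the summary does not state the grouping the paper uses.
EXCHANGE_GROUPS = {
    "e1": "HRK", "e2": "DENQ", "e3": "C",
    "e4": "STPAG", "e5": "MILV", "e6": "FYW",
}
RESIDUE_TO_GROUP = {aa: g for g, members in EXCHANGE_GROUPS.items() for aa in members}

# Assumption: the Kyte-Doolittle hydropathy scale stands in for the scale
# behind the paper's "sum of hydrophobicity" feature.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composite_features(seq: str) -> list[float]:
    """Hypothetical partial CPSR-style vector: 20 + 1 + 36 + 1 = 58 values."""
    # Keep only the 20 standard residues.
    seq = "".join(aa for aa in seq.upper() if aa in KYTE_DOOLITTLE)
    n = len(seq)
    if n == 0:
        raise ValueError("sequence contains no standard amino acids")
    # 1) Amino acid composition: fraction of each residue type.
    composition = [seq.count(aa) / n for aa in AMINO_ACIDS]
    # 2) Sequence length.
    length = [float(n)]
    # 3) 2-gram exchange group frequency over the 6 x 6 = 36 adjacent group pairs.
    groups = [RESIDUE_TO_GROUP[aa] for aa in seq]
    pairs = list(zip(groups, groups[1:]))
    labels = sorted(EXCHANGE_GROUPS)
    two_grams = [
        sum(1 for p in pairs if p == (a, b)) / max(len(pairs), 1)
        for a, b in product(labels, repeat=2)
    ]
    # 4) Sum of hydrophobicity over the whole sequence.
    hydro = [sum(KYTE_DOOLITTLE[aa] for aa in seq)]
    return composition + length + two_grams + hydro


print(len(composite_features("MKTAYIAK")))  # 58
```

The hydrophobic group, electronic group, and \(R\)-group sets would be appended in the same way once their residue partitions are fixed; the full vector would then be reduced with principal component analysis before classification.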
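The probabilistic neural network and the jackknife test mentioned in the summary can be sketched as follows. This is a minimal Parzen-window PNN in the spirit of Specht's formulation, with a single smoothing parameter `sigma` chosen arbitrarily; it omits the principal component analysis step and is not the authors' implementation. The function names and the two-class toy data are hypothetical.

```python
import math
from collections import defaultdict


def pnn_predict(train, x, sigma=1.0):
    """Probabilistic neural network (Parzen-window classifier).

    train: list of (feature_vector, label) pairs; x: query vector.
    Each class score is the mean Gaussian kernel between x and that
    class's training patterns; the class with the largest score wins.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for vec, label in train:
        sq_dist = sum((a - b) ** 2 for a, b in zip(vec, x))
        sums[label] += math.exp(-sq_dist / (2.0 * sigma ** 2))
        counts[label] += 1
    return max(sums, key=lambda c: sums[c] / counts[c])


def jackknife_accuracy(data, sigma=1.0):
    """Jackknife (leave-one-out) test: predict each sample from all the others."""
    correct = 0
    for i, (vec, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if pnn_predict(rest, vec, sigma) == label:
            correct += 1
    return correct / len(data)


# Hypothetical toy data: two well-separated clusters standing in for
# reduced feature vectors of two membrane protein types.
toy = [
    ([0.0, 0.0], "class_1"), ([0.1, 0.2], "class_1"), ([0.2, 0.1], "class_1"),
    ([3.0, 3.0], "class_2"), ([3.1, 2.9], "class_2"), ([2.9, 3.2], "class_2"),
]
print(jackknife_accuracy(toy, sigma=0.5))  # 1.0
```

The jackknife success rate reported in the summary is this same quantity (fraction of sequences correctly predicted when each is held out in turn), computed there on real data and with the paper's own classifiers and settings.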
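The per-class performance measures cited in the summary are usually computed one-vs-rest from a confusion matrix. The sketch below uses the standard textbook definitions (sensitivity \(TP/(TP+FN)\), specificity \(TN/(TN+FP)\), F-measure as the harmonic mean of precision and sensitivity, and the Matthews correlation coefficient); that the paper uses exactly these conventions is an assumption, and the function names are hypothetical.

```python
import math


def one_vs_rest_counts(y_true, y_pred, positive):
    """Confusion counts treating `positive` as one class and all others as the rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, fp, tn, fn


def class_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, Matthews correlation coefficient, F-measure."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (2 * precision * sensitivity / (precision + sensitivity)
                 if precision + sensitivity else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "mcc": mcc, "f_measure": f_measure}


print(class_metrics(40, 10, 45, 5))
```

With hypothetical counts TP = 40, FP = 10, TN = 45, FN = 5, this gives sensitivity \(8/9\), specificity \(9/11\), and F-measure \(16/19\). Repeating the calculation for each membrane protein type gives one row of per-class scores per type.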


92D20 Protein sequences, DNA sequences
68T05 Learning and adaptive systems in artificial intelligence
92C37 Cell biology
62P10 Applications of statistics to biology and medical sciences; meta analysis