Predicting membrane protein types by incorporating protein topology, domains, signal peptides, and physicochemical properties into the general form of Chou’s pseudo amino acid composition. (English) Zbl 1406.92450

Summary: The type information of un-annotated membrane proteins provides an important hint for their biological functions. The experimental determination of membrane protein types, despite being more accurate and reliable, is not always feasible due to the costly laboratory procedures, thereby creating a need for the development of bioinformatics methods. This article describes a novel computational classifier for the prediction of membrane protein types using protein sequences. The classifier, comprising a collection of one-versus-one support vector machines, makes use of the following sequence attributes: (1) the cationic patch sizes, the orientation, and the topology of transmembrane segments; (2) the amino acid physicochemical properties; (3) the presence of signal peptides or anchors; and (4) the specific protein motifs. A new voting scheme was implemented to cope with the multi-class prediction. Both the training and the testing sequences were collected from SwissProt. Homologous proteins were removed such that no pair of sequences left in the datasets has a sequence identity higher than 40%. The performance of the classifier was evaluated by a jackknife cross-validation and an independent test. Results show that the proposed classifier outperforms earlier predictors in prediction accuracy in seven of the eight membrane protein types. The overall accuracy was increased from 78.3% to 88.2%. Unlike earlier approaches, which largely depend on position-specific substitution matrices and amino acid compositions, most of the sequence attributes implemented in the proposed classifier are supported by evidence in the literature. The classifier has been deployed as a web server and can be accessed at http://bsaltools.ym.edu.tw/predmpt.
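The summary mentions two generic techniques: one-versus-one multi-class prediction and jackknife (leave-one-out) cross-validation. The following is a minimal Python sketch of both, under stated assumptions. A nearest-centroid rule stands in for each binary support vector machine, and plain majority voting stands in for the paper's own voting scheme, which the summary does not describe. All function names and the toy labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_pairwise(X, y):
    """Build one binary model per unordered pair of classes.

    Assumption: each 'model' is just the two class centroids. A real
    setup would fit a binary SVM on the same two-class subset.
    """
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    models = {}
    for a, b in combinations(sorted(by_class), 2):
        models[(a, b)] = (centroid(by_class[a]), centroid(by_class[b]))
    return models


def predict(models, features):
    """Plain majority vote over all pairwise decisions.

    Ties are broken by taking the smallest label, which is an arbitrary
    choice made for this sketch only.
    """
    votes = Counter()
    for (a, b), (ca, cb) in models.items():
        winner = a if sq_dist(features, ca) <= sq_dist(features, cb) else b
        votes[winner] += 1
    best = max(votes.values())
    return min(label for label, n in votes.items() if n == best)


def jackknife_accuracy(X, y):
    """Leave-one-out: hold out each sample once and train on the rest."""
    hits = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        hits += predict(train_pairwise(X_train, y_train), X[i]) == y[i]
    return hits / len(X)


if __name__ == "__main__":
    # Toy two-dimensional features with hypothetical class labels.
    X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [10.0, 0.0], [10.0, 1.0]]
    y = ["typeI", "typeI", "typeII", "typeII", "GPI", "GPI"]
    print(jackknife_accuracy(X, y))
```

With k classes, the one-versus-one design trains k(k-1)/2 binary models, so eight membrane protein types would require 28 of them.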


92D20 Protein sequences, DNA sequences
68T05 Learning and adaptive systems in artificial intelligence
62P10 Applications of statistics to biology and medical sciences; meta analysis


[1] Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E., The protein data bank, Nucleic Acids Res., 28, 235-242, (2000)
[2] Bhardwaj, N.; Stahelin, R. V.; Langlois, R. E.; Cho, W.; Lu, H., Structural bioinformatics prediction of membrane-binding proteins, J. Mol. Biol., 359, 486-495, (2006)
[3] Bhaskaran, R.; Ponnuswamy, P. K., Positional flexibilities of amino acid residues in globular proteins, Int. J. Peptide Protein Res., 32, 241-255, (1988)
[4] Boeckmann, B.; Bairoch, A.; Apweiler, R.; Blatter, M. C.; Estreicher, A.; Gasteiger, E.; Martin, M. J.; Michoud, K.; O'Donovan, C.; Phan, I.; Pilbout, S.; Schneider, M., The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003, Nucleic Acids Res., 31, 365-370, (2003)
[5] Cai, Y. D.; Chou, K. C., Nearest neighbour algorithm for predicting protein subcellular location by combining functional domain composition and pseudo-amino acid composition, Biochem. Biophys. Res. Commun., 305, 407-411, (2003)
[6] Cai, Y. D.; Chou, K. C., Predicting membrane protein type by functional domain composition and pseudo-amino acid composition, J. Theor. Biol., 238, 395-400, (2006)
[7] Cai, Y. D.; Zhou, G. P.; Chou, K. C., Support vector machines for predicting membrane protein types by using functional domain composition, Biophys. J., 84, 3257-3263, (2003)
[8] Cai, Y. D.; Ricardo, P. W.; Jen, C. H.; Chou, K. C., Application of SVM to predict membrane protein types, J. Theor. Biol., 226, 373-376, (2004)
[9] Cedano, J.; Aloy, P.; Perez-Pons, J. A.; Querol, E., Relation between amino acid composition and cellular location of proteins, J. Mol. Biol., 266, 594-600, (1997)
[10] Chang, C.-C.; Lin, C.-J., LIBSVM: A library for support vector machines, ACM Trans. Intell. Syst. Technol., 2, 27:1-27:27, (2011)
[11] Chen, C.; Tian, Y. X.; Zou, X. Y.; Cai, P. X.; Mo, J. Y., Using pseudo-amino acid composition and support vector machine to predict protein structural class, J. Theor. Biol., 243, 444-448, (2006)
[12] Chen, K.; Jiang, Y.; Du, L.; Kurgan, L., Prediction of integral membrane protein type by collocated hydrophobic amino acid pairs, J. Comput. Chem., 30, 163-172, (2009)
[13] Chen, Y.-W.; Lin, C.-J., Combining SVMs with various feature selection strategies, (Guyon, I.; et al., Feature Extraction, vol. 207, (2006), Springer Heidelberg), 315-324
[14] Cho, W.; Stahelin, R. V., Membrane-protein interactions in cell signaling and membrane trafficking, Annu. Rev. Biophys. Biomol. Struct., 34, 119-151, (2005)
[15] Chou, K. C., Prediction of protein cellular attributes using pseudo-amino acid composition, Proteins, 43, 246-255, (2001)
[16] Chou, K. C., Some remarks on protein attribute prediction and pseudo amino acid composition, J. Theor. Biol., 273, 236-247, (2011), doi:10.1016/j.jtbi.2010.12.024 · Zbl 1405.92212
[17] Chou, K. C.; Elrod, D. W., Prediction of membrane protein types and subcellular locations, Proteins, 34, 137-153, (1999)
[18] Chou, K. C.; Cai, Y. D., Predicting protein quaternary structure by pseudo amino acid composition, Proteins, 53, 282-289, (2003)
[19] Chou, K. C.; Cai, Y. D., Using GO-PseAA predictor to identify membrane proteins and their types, Biochem. Biophys. Res. Commun., 327, 845-847, (2005)
[20] Chou, K. C.; Cai, Y. D., Prediction of membrane protein types by incorporating amphipathic effects, J. Chem. Inf. Model., 45, 407-413, (2005)
[21] Chou, K. C.; Shen, H. B., MemType-2L: a web server for predicting membrane proteins and their types by incorporating evolution information through Pse-PSSM, Biochem. Biophys. Res. Commun., 360, 339-345, (2007)
[22] Chou, K. C.; Shen, H. B., Recent progress in protein subcellular location prediction, Anal. Biochem., 370, 1-16, (2007), doi:10.1016/j.ab.2007.07.006
[23] Chou, K. C.; Shen, H. B., Review: recent advances in developing web-servers for predicting protein attributes, Nat. Sci., 1, 63-92, (2009)
[24] Claros, M. G.; von Heijne, G., TopPred II: an improved software for membrane protein structure predictions, Comput. Appl. Biosci., 10, 685-686, (1994)
[25] Cruz, V.; Ramos, J.; Martinez-Salazar, J., Water-mediated conformations of the alanine dipeptide as revealed by distributed umbrella sampling simulations, quantum mechanics based calculations, and experimental data, J. Phys. Chem. B, 115, 4880-4886, (2011)
[26] Das, R.; Dimitrova, N.; Xuan, Z.; Rollins, R. A.; Haghighi, F.; Edwards, J. R.; Ju, J.; Bestor, T. H.; Zhang, M. Q., Computational prediction of methylation status in human genomic sequences, Proc. Natl. Acad. Sci. U.S.A., 103, 10713-10716, (2006)
[27] de Castro, E.; Sigrist, C. J.; Gattiker, A.; Bulliard, V.; Langendijk-Genevaux, P. S.; Gasteiger, E.; Bairoch, A.; Hulo, N., ScanProsite: detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins, Nucleic Acids Res., 34, W362-W365, (2006)
[28] Diao, Y.; Ma, D.; Wen, Z.; Yin, J.; Xiang, J.; Li, M., Using pseudo amino acid composition to predict transmembrane regions in protein: cellular automata and Lempel-Ziv complexity, Amino Acids, 34, 111-117, (2008)
[29] Du, P.; Wang, X.; Xu, C.; Gao, Y., PseAAC-Builder: a cross-platform stand-alone program for generating various special Chou’s pseudo-amino acid compositions, Anal. Biochem., 425, 117-119, (2012)
[30] Eddy, S. R., Profile hidden Markov models, Bioinformatics, 14, 755-763, (1998)
[31] Emanuelsson, O.; Brunak, S.; von Heijne, G.; Nielsen, H., Locating proteins in the cell using TargetP, SignalP and related tools, Nat. Protoc., 2, 953-971, (2007)
[32] Esmaeili, M.; Mohabatkar, H.; Mohsenzadeh, S., Using the concept of Chou’s pseudo amino acid composition for risk type prediction of human papillomaviruses, J. Theor. Biol., 263, 203-209, (2010), doi:10.1016/j.jtbi.2009.11.016
[33] Feng, Z. P.; Zhang, C. T., Prediction of membrane protein types based on the hydrophobic index of amino acids, J. Protein Chem., 19, 269-275, (2000)
[34] Fuller, W. A., Sampling statistics, Hoboken, NJ, (2009), John Wiley & Sons, Inc. · Zbl 1179.62019
[35] Gao, Y.; Shao, S.; Xiao, X.; Ding, Y.; Huang, Y.; Huang, Z.; Chou, K. C., Using pseudo amino acid composition to predict protein subcellular location: approached with Lyapunov index, Bessel function, and Chebyshev filter, Amino Acids, 28, 373-376, (2005)
[36] Gonen, M.; Tanugur, A. G.; Alpaydin, E., Multiclass posterior probability support vector machines, IEEE Trans. Neural Netw., 19, 130-139, (2008)
[37] Hartmann, E.; Rapoport, T. A.; Lodish, H. F., Predicting the orientation of eukaryotic membrane-spanning proteins, Proc. Natl. Acad. Sci. U.S.A., 86, 5786-5790, (1989)
[38] Hayashi, S.; Wu, H. C., Lipoproteins in bacteria, J. Bioenerg. Biomembr., 22, 451-471, (1990)
[39] Hayat, M.; Khan, A., Predicting membrane protein types by fusing composite protein sequence features into pseudo amino acid composition, J. Theor. Biol., 271, 10-17, (2010) · Zbl 1405.92217
[40] Hayat, M.; Khan, A., Discriminating outer membrane proteins with fuzzy K-nearest neighbor algorithms based on the general form of Chou’s PseAAC, Protein Pept. Lett., 19, 411-421, (2012)
[41] Hayat, M.; Khan, A.; Yeasin, M., Prediction of membrane proteins using split amino acid and ensemble classification, Amino Acids, 1-14, (2011)
[42] Heijne, G., The distribution of positively charged residues in bacterial inner membrane proteins correlates with the trans-membrane topology, EMBO J., 5, 3021-3027, (1986)
[43] Hsu, C. W.; Lin, C. J., A comparison of methods for multiclass support vector machines, IEEE Trans. Neural Netw., 13, 415-425, (2002)
[44] Hua, S.; Sun, Z., Support vector machine approach for protein subcellular localization prediction, Bioinformatics, 17, 721-728, (2001)
[45] Janin, J.; Wodak, S., Conformation of amino acid side-chains in proteins, J. Mol. Biol., 125, 357-386, (1978)
[46] Jia, P.; Qian, Z.; Feng, K.; Lu, W.; Li, Y.; Cai, Y., Prediction of membrane protein types in a hybrid space, J. Proteome Res., 7, 1131-1137, (2008)
[47] Kall, L.; Krogh, A.; Sonnhammer, E. L., Advantages of combined transmembrane topology and signal peptide prediction—the phobius web server, Nucleic Acids Res., 35, W429-W432, (2007)
[48] Kaufman, L.; Rousseeuw, P. J., Finding groups in data: an introduction to cluster analysis, (2005), John Wiley & Sons, Inc. Hoboken, NJ
[49] Kawashima, S.; Kanehisa, M., AAindex: amino acid index database, Nucleic Acids Res., 28, 374, (2000)
[50] Kim, S. Y., Effects of sample size on robustness and prediction accuracy of a prognostic gene signature, BMC Bioinformat., 10, 147, (2009)
[51] Koike, A.; Takagi, T., Prediction of protein-protein interaction sites using support vector machines, Protein Eng. Des. Sel., 17, 165-173, (2004)
[52] Kutay, U.; Ahnert-Hilger, G.; Hartmann, E.; Wiedenmann, B.; Rapoport, T. A., Transport route for synaptobrevin via a novel pathway of insertion into the endoplasmic reticulum membrane, EMBO J., 14, 217-223, (1995)
[53] Kyte, J.; Doolittle, R. F., A simple method for displaying the hydropathic character of a protein, J. Mol. Biol., 157, 105-132, (1982)
[54] Lehninger, A. L.; Nelson, D. L.; Cox, M. M., Lehninger principles of biochemistry, (2008), W.H. Freeman New York
[55] Li, H. M.; Chen, L. J., Protein targeting and integration signal for the chloroplastic outer envelope membrane, Plant Cell, 8, 2117-2126, (1996)
[56] Li, W.; Godzik, A., CD-HIT: a fast program for clustering and comparing large sets of protein or nucleotide sequences, Bioinformatics, 22, 1658-1659, (2006), doi:10.1093/bioinformatics/btl158
[57] Lin, H., The modified Mahalanobis discriminant for predicting outer membrane proteins by using Chou’s pseudo amino acid composition, J. Theor. Biol., 252, 350-356, (2008), doi:10.1016/j.jtbi.2008.02.004 · Zbl 1398.92076
[58] Liu, H.; Wang, M.; Chou, K. C., Low-frequency Fourier spectrum for predicting membrane protein types, Biochem. Biophys. Res. Commun., 336, 737-739, (2005), doi:10.1016/j.bbrc.2005.08.160
[59] Liu, H.; Yang, J.; Wang, M.; Xue, L.; Chou, K. C., Using Fourier spectrum analysis and pseudo amino acid composition for prediction of membrane protein types, Protein J., 24, 385-389, (2005)
[60] Liu, L.; Cai, Y.; Lu, W.; Feng, K.; Peng, C.; Niu, B., Prediction of protein-protein interactions based on PseAA composition and hybrid feature selection, Biochem. Biophys. Res. Commun., 380, 318-322, (2009)
[61] Mahdavi, A.; Jahandideh, S., Application of density similarities to predict membrane protein types based on pseudo-amino acid composition, J. Theor. Biol., 276, 132-137, (2011) · Zbl 1405.92218
[62] Mattar, S.; Scharf, B.; Kent, S. B.; Rodewald, K.; Oesterhelt, D.; Engelhard, M., The primary structure of halocyanin, an archaeal blue copper protein, predicts a lipid anchor for membrane fixation, J. Biol. Chem., 269, 14939-14945, (1994)
[63] Mohammad Beigi, M.; Behjati, M.; Mohabatkar, H., Prediction of metalloproteinase family based on the concept of Chou’s pseudo amino acid composition using a machine learning approach, J. Struct. Funct. Genomics, 12, 191-197, (2011)
[64] Nadolski, M. J.; Linder, M. E., Protein lipidation, FEBS J., 274, 5202-5210, (2007)
[65] Nam, H. J.; Jeon, J.; Kim, S., Bioinformatic approaches for the structure and function of membrane proteins, BMB Rep., 42, 697-704, (2009), doi:10.5483/BMBRep.2009.42.11.697
[66] Nielsen, H.; Krogh, A., Prediction of signal peptides and signal anchors by a hidden Markov model, Proc. Int. Conf. Intell. Syst. Mol. Biol., 6, 122-130, (1998)
[67] Orlean, P.; Menon, A. K., Thematic review series: lipid posttranslational modifications. GPI anchoring of protein in yeast and mammalian cells, or: how we learned to stop worrying and love glycophospholipids, J. Lipid Res., 48, 993-1011, (2007)
[68] Park, K. J.; Kanehisa, M., Prediction of protein subcellular locations by support vector machines using compositions of amino acids and amino acid pairs, Bioinformatics, 19, 1656-1663, (2003)
[69] Pierleoni, A.; Martelli, P. L.; Casadio, R., PredGPI: a GPI-anchor predictor, BMC Bioinformat., 9, 392, (2008)
[70] Podell, S.; Gribskov, M., Predicting N-terminal myristoylation sites in plant proteins, BMC Genomics, 5, 37, (2004)
[71] Sarda, D.; Chua, G. H.; Li, K. B.; Krishnan, A., pSLIP: SVM based protein subcellular localization prediction using multiple physicochemical properties, BMC Bioinformat., 6, 152, (2005)
[72] Shazman, S.; Celniker, G.; Haber, O.; Glaser, F.; Mandel-Gutfreund, Y., Patch Finder Plus (PFplus): a web server for extracting and displaying positive electrostatic patches on protein surfaces, Nucleic Acids Res., 35, W526-W530, (2007)
[73] Shen, H. B.; Chou, K. C., Predicting protein subnuclear location with optimized evidence-theoretic K-nearest classifier and pseudo amino acid composition, Biochem. Biophys. Res. Commun., 337, 752-756, (2005)
[74] Shen, H. B.; Chou, K. C., Using ensemble classifier to identify membrane protein types, Amino Acids, 32, 483-488, (2007)
[75] Shen, H. B.; Chou, K. C., PseAAC: a flexible web server for generating various kinds of protein pseudo amino acid composition, Anal. Biochem., 373, 386-388, (2008)
[76] Shen, H. B.; Yang, J.; Chou, K. C., Fuzzy KNN for predicting membrane protein types from pseudo-amino acid composition, J. Theor. Biol., 240, 9-13, (2006)
[77] Sigrist, C. J.; Cerutti, L.; de Castro, E.; Langendijk-Genevaux, P. S.; Bulliard, V.; Bairoch, A.; Hulo, N., PROSITE, a protein domain database for functional characterization and annotation, Nucleic Acids Res., 38, D161-D166, (2010)
[78] Singer, S. J.; Nicolson, G. L., The fluid mosaic model of the structure of cell membranes, Science, 175, 720-731, (1972)
[79] Spiess, M., Heads or tails—what determines the orientation of proteins in the membrane, FEBS Lett., 369, 76-79, (1995), doi:0014-5793(95)00551-J [pii]
[80] Tantoso, E.; Li, K. B., AAIndexLoc: predicting subcellular localization of proteins based on a new representation of sequences using amino acid indices, Amino Acids, 35, 345-353, (2008)
[81] Vapnik, V., The nature of statistical learning theory, (1995), Springer-Verlag New York · Zbl 0833.62008
[82] Viklund, H.; Elofsson, A., OCTOPUS: improving topology prediction by two-track ANN-based preference scores and an extended topological grammar, Bioinformatics, 24, 1662-1668, (2008), doi:10.1093/bioinformatics/btn221
[83] von Heijne, G., Patterns of amino acids near signal-sequence cleavage sites, Eur. J. Biochem., 133, 17-21, (1983)
[84] von Heijne, G., Membrane protein structure prediction. Hydrophobicity analysis and the positive-inside rule, J. Mol. Biol., 225, 487-494, (1992)
[85] Vossen, J. H.; Muller, W. H.; Lipke, P. N.; Klis, F. M., Restrictive glycosylphosphatidylinositol anchor synthesis in cwh6/gpi3 yeast cells causes aberrant biogenesis of cell wall proteins, J. Bacteriol., 179, 2202-2209, (1997)
[86] Wang, M.; Yang, J.; Chou, K. C., Using string kernel to predict signal peptide cleavage site based on subsite coupling model, Amino Acids, 28, 395-402, (2005)
[87] Wang, M.; Yang, J.; Xu, Z. J.; Chou, K. C., SLLE for predicting membrane protein types, J. Theor. Biol., 232, 7-15, (2005), doi:10.1016/j.jtbi.2004.07.023
[88] Wang, M.; Yang, J.; Liu, G. P.; Xu, Z. J.; Chou, K. C., Weighted-support vector machines for predicting membrane protein types based on pseudo-amino acid composition, Protein Eng. Des. Sel., 17, 509-516, (2004)
[89] Wang, S. Q.; Yang, J.; Chou, K. C., Using stacked generalization to predict membrane protein types based on pseudo-amino acid composition, J. Theor. Biol., 242, 941-946, (2006), doi:10.1016/j.jtbi.2006.05.006
[90] Wang, T.; Yang, J.; Shen, H. B.; Chou, K. C., Predicting membrane protein types by the LLDA algorithm, Protein Pept. Lett., 15, 915-921, (2008)
[91] Wang, Z. X., The prediction accuracy for protein structural class by the component-coupled method is around 60
[92] Ward, J. J.; McGuffin, L. J.; Buxton, B. F.; Jones, D. T., Secondary structure prediction with support vector machines, Bioinformatics, 19, 1650-1655, (2003)
[93] Yamauchi, E.; Kiyonami, R.; Kanai, M.; Taniguchi, H., Presence of conserved domains in the C-terminus of MARCKS, a major in vivo substrate of protein kinase C: application of ion trap mass spectrometry to the elucidation of protein structures, J. Biochem., 123, 760-765, (1998)
[94] Yang, J. Y.; Yang, M. Q.; Dunker, A. K.; Deng, Y.; Huang, X., Investigation of transmembrane proteins using a computational approach, BMC Genomics, 9, Suppl 1, S7, (2008)
[95] Zhang, C. T.; Chou, K. C., Monte Carlo simulation studies on the prediction of protein folding types from amino acid composition, Biophys. J., 63, 1523-1529, (1992)
[96] Zhou, G. P., An intriguing controversy over protein structural class prediction, J. Protein Chem., 17, 729-738, (1998)
[97] Zhou, G. P.; Doctor, K., Subcellular location prediction of apoptosis proteins, Proteins, 50, 44-48, (2003)
[98] Zhou, X. B.; Chen, C.; Li, Z. C.; Zou, X. Y., Using Chou’s amphiphilic pseudo-amino acid composition and support vector machine for prediction of enzyme subfamily classes, J. Theor. Biol., 248, 546-551, (2007)