
Geometry preserving projections algorithm for predicting membrane protein types. (English) Zbl 1403.92225
Summary: Given a new, uncharacterized protein sequence, a biologist may want to know whether it is a membrane protein and, if so, which membrane protein type it belongs to. Because knowing the type of an uncharacterized membrane protein often provides useful clues to the biological function of the query protein, computational methods that address these questions can be very helpful. In this study, a sequence encoding scheme that combines the pseudo position-specific score matrix (PsePSSM) and dipeptide composition (DC) is introduced to represent protein samples. This encoding, however, yields a very high-dimensional feature vector. A dimensionality reduction algorithm, geometry preserving projections (GPP), is therefore introduced to extract the key features from the high-dimensional space and map each original vector to a lower-dimensional one. Finally, K-nearest neighbor (K-NN) and support vector machine (SVM) classifiers are employed to identify membrane protein types from the reduced low-dimensional features. The results of the jackknife and independent dataset tests are encouraging and indicate that these methods can be used effectively for the complicated problem of predicting membrane protein types.
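The encoding and jackknife evaluation steps described in the summary can be illustrated with a minimal Python/NumPy sketch. It is not the authors' implementation and rests on several assumptions: dipeptide composition is taken as the 400 normalized frequencies of adjacent residue pairs over a fixed residue order; PsePSSM is computed from an L × 20 PSSM assumed to have been produced beforehand (for example by PSI-BLAST), using row-standardized scores, their column means, and mean squared differences between positions up to a hypothetical maximum lag `max_lag`; and the jackknife test is read as leave-one-out accuracy of a plain Euclidean k-NN classifier. The GPP reduction and the SVM are not reproduced, so the classifier here runs on whatever feature matrix it is given. All function names are hypothetical.

```python
import numpy as np

# The 20 standard amino acids in a fixed order (an assumed convention).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def dipeptide_composition(seq: str) -> np.ndarray:
    """Return the 400 normalized frequencies of adjacent residue pairs.

    Pairs containing a non-standard residue are skipped (an assumption).
    """
    counts = np.zeros((20, 20))
    for a, b in zip(seq, seq[1:]):
        if a in INDEX and b in INDEX:
            counts[INDEX[a], INDEX[b]] += 1
    total = counts.sum()
    if total > 0:
        counts /= total
    return counts.ravel()


def pse_pssm(pssm: np.ndarray, max_lag: int = 10) -> np.ndarray:
    """Return 20 + 20 * max_lag PsePSSM-style values from an L x 20 PSSM.

    The PSSM itself is assumed to be computed beforehand (e.g. by PSI-BLAST).
    """
    p = np.asarray(pssm, dtype=float)
    if p.ndim != 2 or p.shape[1] != 20:
        raise ValueError("expected an L x 20 matrix")
    if p.shape[0] <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    # Standardize each row (one residue position) across its 20 scores.
    std = p.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # constant rows become all zeros instead of dividing by 0
    p = (p - p.mean(axis=1, keepdims=True)) / std
    features = [p.mean(axis=0)]  # average standardized score per column
    for lag in range(1, max_lag + 1):
        # Mean squared difference between positions `lag` apart, per column.
        features.append(((p[:-lag] - p[lag:]) ** 2).mean(axis=0))
    return np.concatenate(features)


def encode(seq: str, pssm: np.ndarray, max_lag: int = 10) -> np.ndarray:
    """Concatenate PsePSSM-style and dipeptide-composition features."""
    return np.concatenate([pse_pssm(pssm, max_lag), dipeptide_composition(seq)])


def jackknife_knn_accuracy(X: np.ndarray, y: np.ndarray, k: int = 1) -> float:
    """Leave-one-out (jackknife) accuracy of a Euclidean k-NN classifier."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    sq = (X ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared distances
    np.fill_diagonal(dist, np.inf)  # a sample may not vote for itself
    correct = 0
    for i in range(len(y)):
        neighbors = np.argsort(dist[i])[:k]
        labels, votes = np.unique(y[neighbors], return_counts=True)
        # Majority vote; ties go to the smallest label (an arbitrary choice).
        correct += int(labels[np.argmax(votes)] == y[i])
    return correct / len(y)
```

With the default `max_lag=10`, `encode` returns 20 + 20 · 10 + 400 = 620 values per protein, which shows why the summary treats the combined encoding as high-dimensional and motivates a reduction step such as GPP before classification.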

92D20 Protein sequences, DNA sequences
68T05 Learning and adaptive systems in artificial intelligence
62P10 Applications of statistics to biology and medical sciences; meta analysis