iMem-2LSAAC: a two-level model for discrimination of membrane proteins and their types by extending the notion of SAAC into Chou’s pseudo amino acid composition. (English) Zbl 1397.92180

Summary: Membrane proteins play significant roles in the cellular processes of living organisms, ranging from cell signaling to cell adhesion. Because they form a major part of the cell, the identification of membrane proteins and their functional types has been a challenging task in bioinformatics and proteomics over the last few decades. Traditional experimental procedures are of limited applicability owing to the lack of recognized structures and the enormous time and space they require. The demand for fast, accurate and intelligent computational methods is therefore increasing day by day.
In this paper, a two-tier intelligent automated predictor called iMem-2LSAAC has been developed. The first tier (phase 1) classifies a protein sequence as membrane or non-membrane; if the sequence is a membrane protein, the second tier (phase 2) identifies its functional type. Quantitative attributes were extracted from protein sequences by applying three discrete feature extraction schemes, namely amino acid composition, pseudo amino acid composition and split amino acid composition (SAAC). Various learning algorithms were investigated using the jackknife test to select the best one for the predictor. Experimental results showed that the highest predictive performance was obtained by the support vector machine (SVM) in conjunction with the SAAC feature space on all examined datasets. The true classification rate of the iMem-2LSAAC predictor is significantly higher than that of other state-of-the-art methods reported in the literature so far. Finally, it is expected that the proposed predictor will provide a solid framework for pharmaceutical drug discovery and may be useful for researchers and academia.
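As a rough illustration of the SAAC feature space and the two-tier decision described in the summary, the following Python sketch may help. It is not the authors' implementation: the segment lengths `n_term` and `c_term` (25 residues each), the function names, and the two classifier callables `is_membrane` and `membrane_type` are hypothetical placeholders. In the paper the classifiers would be trained SVMs; here they are left as arguments.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(sequence):
    """Amino acid composition: 20 relative frequencies in AMINO_ACIDS order.

    Non-standard symbols (e.g. 'X') are ignored when counting.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def saac(sequence, n_term=25, c_term=25):
    """Split amino acid composition: AAC of the N-terminal, internal and
    C-terminal segments, concatenated into a 60-dimensional vector.

    n_term and c_term are assumed segment lengths, not values from the paper.
    """
    seq = sequence.upper()
    if len(seq) <= n_term + c_term:
        raise ValueError("sequence too short to split into three segments")
    head = seq[:n_term]
    middle = seq[n_term:len(seq) - c_term]
    tail = seq[len(seq) - c_term:]
    return aac(head) + aac(middle) + aac(tail)


def two_tier_predict(sequence, is_membrane, membrane_type):
    """Tier 1 decides membrane vs. non-membrane; tier 2 runs only for
    sequences accepted by tier 1 and returns a functional type label.

    is_membrane and membrane_type are hypothetical callables standing in
    for trained classifiers that take a SAAC feature vector.
    """
    features = saac(sequence)
    if not is_membrane(features):
        return "non-membrane"
    return membrane_type(features)
```

Computing a separate composition for each segment keeps the terminal regions from being averaged out by the rest of the chain, which plain amino acid composition over the whole sequence cannot do.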


92C40 Biochemistry, molecular biology
92D20 Protein sequences, DNA sequences
92-08 Computational methods for problems pertaining to biology

