zbMATH — the first resource for mathematics

Predicting protein subchloroplast locations with both single and multiple sites via three different modes of Chou’s pseudo amino acid compositions. (English) Zbl 1397.92195
Summary: Because location information can indicate important functions of proteins, developing computational tools to predict protein subcellular localization is an efficient and meaningful task. Existing methods for predicting protein subchloroplast locations can only handle proteins with a single location site. It is therefore both meaningful and challenging to handle proteins with multiple subchloroplast location sites rather than simply excluding them. To address this problem, new systems for predicting protein subchloroplast localization with single or multiple sites are developed and discussed in the paper. Three different variants of the KNN algorithm and four different types of feature extraction were adopted to construct the prediction systems. This is the first effort to predict proteins with multiple subchloroplast locations. The overall jackknife success rates achieved by the best combination (features + classifier) on three datasets with different levels of homology were 89.08%, 81.29% and 71.11%. The performance of the prediction models indicates that the proposed methods might serve as a useful and efficient tool for predicting sub-subcellular localization.
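
As a rough illustration of how a jackknife (leave-one-out) success rate for a KNN classifier can be computed, the following minimal sketch may help. It is not the paper's method: it uses plain 20-dimensional amino acid composition instead of Chou's pseudo amino acid composition, an ordinary majority-vote KNN instead of the KNN variants studied in the paper, and it handles single-site labels only. The sequences, the labels and all function names are hypothetical and chosen only for illustration.

```python
from collections import Counter

# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the 20-dimensional amino acid composition of a sequence.

    Assumption: plain composition stands in for pseudo amino acid
    composition; non-standard residue letters are ignored.
    """
    counts = Counter(sequence)
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def knn_predict(train_x, train_y, query, k=1):
    """Predict a label by majority vote among the k nearest neighbours."""
    def squared_distance(i):
        return sum((a - b) ** 2 for a, b in zip(train_x[i], query))

    nearest = sorted(range(len(train_x)), key=squared_distance)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def jackknife_success_rate(features, labels, k=1):
    """Leave each sample out once, predict it from the rest, count hits."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if knn_predict(train_x, train_y, features[i], k) == labels[i]:
            correct += 1
    return correct / len(features)


# Hypothetical toy data: made-up sequences and location labels.
sequences = ["AAAAG", "AAAGG", "LLLLK", "LLLKK"]
labels = ["envelope", "envelope", "stroma", "stroma"]
features = [amino_acid_composition(s) for s in sequences]
rate = jackknife_success_rate(features, labels, k=1)
print(f"jackknife success rate: {rate:.2%}")
```

Extending such a sketch to proteins with multiple location sites would require a multi-label decision rule and a matching multi-label success measure, neither of which is attempted here.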

MSC:
92C40 Biochemistry, molecular biology
92C37 Cell biology
References:
[1] Cao, D. S.; Xu, Q. S.; Liang, Y. Z., propy: a tool to generate various modes of Chou’s PseAAC, Bioinformatics, (2013)
[2] Chang, T. H.; Wu, L. C.; Lee, T. Y.; Chen, S. P.; Huang, H. D.; Horng, J. T., EuLoc: a web-server for accurately predict protein subcellular localization in eukaryotes by incorporating various features of sequence segments into the general form of Chou’s PseAAC, J. Comput.-Aided Mol. Des., 27, 91-103, (2013)
[3] Chen, C.; Shen, Z. B.; Zou, X. Y., Dual-layer wavelet SVM for predicting protein structural class via the general form of Chou’s pseudo amino acid composition, Protein Pept. Lett., 19, 422-429, (2012)
[4] Chen, C.; Chen, L.; Zou, X.; Cai, P., Prediction of protein secondary structure content by using the concept of Chou’s pseudo amino acid composition and support vector machine, Protein Pept. Lett., 16, 27-31, (2009)
[5] Chen, W.; Feng, P. M.; Lin, H.; Chou, K. C., iRSpot-PseDNC: identify recombination spots with pseudo dinucleotide composition, Nucleic Acids Res., (2013)
[6] Chen, W.; Lin, H.; Feng, P. M.; Ding, C.; Zuo, Y. C.; Chou, K. C., iNuc-PhysChem: a sequence-based predictor for identifying nucleosomes via physicochemical properties, PLoS ONE, 7, e47843, (2012)
[7] Chen, Y. K.; Li, K. B., Predicting membrane protein types by incorporating protein topology, domains, signal peptides, and physicochemical properties into the general form of Chou’s pseudo amino acid composition, J. Theor. Biol., 318, 1-12, (2013)
[8] Chou, K. C., Prediction of protein cellular attributes using pseudo amino acid composition, Proteins: Struct., Funct. Genet. (Erratum: ibid., 2001, vol. 44, 60), 43, 246-255, (2001)
[9] Chou, K. C., Prediction of protein cellular attributes using pseudo-amino acid composition, Proteins, 43, 246-255, (2001)
[10] Chou, K. C., Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes, Bioinformatics, 21, 10-19, (2005)
[11] Chou, K. C., Some remarks on protein attribute prediction and pseudo amino acid composition (50th anniversary year review), J. Theor. Biol., 273, 236-247, (2011)
[12] Chou, K. C., Some remarks on protein attribute prediction and pseudo amino acid composition, J. Theor. Biol., 273, 236-247, (2011)
[13] Chou, K. C., Some remarks on predicting multi-label attributes in molecular biosystems, Mol. BioSyst., 9, 1092-1100, (2013)
[14] Chou, K. C.; Zhang, C. T., Review: prediction of protein structural classes, Crit. Rev. Biochem. Mol. Biol., 30, 275-349, (1995)
[15] Chou, K. C.; Cai, Y. D., Prediction of membrane protein types by incorporating amphipathic effects, J. Chem. Inf. Modeling, 45, 407-413, (2005)
[16] Chou, K. C.; Shen, H. B., Predicting eukaryotic protein subcellular location by fusing optimized evidence-theoretic K-nearest neighbor classifiers, J. Proteome Res., 5, 1888-1897, (2006)
[17] Chou, K. C.; Shen, H. B., Euk-mPLoc: a fusion classifier for large-scale eukaryotic protein subcellular location prediction by incorporating multiple sites, J. Proteome Res., 6, 1728-1734, (2007)
[18] Chou, K. C.; Shen, H. B., MemType-2L: a web server for predicting membrane proteins and their types by incorporating evolution information through Pse-PSSM, Biochem. Biophys. Res. Commun., 360, 339-345, (2007)
[19] Chou, K. C.; Shen, H. B., REVIEW: recent advances in developing web-servers for predicting protein attributes, Nat. Sci., 1, 63-92, (2009)
[20] Chou, K. C.; Shen, H. B., Plant-mPLoc: a top-down strategy to augment the power for predicting plant protein subcellular localization, PLoS ONE, 5, e11335, (2010)
[21] Chou, K. C.; Wu, Z. C.; Xiao, X., iLoc-Euk: a multi-label classifier for predicting the subcellular localization of singleplex and multiplex eukaryotic proteins, PLoS ONE, 6, e18258, (2011)
[22] Chou, K. C.; Wu, Z. C.; Xiao, X., iLoc-Hum: using the accumulation-label scale to predict subcellular locations of human proteins with both single and multiple sites, Mol. BioSyst., 8, 629-641, (2012)
[23] Denoeux, T., A k-nearest neighbor classification rule based on Dempster-Shafer theory, IEEE Trans. Syst., Man Cybern., 25, 804-813, (1995)
[24] Ding, H.; Luo, L.; Lin, H., Prediction of cell wall lytic enzymes using Chou’s amphiphilic pseudo amino acid composition, Protein Pept. Lett., 16, 351-355, (2009)
[25] Du, P.; Cao, S.; Li, Y., SubChlo: predicting protein subchloroplast locations with pseudo-amino acid composition and the evidence-theoretic K-nearest neighbor (ET-KNN) algorithm, J. Theor. Biol., 261, 330-335, (2009)
[26] Du, P.; Cao, S.; Li, Y., SubChlo: predicting protein subchloroplast locations with pseudo-amino acid composition and the evidence-theoretic K-nearest neighbor (ET-KNN) algorithm, J. Theor. Biol., 261, 330-335, (2009)
[27] Du, P.; Wang, X.; Xu, C.; Gao, Y., PseAAC-Builder: a cross-platform stand-alone program for generating various special Chou’s pseudo-amino acid compositions, Anal. Biochem., 425, 117-119, (2012)
[28] Esmaeili, M.; Mohabatkar, H.; Mohsenzadeh, S., Using the concept of Chou’s pseudo amino acid composition for risk type prediction of human papillomaviruses, J. Theor. Biol., 263, 203-209, (2010)
[29] Ferro, M.; Salvi, D.; Brugiere, S.; Miras, S.; Kowalski, S.; Louwagie, M.; Garin, J.; Joyard, J.; Rolland, N., Proteomics of the chloroplast envelope membranes from Arabidopsis thaliana, Mol. Cell Proteomics, 2, 325-345, (2003)
[30] Georgiou, D. N.; Karakasidis, T. E.; Nieto, J. J.; Torres, A., Use of fuzzy clustering technique and matrices to classify amino acids and its impact to Chou’s pseudo amino acid composition, J. Theor. Biol., 257, 17-26, (2009)
[31] Gu, Q.; Ding, Y. S.; Zhang, T. L., Prediction of G-protein-coupled receptor classes in low homology using Chou’s pseudo amino acid composition with approximate entropy and hydrophobicity patterns, Protein Pept. Lett., 17, 559-567, (2010)
[32] Guo, J.; Rao, N.; Liu, G.; Yang, Y.; Wang, G., Predicting protein folding rates using the concept of Chou’s pseudo amino acid composition, J. Comput. Chem., 32, 1612-1617, (2011)
[33] Hayat, M.; Khan, A., Discriminating outer membrane proteins with fuzzy k-nearest neighbor algorithms based on the general form of Chou’s PseAAC, Protein Pept. Lett., 19, 411-421, (2012)
[34] Hayat, M.; Khan, A., MemHyb: predicting membrane protein types by hybridizing SAAC and PSSM, J. Theor. Biol., 292, 93-102, (2012)
[35] Hu, J.; Yan, X., BS-KNN: an effective algorithm for predicting protein subchloroplast localization, Evol. Bioinf. Online, 8, 79-87, (2012)
[36] Huang, T.; Wang, J.; Cai, Y. D.; Yu, H.; Chou, K. C., Hepatitis C virus network based classification of hepatocellular cirrhosis and carcinoma, PLoS ONE, 7, e34460, (2012)
[37] Huang, Y.; Niu, B.; Gao, Y.; Fu, L.; Li, W., CD-HIT suite: a web server for clustering and comparing biological sequences, Bioinformatics, 26, 680-682, (2010)
[38] Jiang, X.; Wei, R.; Zhang, T. L.; Gu, Q., Using the concept of Chou’s pseudo amino acid composition to predict apoptosis proteins subcellular location: an approach by approximate entropy, Protein Pept. Lett., 15, 392-396, (2008)
[39] Jin, A.; Samad, S.; Hussain, A., Theoretic evidence k-nearest neighbourhood classifiers in a bimodal biometric verification system, (Kittler, J.; Nixon, M., Audio- and Video-Based Biometric Person Authentication, vol. 2688, (2003), Springer Berlin Heidelberg), 778-786
[40] Khosravian, M.; Faramarzi, F. K.; Beigi, M. M.; Behbahani, M.; Mohabatkar, H., Predicting antibacterial peptides by the concept of Chou’s pseudo-amino acid composition and machine learning methods, Protein Pept. Lett., 20, 180-186, (2013)
[41] Kleffmann, T.; Russenberger, D.; von Zychlinski, A.; Christopher, W.; Sjolander, K.; Gruissem, W.; Baginsky, S., The Arabidopsis thaliana chloroplast proteome reveals pathway abundance and novel protein functions, Curr. Biol., 14, 354-362, (2004)
[42] Li, B. Q.; Huang, T.; Liu, L.; Cai, Y. D.; Chou, K. C., Identification of colorectal cancer related genes with mRMR and shortest path in protein-protein interaction network, PLoS ONE, 7, e33393, (2012)
[43] Li, F. M.; Li, Q. Z., Predicting protein subcellular location using Chou’s pseudo amino acid composition and improved hybrid approach, Protein Pept. Lett., 15, 612-616, (2008)
[44] Li, W.; Godzik, A., CD-HIT: a fast program for clustering and comparing large sets of protein or nucleotide sequences, Bioinformatics, 22, 1658-1659, (2006)
[45] Lin, H., The modified Mahalanobis discriminant for predicting outer membrane proteins by using Chou’s pseudo amino acid composition, J. Theor. Biol., 252, 350-356, (2008)
[46] Lin, H.; Ding, H.; Guo, F. B.; Zhang, A. Y.; Huang, J., Predicting subcellular localization of mycobacterial proteins by using Chou’s pseudo amino acid composition, Protein Pept. Lett., 15, 739-744, (2008)
[47] Lin, W. Z.; Fang, J. A.; Xiao, X.; Chou, K. C., iLoc-Animal: a multi-label learning classifier for predicting subcellular localization of animal proteins, Mol. BioSyst., (2013)
[48] Mei, S., Multi-kernel transfer learning based on Chou’s PseAAC formulation for protein submitochondria localization, J. Theor. Biol., 293, 121-130, (2012)
[49] Mohabatkar, H., Prediction of cyclin proteins using Chou’s pseudo amino acid composition, Protein Pept. Lett., 17, 1207-1214, (2010)
[50] Mohabatkar, H.; Mohammad Beigi, M.; Esmaeili, A., Prediction of GABA(A) receptor proteins using the concept of Chou’s pseudo-amino acid composition and support vector machine, J. Theor. Biol., 281, 18-23, (2011)
[51] Mohabatkar, H.; Beigi, M. M.; Abdolahi, K.; Mohsenzadeh, S., Prediction of allergenic proteins by means of the concept of Chou’s pseudo amino acid composition and a machine learning approach, Med. Chem., 9, 133-137, (2013)
[52] Mundra, P.; Kumar, M.; Kumar, K. K.; Jayaraman, V. K.; Kulkarni, B. D., Using pseudo amino acid composition to predict protein subnuclear localization: approached with PSSM, Pattern Recognition Lett., 28, 1610-1615, (2007)
[53] Niu, B.; Fu, L.; Sun, S.; Li, W., Artificial and natural duplicates in pyrosequencing reads of metagenomic data, BMC Bioinformatics, 11, 187, (2010)
[54] Qiu, J. D.; Huang, J. H.; Shi, S. P.; Liang, R. P., Using the concept of Chou’s pseudo amino acid composition to predict enzyme family classes: an approach with support vector machine based on discrete wavelet transform, Protein Pept. Lett., 17, 715-722, (2010)
[55] Sahu, S. S.; Panda, G., A novel feature representation method based on Chou’s pseudo amino acid composition for protein structural class prediction, Comput. Biol. Chem., 34, 320-327, (2010)
[56] Shen, H. B.; Chou, K. C., Hum-mPLoc: an ensemble classifier for large-scale human protein subcellular location prediction by incorporating samples with multiple sites, Biochem. Biophys. Res. Commun., 355, 1006-1011, (2007)
[57] Shen, H. B.; Chou, K. C., Nuc-PLoc: a new web-server for predicting protein subnuclear localization by fusing PseAA composition and PsePSSM, Protein Eng. Des. Sel., 20, 561-567, (2007)
[58] Shen, H. B.; Chou, K. C., Gpos-mPLoc: a top-down approach to improve the quality of predicting subcellular localization of Gram-positive bacterial proteins, Protein Pept. Lett., 16, 1478-1484, (2009)
[59] Shen, H. B.; Chou, K. C., Gneg-mPLoc: a top-down strategy to enhance the quality of predicting subcellular localization of Gram-negative bacterial proteins, J. Theor. Biol., 264, 326-333, (2010)
[60] Shen, H. B.; Chou, K. C., Virus-mPLoc: a fusion classifier for viral protein subcellular location prediction by incorporating multiple sites, J. Biomol. Struct. Dyn., 28, 175-186, (2010)
[61] Shi, L.-X.; Theg, S. M., The chloroplast protein import system: from algae to trees, Biochim. et Biophys. Acta (BBA)-Mol. Cell Res., 1833, 314-331, (2013)
[62] Shi, S.-P.; Qiu, J.-D.; Sun, X.-Y.; Huang, J.-H.; Huang, S.-Y.; Suo, S.-B.; Liang, R.-P.; Zhang, L., Identify submitochondria and subchloroplast locations with pseudo amino acid composition: approach from the strategy of discrete wavelet transform feature extraction, Biochim. et Biophys. Acta (BBA)-Mol. Cell Res., 1813, 424-430, (2011)
[63] Wan, S.; Mak, M. W.; Kung, S. Y., GOASVM: a subcellular location predictor by incorporating term-frequency gene ontology into the general form of Chou’s pseudo-amino acid composition, J. Theor. Biol., 323C, 40-48, (2013)
[64] Wang, H.-H.; Yin, W.-B.; Hu, Z.-M., Advances in chloroplast engineering, J. Genet. Genomics, 36, 387-398, (2009)
[65] Wu, Z. C.; Xiao, X.; Chou, K. C., iLoc-Plant: a multi-label classifier for predicting the subcellular localization of plant proteins with both single and multiple sites, Mol. BioSyst., 7, 3287-3297, (2011)
[66] Wu, Z. C.; Xiao, X.; Chou, K. C., iLoc-Gpos: a multi-layer classifier for predicting the subcellular localization of singleplex and multiplex Gram-positive bacterial proteins, Protein Pept. Lett., 19, 4-14, (2012)
[67] Xiao, X.; Wu, Z. C.; Chou, K. C., iLoc-Virus: a multi-label learning classifier for identifying the subcellular localization of virus proteins with both single and multiple sites, J. Theor. Biol., 284, 42-51, (2011)
[68] Younes, Z.; Abdallah, F.; Denœux, T., An evidence-theoretic k-nearest neighbor rule for multi-label classification, (Godo, L.; Pugliese, A., Scalable Uncertainty Management, vol. 5785, (2009), Springer Berlin Heidelberg), 297-308
[69] Yu, L.; Guo, Y.; Li, Y.; Li, G.; Li, M.; Luo, J.; Xiong, W.; Qin, W., SecretP: identifying bacterial secreted proteins by fusing new features into Chou’s pseudo-amino acid composition, J. Theor. Biol., 267, 1-6, (2010)
[70] Zeng, Y. H.; Guo, Y. Z.; Xiao, R. Q.; Yang, L.; Yu, L. Z.; Li, M. L., Using the augmented Chou’s pseudo amino acid composition for predicting protein submitochondria locations based on auto covariance approach, J. Theor. Biol., 259, 366-372, (2009)
[71] Zhang, G. Y.; Li, H. C.; Gao, J. Q.; Fang, B. S., Predicting lipase types by improved Chou’s pseudo-amino acid composition, Protein Pept. Lett., 15, 1132-1137, (2008)
[72] Zhang, S. W.; Chen, W.; Yang, F.; Pan, Q., Using Chou’s pseudo amino acid composition to predict protein quaternary structure: a sequence-segmented PseAAC approach, Amino Acids, 35, 591-598, (2008)
[73] Zhang, S. W.; Zhang, Y. L.; Yang, H. F.; Zhao, C. H.; Pan, Q., Using the concept of Chou’s pseudo amino acid composition to predict protein subcellular localization: an approach by incorporating evolutionary information and von Neumann entropies, Amino Acids, 34, 565-572, (2008)
[74] Zhou, X. B.; Chen, C.; Li, Z. C.; Zou, X. Y., Using Chou’s amphiphilic pseudo-amino acid composition and support vector machine for prediction of enzyme subfamily classes, J. Theor. Biol., 248, 546-551, (2007)
[75] Zhu, L.; Yang, J.; Shen, H. B., Multi label learning for prediction of human protein subcellular localizations, Protein J., 28, 384-390, (2009)
[76] Zou, D.; He, Z.; He, J.; Xia, Y., Supersecondary structure prediction using Chou’s pseudo amino acid composition, J. Comput. Chem., 32, 271-278, (2011)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.