SubChlo: predicting protein subchloroplast locations with pseudo-amino acid composition and the evidence-theoretic \(K\)-nearest neighbor (ET-KNN) algorithm. (English) Zbl 1403.92063

Summary: The chloroplast is a plant-specific subcellular organelle that is central to several biological processes, including photosynthesis and amino acid biosynthesis. Understanding the function of chloroplast proteins is therefore of significant value. Because the function of a chloroplast protein correlates with its subchloroplast location, knowing that location helps in understanding the protein's role in these processes. In this paper, we develop a method for predicting protein subchloroplast locations based on the evidence-theoretic \(K\)-nearest neighbor (ET-KNN) algorithm. This is the first algorithm for predicting protein subchloroplast locations. We have implemented it as an online service, SubChlo (http://bioinfo.au.tsinghua.edu.cn/subchlo), which may be useful for chloroplast proteome research.
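
To make the classification step concrete, the following is a minimal sketch of an evidence-theoretic \(K\)-nearest neighbor rule in its generic textbook form; it is not the SubChlo implementation. It assumes that each of the \(K\) nearest neighbors assigns mass \(\alpha e^{-\gamma d^2}\) to its own class and the remainder to the whole set of classes, and that these masses are combined with Dempster's rule. The function name `et_knn_predict`, the parameter values `alpha` and `gamma`, the plain numeric feature vectors, and the labels `loc_A` and `loc_B` are all illustrative assumptions; the paper's actual features (pseudo-amino acid composition) and parameter settings are not given in this summary.

```python
import math
from collections import defaultdict


def et_knn_predict(train, query, k=3, alpha=0.95, gamma=1.0):
    """Toy evidence-theoretic K-NN rule (illustrative, not the SubChlo code).

    train: list of (feature_vector, label) pairs
    query: feature vector with the same length as the training vectors
    Returns (predicted_label, class_masses, ignorance_mass).
    """
    # Keep the k training samples closest to the query (squared Euclidean distance).
    neighbours = sorted(
        ((sum((a - b) ** 2 for a, b in zip(x, query)), label) for x, label in train),
        key=lambda pair: pair[0],
    )[:k]

    # For each class, multiply the masses its neighbours leave on the full
    # set of classes: prod(1 - alpha * exp(-gamma * d^2)).
    class_ignorance = defaultdict(lambda: 1.0)
    for d2, label in neighbours:
        class_ignorance[label] *= 1.0 - alpha * math.exp(-gamma * d2)

    # Dempster's rule when every focal set is a single class or the full set:
    # a class keeps its own support times the ignorance of all other classes.
    total_ignorance = math.prod(class_ignorance.values())
    unnormalised = {}
    for label, own in class_ignorance.items():
        others = math.prod(v for l, v in class_ignorance.items() if l != label)
        unnormalised[label] = (1.0 - own) * others

    # Normalise so that the class masses and the residual ignorance sum to 1.
    norm = sum(unnormalised.values()) + total_ignorance
    masses = {label: value / norm for label, value in unnormalised.items()}
    best = max(masses, key=masses.get)
    return best, masses, total_ignorance / norm


# Hypothetical two-class toy data; real inputs would be sequence-derived features.
train = [
    ((0.0, 0.0), "loc_A"),
    ((0.1, 0.0), "loc_A"),
    ((1.0, 1.0), "loc_B"),
    ((1.1, 1.0), "loc_B"),
]
label, masses, ignorance = et_knn_predict(train, (0.05, 0.0), k=3)
print(label, masses, ignorance)
```

Unlike a plain majority vote, this rule weights each neighbor by its distance and returns a residual ignorance mass, which can serve as a rough indication of how little evidence supports the prediction.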


92C40 Biochemistry, molecular biology
92C80 Plant biology
68W05 Nonnumerical algorithms

