Using the augmented Chou’s pseudo amino acid composition for predicting protein submitochondria locations based on auto covariance approach. (English) Zbl 1402.92193

Summary: The submitochondria location of a mitochondrial protein is important for understanding the structure and function of that protein. Hence, it is of great practical significance to develop an automated and reliable method for identifying the submitochondria locations of novel mitochondrial proteins in a timely manner. In this study, a sequence-based algorithm using the augmented Chou’s pseudo amino acid composition (Chou’s PseAA) based on auto covariance (AC) is developed to predict protein submitochondria locations and membrane protein types in the mitochondrial inner membrane. Through AC, combined with eight representative descriptors for both common proteins and membrane proteins, the model takes into account the sequence-order effects between residues a certain distance apart in the sequence. In jackknife cross-validation tests, the method for submitochondria location prediction yields accuracies of 91.8%, 96.4% and 66.1% for inner membrane, matrix, and outer membrane, respectively, with a total accuracy of 89.7%. When predicting membrane protein types in the mitochondrial inner membrane, the method achieves accuracies of 98.4%, 64.3% and 86.7% for multi-pass inner membrane, single-pass inner membrane, and matrix side inner membrane, respectively, with a total accuracy of 93.6%. The overall performance of the method is better than that reported in previous studies, so it can serve as an effective supplementary tool for future proteomics studies. The prediction software and all data sets used in this article are freely available at http://chemlab.scu.edu.cn/Predict_subMITO/index.htm.
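
The summary describes AC as capturing sequence-order effects between residues a fixed distance apart. The sketch below illustrates one common form of the auto covariance transform, in which each residue is mapped to a descriptor value and the mean-centred values at positions i and i + lag are multiplied and averaged. It is an illustration under stated assumptions, not the authors' implementation: the summary does not list the eight descriptors, so the Kyte-Doolittle hydropathy index is used purely as a stand-in, and the names `auto_covariance`, `ac_feature_vector` and `max_lag` are hypothetical.

```python
from typing import Dict, List

# Kyte-Doolittle hydropathy index. ASSUMPTION: a stand-in descriptor only;
# the summary does not say which eight descriptors the method uses.
KD_HYDROPATHY: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def auto_covariance(sequence: str, scale: Dict[str, float], max_lag: int) -> List[float]:
    """Return AC values for lags 1..max_lag of one descriptor along a sequence."""
    values = [scale[residue] for residue in sequence]
    n = len(values)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    mean = sum(values) / n
    centred = [v - mean for v in values]
    features = []
    for lag in range(1, max_lag + 1):
        # Average product of centred descriptor values that are `lag` residues apart.
        total = sum(centred[i] * centred[i + lag] for i in range(n - lag))
        features.append(total / (n - lag))
    return features


def ac_feature_vector(sequence: str, scales: List[Dict[str, float]], max_lag: int) -> List[float]:
    """Concatenate the AC values of several descriptors into one fixed-length vector."""
    vector: List[float] = []
    for scale in scales:
        vector.extend(auto_covariance(sequence, scale, max_lag))
    return vector
```

With D descriptors and a maximum lag of G, every protein is mapped to a vector of D × G values regardless of its length, which is what lets sequences of different lengths be compared by a single classifier. How these values are combined with the PseAA components in the actual method is not specified in the summary.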


92C40 Biochemistry, molecular biology
92C37 Cell biology
62P10 Applications of statistics to biology and medical sciences; meta analysis

