
Multi-kernel transfer learning based on Chou’s PseAAC formulation for protein submitochondria localization. (English) Zbl 1307.92085

J. Theor. Biol. 293, 121-130 (2012); corrigendum ibid. 338, 112 (2013).
Summary: Protein sub-organelle localization, e.g. submitochondrial localization, appears more challenging than general protein subcellular localization, because determining a protein's micro-level location within an organelle by fluorescence imaging faces additional difficulties. To date, there are very few computational methods for protein submitochondrial localization, and the existing sequence-based predictive models show moderate or unsatisfactory performance. Recent research has shown that gene ontology (GO) is a highly effective protein feature for protein subcellular localization. However, GO information may not be available for novel proteins or for sparsely annotated protein subfamilies. To address this problem, we transfer the GO information of homologous proteins to the target protein and propose a multi-kernel transfer learning model for protein submitochondrial localization (MK-TLM), which substantially extends our previously published work (gene ontology based transfer learning model for protein subcellular localization, GO-TLM). To reduce the risk of overestimating performance, we survey the model's performance more comprehensively in an optimistic, a moderate and a pessimistic case, defined by the abundance of the target protein's GO information. Experiments on submitochondrial benchmark datasets show that MK-TLM significantly outperforms the baseline models and performs excellently both for novel mitochondrial proteins and for mitochondrial proteins that belong to poorly characterized subfamilies.
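The summary names two ingredients: borrowing GO annotations from homologous proteins when the target itself is poorly annotated, and combining several kernels into one model. The sketch below illustrates both ideas generically; it is not the authors' MK-TLM. Every name in it (`transfer_go_terms`, `go_kernel`, `aac`, `rbf_kernel`, `combined_kernel`), the E-value cutoff, the Jaccard GO kernel, the RBF kernel on plain amino acid composition (a simplified stand-in for Chou's PseAAC, which additionally appends sequence-order correlation factors) and the fixed convex kernel weights are assumptions made for illustration.

```python
import math
from typing import Dict, List, Set, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def transfer_go_terms(hits: List[Tuple[str, float]],
                      go_annotations: Dict[str, Set[str]],
                      evalue_cutoff: float = 1e-3) -> Set[str]:
    """Hypothetical helper: union the GO terms of homologs whose E-value
    is at or below the cutoff (hits are (homolog_id, e_value) pairs,
    e.g. parsed from a BLAST search)."""
    terms: Set[str] = set()
    for homolog_id, e_value in hits:
        if e_value <= evalue_cutoff:
            terms |= go_annotations.get(homolog_id, set())
    return terms


def go_kernel(a: Set[str], b: Set[str]) -> float:
    """Jaccard (Tanimoto) similarity of two GO-term sets; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def aac(sequence: str) -> List[float]:
    """Plain 20-dim amino acid composition; non-standard letters are ignored."""
    residues = [c for c in sequence.upper() if c in AMINO_ACIDS]
    n = len(residues) or 1  # avoid division by zero for empty input
    return [residues.count(aa) / n for aa in AMINO_ACIDS]


def rbf_kernel(x: List[float], y: List[float], gamma: float = 10.0) -> float:
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)


def combined_kernel(kernel_values: List[float], weights: List[float]) -> float:
    """Convex combination sum_m beta_m * K_m with beta_m >= 0 and sum beta_m = 1."""
    if len(kernel_values) != len(weights):
        raise ValueError("one weight per kernel is required")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * k for w, k in zip(weights, kernel_values))


if __name__ == "__main__":
    # Toy annotations for illustration only.
    go_db = {"P1": {"GO:0005739", "GO:0005743"}, "P2": {"GO:0005759"}}
    # The target inherits GO terms only from the close homolog P1.
    target_go = transfer_go_terms([("P1", 1e-30), ("P2", 0.5)], go_db)
    k_go = go_kernel(target_go, {"GO:0005743"})
    k_seq = rbf_kernel(aac("MKTAYIAKQR"), aac("MKTAYIAKQL"))
    print(target_go, k_go, k_seq, combined_kernel([k_go, k_seq], [0.6, 0.4]))
```

In an actual multi-kernel model the weights would typically be learned (for example by a multiple kernel learning solver) rather than fixed by hand, and the combined kernel matrix over all training proteins would then be passed to a kernel classifier such as an SVM.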

MSC:

92C40 Biochemistry, molecular biology
92D20 Protein sequences, DNA sequences
68T05 Learning and adaptive systems in artificial intelligence

References:

[1] Altschul, S.; Madden, T.; Schäffer, A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res., 25, 3389-3402 (1997)
[2] Blum, T.; Briesemeister, S.; Kohlbacher, O., MultiLoc2: integrating phylogeny and Gene Ontology terms improves subcellular protein localization prediction, BMC Bioinformatics, 10, 274 (2009)
[3] Boeckmann, B., The SWISS-PROT protein knowledgebase and its supplement TrEMBL, Nucleic Acids Res., 31, 365-370 (2003)
[4] Chen, C.; Chen, L.; Zou, X.; Cai, P., Prediction of protein secondary structure content by using the concept of Chou’s pseudo amino acid composition and support vector machine, Protein Pept. Lett., 16, 27-31 (2009)
[5] Chou, K. C., Prediction of protein cellular attributes using pseudo amino acid composition, Proteins: Struct. Funct. Genet., 43, 246-255 (2001), (Erratum: ibid., 44, 60 (2001))
[6] Chou, K. C., Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes, Bioinformatics, 21, 10-19 (2005)
[7] Chou, K. C., Pseudo amino acid composition and its applications in bioinformatics, proteomics and system biology, Curr. Proteomics, 6, 262-274 (2009)
[8] Chou, K. C., Some remarks on protein attribute prediction and pseudo amino acid composition (50th Anniversary Year Review), J. Theor. Biol., 273, 236-247 (2011) · Zbl 1405.92212
[9] Chou, K. C.; Cai, Y., A new hybrid approach to predict subcellular localization of proteins by incorporating gene ontology, Biochem. Biophys. Res. Commun., 311, 743-747 (2003)
[10] Chou, K. C.; Cai, Y., Prediction of protein subcellular locations by GO-FunD-PseAA predictor, Biochem. Biophys. Res. Commun., 320, 1236-1239 (2004)
[11] Chou, K. C.; Shen, H. B., Hum-PLoc: a novel ensemble classifier for predicting human protein subcellular localization, Biochem. Biophys. Res. Commun., 347, 150-157 (2006)
[12] Chou, K. C.; Shen, H. B., Euk-mPLoc: a fusion classifier for large-scale eukaryotic protein subcellular location prediction by incorporating multiple sites, J. Proteome Res., 6, 1728-1734 (2007)
[13] Chou, K. C.; Shen, H. B., Large-scale plant protein subcellular location prediction, J. Cell. Biochem., 100, 665-678 (2007)
[14] Chou, K. C.; Shen, H. B., Cell-PLoc: a package of Web servers for predicting subcellular localization of proteins in various organisms (updated version: Cell-PLoc 2.0: an improved package of web-servers for predicting subcellular localization of proteins in various organisms, Natural Science, 2010, 2, 1090-1103), Nat. Protoc., 3, 153-162 (2008)
[15] Chou, K. C.; Shen, H. B., Review: recent advances in developing web-servers for predicting protein attributes, Nat. Sci., 2, 63-92 (2009), (Accessible at: 〈http://www.scirp.org/journal/NS/〉)
[16] Chou, K. C.; Shen, H. B., A new method for predicting the subcellular localization of eukaryotic proteins with both single and multiple sites: Euk-mPLoc 2.0, PLoS ONE, 5, e9931 (2010)
[17] Chou, K. C.; Shen, H. B., Cell-PLoc 2.0: an improved package of web-servers for predicting subcellular localization of proteins in various organisms, Nat. Sci., 2, 1090-1103 (2010), (Accessible at: 〈http://www.scirp.org/journal/NS/〉)
[18] Chou, K. C.; Shen, H. B., Plant-mPLoc: a Top-down strategy to augment the power for predicting plant protein subcellular localization, PLoS ONE, 5, e11335 (2010)
[19] Chou, K. C.; Zhang, C. T., Review: prediction of protein structural classes, Crit. Rev. Biochem. Mol. Biol., 30, 275-349 (1995)
[20] Chou, K. C.; Wu, Z. C.; Xiao, X., iLoc-Euk: a multi-label classifier for predicting the subcellular localization of singleplex and multiplex eukaryotic proteins, PLoS ONE, 6, e18258 (2011)
[21] Dai, W.; Yang, Q.; Xue, G.; Yu, Y., Boosting for transfer learning, in: Proceedings of the 24th International Conference on Machine Learning (2007)
[22] Dai, W.; Chen, Y.; Xue, G.; Yang, Q.; Yu, Y., Translated learning: transfer learning across different feature spaces, in: Proceedings of NIPS (2008)
[23] Ding, H.; Luo, L.; Lin, H., Prediction of cell wall lytic enzymes using Chou's amphiphilic pseudo amino acid composition, Protein Pept. Lett., 16, 351-355 (2009)
[24] Ding, H.; Liu, L.; Guo, F. B.; Huang, J.; Lin, H., Identify Golgi protein types with modified Mahalanobis discriminant algorithm and pseudo amino acid composition, Protein Pept. Lett., 18, 58-63 (2011)
[25] Ding, Y. S.; Zhang, T. L., Using Chou’s pseudo amino acid composition to predict subcellular localization of apoptosis proteins: an approach with immune genetic algorithm-based ensemble classifier, Pattern Recognition Lett., 29, 1887-1892 (2008)
[26] Du, P.; Li, Y., Prediction of protein submitochondria locations by hybridizing pseudo-amino acid composition with various physicochemical features of segmented sequence, BMC Bioinformatics, 7, 518 (2006)
[27] Elstner, M.; Andreoli, C.; Ahting, U.; Tetko, I.; Klopstock, T.; Meitinger, T.; Prokisch, H., MitoP2: an integrative tool for the analysis of the mitochondrial proteome, Mol. Biotechnol., 40, 306-315 (2008)
[28] Esmaeili, M.; Mohabatkar, H.; Mohsenzadeh, S., Using the concept of Chou’s pseudo amino acid composition for risk type prediction of human papillomaviruses, J. Theor. Biol., 263, 203-209 (2010) · Zbl 1406.92455
[29] Fang, Y.; Guo, Y.; Feng, Y.; Li, M., Predicting DNA-binding proteins: approached from Chou’s pseudo amino acid composition and other specific sequence features, Amino Acids, 34, 103-109 (2008)
[30] Georgiou, D. N.; Karakasidis, T. E.; Nieto, J. J.; Torres, A., Use of fuzzy clustering technique and matrices to classify amino acids and its impact to Chou’s pseudo amino acid composition, J. Theor. Biol., 257, 17-26 (2009) · Zbl 1400.92393
[31] Gottlieb, R. A., Programmed cell death, Drug News Perspect., 13, 471-476 (2000)
[32] Gu, Q.; Ding, Y. S.; Zhang, T. L., Prediction of G-protein-coupled receptor classes in low homology using Chou's pseudo amino acid composition with approximate entropy and hydrophobicity patterns, Protein Pept. Lett., 17, 559-567 (2010)
[33] Guo, J.; Rao, N.; Liu, G.; Yang, Y.; Wang, G., Predicting protein folding rates using the concept of Chou’s pseudo amino acid composition, J. Comput. Chem., 32, 1612-1617 (2011)
[34] He, Z.; Zhang, J.; Shi, X. H.; Hu, L. L.; Kong, X.; Cai, Y. D.; Chou, K. C., Predicting drug-target interaction networks based on functional groups and biological features, PLoS ONE, 5, e9603 (2010)
[35] Hu, L.; Zheng, L.; Wang, Z.; Li, B.; Liu, L., Using pseudo amino acid composition to predict protease families by incorporating a series of protein biological features, Protein Pept. Lett., 18, 552-558 (2011)
[36] Hu, L.; Huang, T.; Shi, X.; Lu, W. C.; Cai, Y. D.; Chou, K. C., Predicting functions of proteins in mouse based on weighted protein-protein interaction network and protein hybrid properties, PLoS ONE, 6, e14556 (2011)
[37] Hu, L. L.; Huang, T.; Cai, Y. D.; Chou, K. C., Prediction of body fluids where proteins are secreted into based on protein interaction network, PLoS ONE, 6, e22989 (2011)
[38] Huang, T.; Shi, X. H.; Wang, P.; He, Z.; Feng, K. Y.; Hu, L.; Kong, X.; Li, Y. X.; Cai, Y. D.; Chou, K. C., Analysis and prediction of the metabolic stability of proteins based on their sequential features, subcellular locations and interaction networks, PLoS ONE, 5, e10972 (2010)
[39] Huang, T.; Niu, S.; Xu, Z.; Huang, Y.; Kong, X.; Cai, Y. D.; Chou, K. C., Predicting transcriptional activity of multiple site p53 mutants based on hybrid properties, PLoS ONE, 6, e22940 (2011)
[40] Huang, W.; Tung, C.; Ho, S.; Hwang, S.; Ho, S., ProLoc-GO: utilizing informative gene ontology terms for sequence-based prediction of protein subcellular localization, BMC Bioinformatics, 9, 80 (2008)
[41] Huang, W.; Tung, C.; Huang, S.; Ho, S., Predicting protein subnuclear localization using GO-amino-acid composition features, BioSystems (2009)
[42] Jassem, W.; Fuggle, S. V.; Rela, M.; Koo, D. D.; Heaton, N. D., The role of mitochondria in ischemia/reperfusion injury, Transplantation, 73, 493-499 (2000)
[43] Jiang, X.; Wei, R.; Zhang, T. L.; Gu, Q., Using the concept of Chou’s pseudo amino acid composition to predict apoptosis proteins subcellular location: an approach by approximate entropy, Protein Pept. Lett., 15, 392-396 (2008)
[44] Jiang, X.; Wei, R.; Zhao, Y.; Zhang, T., Using Chou’s pseudo amino acid composition based on approximate entropy and an ensemble of AdaBoost classifiers to predict protein subnuclear location, Amino Acids, 34, 669-675 (2008)
[45] Kandaswamy, K. K.; Pugalenthi, G.; Moller, S.; Hartmann, E.; Kalies, K. U.; Suganthan, P. N.; Martinetz, T., Prediction of apoptosis protein locations with genetic algorithms and support vector machines through a new mode of pseudo amino acid composition, Protein Pept. Lett., 17, 1473-1479 (2010)
[46] Lee, K.; Chuang, H.; Beyer, A.; Sung, M.; Huh, W.; Lee, B.; Ideker, T., Protein networks markedly improve prediction of subcellular localization in multiple eukaryotic species, Nucleic Acids Res., 36, 20, e136 (2008)
[47] Lei, Z.; Dai, Y., Assessing protein similarity with Gene Ontology and its use in subnuclear localization prediction, BMC Bioinformatics, 7, 491 (2006)
[48] Ashburner, M.; et al., Gene Ontology: tool for the unification of biology. The Gene Ontology Consortium, Nat. Genet., 25, 25-29 (2000)
[49] Li, F. M.; Li, Q. Z., Predicting protein subcellular location using Chou’s pseudo amino acid composition and improved hybrid approach, Protein Pept. Lett., 15, 612-616 (2008)
[50] Li, Z. C.; Zhou, X. B.; Dai, Z.; Zou, X. Y., Prediction of protein structural classes by Chou’s pseudo amino acid composition: approached using continuous wavelet transform and principal component analysis, Amino Acids, 37, 415-425 (2009)
[51] Lin, H., The modified Mahalanobis discriminant for predicting outer membrane proteins by using Chou’s pseudo amino acid composition, J. Theor. Biol., 252, 350-356 (2008) · Zbl 1398.92076
[52] Lin, H.; Wang, H.; Ding, H.; Chen, Y. L.; Li, Q. Z., Prediction of subcellular localization of apoptosis protein using Chou’s pseudo amino acid composition, Acta Biotheor., 57, 321-330 (2009)
[53] Lin, M.; Beal, M., Mitochondrial dysfunction and oxidative stress in neurodegenerative diseases, Nature, 443, 787-795 (2006)
[54] Lowell, B.; Shulman, G., Mitochondrial dysfunction and type 2 diabetes, Science, 307, 384-387 (2005)
[55] Mei, S.; Wang, F.; Zhou, S., Gene ontology based transfer learning for protein subcellular localization, BMC Bioinformatics, 12, 44 (2011)
[56] Mohabatkar, H., Prediction of cyclin proteins using Chou’s pseudo amino acid composition, Protein Pept. Lett., 17, 1207-1214 (2010)
[57] Mohabatkar, H.; Mohammad Beigi, M.; Esmaeili, A., Prediction of GABA(A) receptor proteins using the concept of Chou’s pseudo-amino acid composition and support vector machine, J. Theor. Biol., 281, 18-23 (2011) · Zbl 1397.92215
[58] Nanni, L.; Lumini, A., Genetic programming for creating Chou’s pseudo amino acid based features for submitochondria localization, Amino Acids, 34, 653-660 (2008)
[59] Nanni, L.; Lumini, A.; Gupta, D.; Garg, A., Identifying bacterial virulent proteins by fusing a set of classifiers based on variants of Chou’s pseudo amino acid composition and on evolutionary information, IEEE/ACM Trans. Comput. Biol. Bioinform. (2011)
[60] Pan, S. J.; Yang, Q., A survey on transfer learning, IEEE Trans. Knowl. Data Eng., 22, 10, 1345-1359 (2010)
[61] Qiu, J. D.; Huang, J. H.; Liang, R. P.; Lu, X. Q., Prediction of G-protein-coupled receptor classes based on the concept of Chou’s pseudo amino acid composition: an approach from discrete wavelet transform, Anal. Biochem., 390, 68-73 (2009)
[62] Qiu, J. D.; Huang, J. H.; Shi, S. P.; Liang, R. P., Using the concept of Chou’s pseudo amino acid composition to predict enzyme family classes: an approach with support vector machine based on discrete wavelet transform, Protein Pept. Lett., 17, 715-722 (2010)
[63] Qiu, J. D.; Suo, S. B.; Sun, X. Y.; Shi, S. P.; Liang, R. P., OligoPred: a webserver for predicting homo-oligomeric proteins by incorporating discrete wavelet transform into Chou’s pseudo amino acid composition, J. Mol. Graph. Model., 30, 129-134 (2011)
[64] Sahu, S. S.; Panda, G., A novel feature representation method based on Chou’s pseudo amino acid composition for protein structural class prediction, Comput. Biol. Chem., 34, 320-327 (2010) · Zbl 1403.92221
[65] Shen, H. B.; Chou, K. C., Hum-mPLoc: an ensemble classifier for large-scale human protein subcellular location prediction by incorporating samples with multiple sites, Biochem. Biophys. Res. Commun., 355, 1006-1011 (2007)
[66] Shen, H. B.; Yang, J.; Chou, K. C., Euk-PLoc: an ensemble classifier for large-scale eukaryotic protein subcellular location prediction, Amino Acids, 33, 57-67 (2007)
[67] Shen, H. B.; Chou, K. C., Virus-mPLoc: a fusion classifier for viral protein subcellular location prediction by incorporating multiple sites, J. Biomol. Struct. Dyn., 28, 2 (2010), ISSN:0739-1102
[68] Shi, S., Identify submitochondria and subchloroplast locations with pseudo amino acid composition: approach from the strategy of discrete wavelet transform feature extraction, Biochim. Biophys. Acta, 1813, 424-430 (2011)
[69] Tung, T.; Lee, D., A method to improve protein subcellular localization prediction by integrating various biological data sources, BMC Bioinformatics, 10, Suppl 1, S43 (2009)
[70] Wang, P.; Hu, L.; Liu, G.; Jiang, N.; Chen, X.; Xu, J.; Zheng, W.; Li, L.; Tan, M.; Chen, Z.; Song, H.; Cai, Y. D.; Chou, K. C., Prediction of antimicrobial peptides based on sequence alignment and feature selection methods, PLoS ONE, 6, e18476 (2011)
[71] Wang, P.; Xiao, X.; Chou, K. C., NR-2L: a two-level predictor for identifying nuclear receptor subfamilies based on sequence-derived features, PLoS ONE, 6, e23505 (2011)
[72] Wang, Y. C.; Wang, X. B.; Yang, Z. X.; Deng, N. Y., Prediction of enzyme subfamily class via pseudo amino acid composition by incorporating the conjoint triad feature, Protein Pept. Lett., 17, 1441-1449 (2010)
[73] Xiao, X.; Wu, Z. C.; Chou, K. C., iLoc-Virus: A multi-label learning classifier for identifying the subcellular localization of virus proteins with both single and multiple sites, J. Theor. Biol., 284, 42-51 (2011) · Zbl 1397.92238
[74] Xiao, X.; Wu, Z. C.; Chou, K. C., A multi-label classifier for predicting the subcellular localization of gram-negative bacterial proteins with both single and multiple sites, PLoS ONE, 6, e20592 (2011)
[75] Xiao, X.; Wang, P.; Chou, K. C., GPCR-2L: Predicting G protein-coupled receptors and their types by hybridizing two different modes of pseudo amino acid compositions, Mol. BioSyst., 7, 911-919 (2011)
[76] Yang, Q.; Chen, Y.; Xue, G.; Dai, W.; Yu, Y., Heterogeneous transfer learning for image clustering via the social web, in: Proceedings of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP, 1-9 (2009)
[77] Yu, L.; Guo, Y.; Li, Y.; Li, G.; Li, M.; Luo, J.; Xiong, W.; Qin, W., SecretP: Identifying bacterial secreted proteins by fusing new features into Chou’s pseudo amino acid composition, J. Theor. Biol., 267, 1-6 (2010) · Zbl 1410.92040
[78] Zeng, Y. H.; Guo, Y. Z.; Xiao, R. Q.; Yang, L.; Yu, L. Z.; Li, M. L., Using the augmented Chou’s pseudo amino acid composition for predicting protein submitochondria locations based on auto covariance approach, J. Theor. Biol., 259, 366-372 (2009) · Zbl 1402.92193
[79] Zhang, G. Y.; Fang, B. S., Predicting the cofactors of oxidoreductases based on amino acid composition distribution and Chou’s amphiphilic pseudo amino acid composition, J. Theor. Biol., 253, 310-315 (2008)
[80] Zhang, G. Y.; Li, H. C.; Gao, J. Q.; Fang, B. S., Predicting lipase types by improved Chou’s pseudo-amino acid composition, Protein Pept. Lett., 15, 1132-1137 (2008)
[81] Zhang, S. W.; Zhang, Y. L.; Yang, H. F.; Zhao, C. H.; Pan, Q., Using the concept of Chou’s pseudo amino acid composition to predict protein subcellular localization: an approach by incorporating evolutionary information and von Neumann entropies, Amino Acids, 34, 565-572 (2008)
[82] Zhou, X. B.; Chen, C.; Li, Z. C.; Zou, X. Y., Using Chou’s amphiphilic pseudo amino acid composition and support vector machine for prediction of enzyme subfamily classes, J. Theor. Biol., 248, 546-551 (2007) · Zbl 1451.92245
[83] Zou, D.; He, Z.; He, J.; Xia, Y., Supersecondary structure prediction using Chou’s pseudo amino acid composition, J. Comput. Chem., 32, 271-278 (2011)