zbMATH — the first resource for mathematics

Predicting plant protein subcellular multi-localization by Chou’s PseAAC formulation based multi-label homolog knowledge transfer learning. (English) Zbl 1337.92065
J. Theor. Biol. 310, 80-87 (2012); corrigendum ibid. 338, 111 (2013).
Summary: Recent years have witnessed much progress in computational modeling for protein subcellular localization. However, there are far fewer computational models for predicting plant protein subcellular multi-localization. In this paper, we propose a multi-label multi-kernel transfer learning model (MLMK-TLM) for predicting multiple subcellular locations of plant proteins. The method proposes a multi-label confusion matrix and adapts one-against-all multi-class probabilistic outputs to the multi-label learning scenario, on the basis of which we extend our published work MK-TLM (multi-kernel transfer learning based on Chou’s PseAAC formulation for protein submitochondria localization) to plant protein subcellular multi-localization. By proper homolog knowledge transfer, MLMK-TLM is applicable to the subcellular localization of novel plant proteins in the multi-label learning scenario. Experiments on a plant protein benchmark dataset show that MLMK-TLM outperforms the baseline model. Unlike existing models, MLMK-TLM also reports its misleading tendency, which is important for a comprehensive survey of the model’s multi-labeling performance.
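The summary names two ingredients: turning one-against-all probabilistic outputs into a set of labels, and tabulating a multi-label confusion matrix. The following Python sketch illustrates one plausible reading of both. It is not the authors' implementation: the threshold rule, the arg-max fallback, the function names, and the way a missed label is charged to spurious labels are all assumptions made for illustration, and the probabilities are toy values.

```python
from typing import List, Sequence, Set


def predict_labels(probs: Sequence[float], threshold: float = 0.5) -> Set[int]:
    """Map one-against-all class probabilities to a label set (assumed rule)."""
    # Keep every class whose probability reaches the (hypothetical) threshold.
    chosen = {k for k, p in enumerate(probs) if p >= threshold}
    # Fall back to the single most probable class so no protein is left unlabeled.
    if not chosen:
        chosen = {max(range(len(probs)), key=lambda k: probs[k])}
    return chosen


def multilabel_confusion(
    true_sets: Sequence[Set[int]], pred_sets: Sequence[Set[int]], n_classes: int
) -> List[List[int]]:
    """Count correct labels on the diagonal and mislabelings off the diagonal.

    Assumed convention: a missed true label i is charged to every spurious
    predicted label j of the same sample (entry [i][j]).
    """
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for truth, pred in zip(true_sets, pred_sets):
        for k in truth & pred:        # correctly recovered locations
            matrix[k][k] += 1
        for i in truth - pred:        # true locations that were missed
            for j in pred - truth:    # locations predicted in error
                matrix[i][j] += 1
    return matrix


# Toy one-against-all probabilities for three proteins and three locations.
probabilities = [[0.9, 0.6, 0.1], [0.2, 0.3, 0.4], [0.1, 0.8, 0.7]]
true_labels = [{0, 1}, {0}, {1}]

predicted = [predict_labels(p) for p in probabilities]
confusion = multilabel_confusion(true_labels, predicted, n_classes=3)
```

Under this convention, the off-diagonal entries of a row show which locations a given true location tends to be mistaken for, which is one way a model could report a misleading tendency of the kind the summary mentions.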

MSC:
92C40 Biochemistry, molecular biology
92C80 Plant biology
References:
[1] Altschul, S.; Madden, T.; Schaffer, A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic acids res., 25, 3389-3402, (1997)
[2] Ashburner, M., Gene ontology: tool for the unification of biology. the gene ontology consortium, Nat. genet., 25, 25-29, (2000)
[3] Blum, T.; Briesemeister, S.; Kohlbacher, O., Multiloc2: integrating phylogeny and gene ontology terms improves subcellular protein localization prediction, BMC bioinform., 10, 274, (2009)
[4] Boeckmann, B., The SWISS-PROT protein knowledgebase and its supplement trembl, Nucleic acids res., 31, 365-370, (2003)
[5] Chou, K.C., Prediction of protein cellular attributes using pseudo amino acid composition, Proteins: struct. funct. genet. (erratum: ibid., 2001, vol. 44, 60), 43, 246-255, (2001)
[6] Chou, K.C., Some remarks on protein attribute prediction and pseudo amino acid composition (50th anniversary year review), J. theor. biol., 273, 236-247, (2011) · Zbl 1405.92212
[7] Chou, K.C.; Cai, Y.D., Using functional domain composition and support vector machines for prediction of protein subcellular location, J. biol. chem., 277, 45765-45769, (2002)
[8] Chou, K.C.; Elrod, D.W., Protein subcellular location prediction, Protein eng., 12, 107-118, (1999)
[9] Chou, K.C.; Shen, H.B., Hum-ploc: a novel ensemble classifier for predicting human protein subcellular localization, Biochem. biophys. res. commun., 347, 150-157, (2006)
[10] Chou, K.C.; Shen, H.B., Review: recent progresses in protein subcellular location prediction, Anal. biochem., 370, 1-16, (2007)
[11] Chou, K.C.; Shen, H.B., Large-scale plant protein subcellular location prediction, J. cell. biochem., 100, 665-678, (2007)
[12] Chou, K.C.; Shen, H.B., Cell-ploc: a package of web servers for predicting subcellular localization of proteins in various organisms, Nat. protocols, 3, 153-162, (2008)
[13] Chou, K.C.; Shen, H.B., Review: recent advances in developing web-servers for predicting protein attributes, Nat. sci., 2, 63-92, (2009), (Openly accessible from)
[14] Chou, K.C.; Shen, H.B., Cell-ploc 2.0: an improved package of web-servers for predicting subcellular localization of proteins in various organisms, Nat. sci., 2, 1090-1103, (2010), (Openly accessible from)
[15] Chou, K.C.; Shen, H.B., Plant-mploc: a top-down strategy to augment the power for predicting plant protein subcellular localization, Plos one, 5, e11335, (2010)
[16] Chou, K.C.; Zhang, C.T., Review: prediction of protein structural classes, Crit. rev. biochem. mol. biol., 30, 275-349, (1995)
[17] Chou, K.C.; Wu, Z.C.; Xiao, X., Iloc-euk: a multi-label classifier for predicting the subcellular localization of singleplex and multiplex eukaryotic proteins, Plos one, 6, e18258, (2011)
[18] Chou, K.C.; Wu, Z.C.; Xiao, X., Iloc-hum: using accumulation-label scale to predict subcellular locations of human proteins with both single and multiple sites, Mol. biosyst., 8, 629-641, (2012)
[19] Dai W., Yang Q., Xue G., Yu Y. Boosting for Transfer Learning. In: Proceedings of the 24th International Conference on Machine Learning, 2007.
[20] Dai, W., Chen, Y., Xue, G., Yang Q., Yu, Y. Translated learning: transfer learning across different feature spaces. In: Proceedings of the NIPS 2008.
[21] Du, P.; Li, Y., Prediction of protein submitochondria locations by hybridizing pseudo-amino acid composition with various physicochemical features of segmented sequence, BMC bioinform., 7, 518, (2006)
[22] Du, P.; Cao, S.; Li, Y., Subchlo: predicting protein subchloroplast locations with pseudo-amino acid composition and the evidence-theoretic K-nearest neighbor (ET-KNN), J. theor. biol., 261, 330-335, (2009)
[23] Esmaeili, M.; Mohabatkar, H.; Mohsenzadeh, S., Using the concept of Chou’s pseudo amino acid composition for risk type prediction of human papillomaviruses, J. theor. biol., 263, 203-209, (2010)
[24] Georgiou, D.N.; Karakasidis, T.E.; Nieto, J.J.; Torres, A., Use of fuzzy clustering technique and matrices to classify amino acids and its impact to Chou’s pseudo amino acid composition, J. theor. biol., 257, 17-26, (2009)
[25] Hoglund, A.; Donnes, P.; Blum, T.; Adolph, H.; Kohlbacher, O., Multiloc: prediction of protein subcellular localization using N-terminal targeting sequences, sequence motifs and amino acid composition, Bioinformatics, 22, 10, 1158-1165, (2006)
[26] Hu, L.L.; Huang, T.; Cai, Y.D.; Chou, K.C., Prediction of body fluids where proteins are secreted into based on protein interaction network, Plos one, 6, e22989, (2011)
[27] Huang, T.; Chen, L.; Cai, Y.D.; Chou, K.C., Classification and analysis of regulatory pathways using graph property, biochemical and physicochemical property, and functional property, Plos one, 6, e25297, (2011)
[28] Huang, W.; Tung, C.; Ho, S.; Hwang, S.; Ho, S., Proloc-GO: utilizing informative gene ontology terms for sequence-based prediction of protein subcellular localization, BMC bioinform., 9, 80, (2008)
[29] Huang, W.; Tung, C.; Huang, S.; Ho, S., Predicting protein subnuclear localization using GO-amino-acid composition features, Biosystems, (2009)
[30] Lee, K.; Chuang, H.; Beyer, A.; Sung, M.; Huh, W.; Lee, B.; Ideker, T., Protein networks markedly improve prediction of subcellular localization in multiple eukaryotic species, Nucleic acids res., 36, 20, e136, (2008)
[31] Lin, W.Z.; Fang, J.A.; Xiao, X.; Chou, K.C., Idna-prot: identification of DNA binding proteins using random forest with grey model, Plos one, 6, e24756, (2011)
[32] Mei, S., Multi-kernel transfer learning based on Chou’s pseaac formulation for protein submitochondria localization, J. theor. biol., 293, 121-130, (2012) · Zbl 1307.92085
[33] Mei, S.; Wang, F., Amino acid classification based spectrum kernel fusion for protein subnuclear localization, BMC bioinform., 11, Suppl. 1, S17, (2010)
[34] Mei, S.; Wang, F.; Zhou, S., Gene ontology based transfer learning for protein subcellular localization, BMC bioinform., 12, 44, (2011)
[35] Mohabatkar, H., Prediction of cyclin proteins using Chou’s pseudo amino acid composition, Protein pept. lett., 17, 1207-1214, (2010)
[36] Nakai, K., Protein sorting signals and prediction of subcellular localization, Adv. protein chem., 54, 277-344, (2000)
[37] Nakashima, H.; Nishikawa, K., Discrimination of intracellular and extracellular proteins using amino acid composition and residue-pair frequencies, J. mol. biol., 238, 54-61, (1994)
[38] Pan, S.; Yang, Q., A survey on transfer learning, IEEE trans. knowl. data eng., 22, 10, 1345-1359, (2010)
[39] Pierleoni, A.; Luigi, P.; Fariselli, P.; Casadio, R., Bacello: a balanced subcellular localization predictor, Bioinformatics, 22, 14, e408-e416, (2006)
[40] Platt, J. Probabilistic outputs for support vector machines and comparison to regularized likelihood methods. In: Advances in Large Margin Classifiers. MIT Press, 1999.
[41] Qiu, J.D.; Huang, J.H.; Liang, R.P.; Lu, X.Q., Prediction of G-protein-coupled receptor classes based on the concept of Chou’s pseudo amino acid composition: an approach from discrete wavelet transform, Anal. biochem., 390, 68-73, (2009)
[42] Sahu, S.S.; Panda, G., A novel feature representation method based on Chou’s pseudo amino acid composition for protein structural class prediction, Comput. biol. chem., 34, 320-327, (2010)
[43] Shen, H.B.; Chou, K.C., Hum-mploc: an ensemble classifier for large-scale human protein subcellular location prediction by incorporating samples with multiple sites, Biochem. biophys. res. commun., 355, 1006-1011, (2007)
[44] Shen, H.B.; Chou, K.C., Virus-mploc: a fusion classifier for viral protein subcellular location prediction by incorporating multiple sites, J. biomol. struct. dyn., 28, 175-186, (2010)
[45] Tung, T.; Lee, D., A method to improve protein subcellular localization prediction by integrating various biological data sources, BMC bioinform., 10, Suppl. 1, S43, (2009)
[46] Wang, P.; Xiao, X.; Chou, K.C., NR-2L: a two-level predictor for identifying nuclear receptor subfamilies based on sequence-derived features, Plos one, 6, e23505, (2011)
[47] Wu, T.; Lin, C.; Weng, R., Probability estimates for multi-class classification by pairwise coupling, J. Mach. learn. res., 5, 975-1005, (2004) · Zbl 1222.68336
[48] Wu, Z.C.; Xiao, X.; Chou, K.C., Iloc-plant: a multi-label classifier for predicting the subcellular localization of plant proteins with both single and multiple sites, Mol. biosyst., 7, 3287-3297, (2011)
[49] Wu, Z.C.; Xiao, X.; Chou, K.C., Iloc-gpos: a multi-layer classifier for predicting the subcellular localization of singleplex and multiplex Gram-positive bacterial proteins, Protein pept. lett., 19, 4-14, (2012)
[50] Xiao, X.; Wu, Z.C.; Chou, K.C., Iloc-virus: a multi-label learning classifier for identifying the subcellular localization of virus proteins with both single and multiple sites, J. theor. biol., 284, 42-51, (2011)
[51] Xiao, X.; Wu, Z.C.; Chou, K.C., A multi-label classifier for predicting the subcellular localization of Gram-negative bacterial proteins with both single and multiple sites, Plos one, 6, e20592, (2011)
[52] Yang, Q., Chen, Y., Xue, G., Dai, W., Yu, Y. Heterogeneous transfer learning for image clustering via the social Web. In: Proceedings of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP 2009, pp. 1-9.
[53] Zhou, G.P.; Doctor, K., Subcellular location prediction of apoptosis proteins, Proteins: struct. funct. genet., 50, 44-48, (2003)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.