Multiclass classification of sarcomas using pathway based feature selection method. (English) Zbl 1307.92285

Summary: Feature selection is an important research topic in bioinformatics, and a large number of methods have been developed to date. Recently, several pathway-based feature selection protocols, such as the condition-responsive genes method, have been proposed to improve classification performance. However, these conventional pathway-based methods may select relevant but redundant genes in a given pathway while missing other useful genes. They have also been limited to binary classification, whereas many clinical problems, such as the classification of sarcomas, call for a multiclass protocol. Here, we propose a new pathway-based feature selection method, the redundancy removable pathway-based feature selection method (RRP), for binary and multiclass classification problems. Three classifiers were implemented to compare the performance and gene functions of the gene-based, conventional pathway-based, and RRP methods. The validation results suggest that the RRP method is a feasible and robust feature selection method for multiclass prediction problems.
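
The summary does not spell out the RRP algorithm, so the following is only a minimal sketch of the general idea of picking relevant but non-redundant genes within one pathway. It is not the authors' method. The function name `select_nonredundant`, the ANOVA F-statistic as relevance score, the Pearson-correlation threshold `max_corr`, and the cap `top_k` are all assumptions made for illustration.

```python
import numpy as np

def select_nonredundant(X, y, pathway, max_corr=0.8, top_k=3):
    """Greedily pick relevant, mutually non-redundant genes from one pathway.

    X: (samples, genes) expression matrix; y: integer class labels;
    pathway: column indices of the genes that belong to the pathway.
    Hypothetical helper, not the published RRP procedure.
    """
    classes = np.unique(y)

    def f_stat(col):
        # One-way ANOVA F-statistic; works for two or more classes.
        grand = col.mean()
        between = sum(np.sum(y == c) * (col[y == c].mean() - grand) ** 2
                      for c in classes)
        within = sum(np.sum((col[y == c] - col[y == c].mean()) ** 2)
                     for c in classes)
        df_between = len(classes) - 1
        df_within = len(col) - len(classes)
        return (between / df_between) / (within / df_within + 1e-12)

    # Rank pathway genes from most to least class-discriminative.
    ranked = sorted(pathway, key=lambda g: f_stat(X[:, g]), reverse=True)

    chosen = []
    for g in ranked:
        # Keep a gene only if it is not strongly correlated with a kept one.
        if all(abs(np.corrcoef(X[:, g], X[:, h])[0, 1]) < max_corr
               for h in chosen):
            chosen.append(g)
        if len(chosen) == top_k:
            break
    return chosen
```

In this sketch a gene that nearly duplicates an already selected gene is skipped, which leaves room for a less correlated gene from the same pathway. The F-statistic is used because it extends naturally from binary to multiclass labels.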

MSC:

92D10 Genetics and epigenetics
92C50 Medical applications (general)

Software:

affy; Bioconductor

References:

[1] Alon, U.; Barkai, N.; Notterman, D. A.; Gish, K.; Ybarra, S.; Mack, D.; Levine, A. J., Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays, Proc. Natl. Acad. Sci. USA, 96, 6745-6750 (1999)
[2] Annest, A.; Bumgarner, R. E.; Raftery, A. E.; Yeung, K. Y., Iterative Bayesian model averaging: a method for the application of survival analysis to high-dimensional microarray data, BMC Bioinform., 10, 72 (2009)
[3] Bhardwaj, N.; Langlois, R. E.; Zhao, G.; Lu, H., Kernel-based machine learning protocol for predicting DNA-binding proteins, Nucl. Acids Res., 33, 6486-6493 (2005)
[4] Bild, A. H.; Yao, G.; Chang, J. T.; Wang, Q.; Potti, A.; Chasse, D.; Joshi, M. B.; Harpole, D.; Lancaster, J. M.; Berchuck, A.; Olson, J. A.; Marks, J. R.; Dressman, H. K.; West, M.; Nevins, J. R., Oncogenic pathway signatures in human cancers as a guide to targeted therapies, Nature, 439, 353-357 (2006)
[5] Boulesteix, A. L., PLS dimension reduction for classification with microarray data, Stat. Appl. Genet. Mol. Biol., 3, Article 33, 1-32 (2004) · Zbl 1086.62119
[6] Chuang, H. Y.; Lee, E.; Liu, Y. T.; Lee, D.; Ideker, T., Network-based classification of breast cancer metastasis, Mol. Syst. Biol., 3, 140 (2007)
[7] Coindre, J. M.; Terrier, P.; Guillou, L.; Le Doussal, V.; Collin, F.; Ranchere, D.; Sastre, X.; Vilain, M. O.; Bonichon, F.; N’Guyen Bui, B., Predictive value of grade for metastasis development in the main histologic types of adult soft tissue sarcomas: a study of 1240 patients from the French federation of cancer centers sarcoma group, Cancer, 91, 1914-1926 (2001)
[8] Crew, A. J.; Clark, J.; Fisher, C.; Gill, S.; Grimer, R.; Chand, A.; Shipley, J.; Gusterson, B. A.; Cooper, C. S., Fusion of SYT to two genes, SSX1 and SSX2, encoding proteins with homology to the Kruppel-associated box in human synovial sarcoma, EMBO J., 14, 2333-2340 (1995)
[9] Detwiller, K. Y.; Fernando, N. T.; Segal, N. H.; Ryeom, S. W.; D'Amore, P. A.; Yoon, S. S., Analysis of hypoxia-related gene expression in sarcomas and effect of hypoxia on RNA interference of vascular endothelial cell growth factor A, Cancer Res., 65, 5881-5889 (2005)
[10] Ding, C.; Peng, H., Minimum redundancy feature selection from microarray gene expression data, J. Bioinform. Comput. Biol., 3, 185-205 (2005)
[11] Fan, X.; Shao, L.; Fang, H.; Tong, W.; Cheng, Y., Cross-platform comparison of microarray-based multiple-class prediction, PLoS One, 6, e16067 (2011)
[12] Gautier, L.; Cope, L.; Bolstad, B. M.; Irizarry, R. A., affy: analysis of Affymetrix GeneChip data at the probe level, Bioinformatics, 20, 307-315 (2004)
[13] Gentleman, R. C.; Carey, V. J.; Bates, D. M.; Bolstad, B.; Dettling, M.; Dudoit, S.; Ellis, B.; Gautier, L.; Ge, Y.; Gentry, J.; Hornik, K.; Hothorn, T.; Huber, W.; Iacus, S.; Irizarry, R.; Leisch, F.; Li, C.; Maechler, M.; Rossini, A. J.; Sawitzki, G.; Smith, C.; Smyth, G.; Tierney, L.; Yang, J. Y.; Zhang, J., Bioconductor: open software development for computational biology and bioinformatics, Genome Biol., 5, R80 (2004)
[14] Goldberg, B. R., Soft tissue sarcoma: an overview, Orthop. Nurs., 26, 4-11 (2007)
[15] Guo, Z.; Zhang, T.; Li, X.; Wang, Q.; Xu, J.; Yu, H.; Zhu, J.; Wang, H.; Wang, C.; Topol, E. J.; Rao, S., Towards precise classification of cancers based on robust gene functional expression profiles, BMC Bioinform., 6, 58 (2005)
[16] Irizarry, R. A.; Bolstad, B. M.; Collin, F.; Cope, L. M.; Hobbs, B.; Speed, T. P., Summaries of Affymetrix GeneChip probe level data, Nucl. Acids Res., 31, e15 (2003)
[17] Konstantinopoulos, P. A.; Fountzilas, E.; Goldsmith, J. D.; Bhasin, M.; Pillay, K.; Francoeur, N.; Libermann, T. A.; Gebhardt, M. C.; Spentzos, D., Analysis of multiple sarcoma expression datasets: implications for classification, oncogenic pathway activation and chemotherapy resistance, PLoS One, 5, e9747 (2010)
[18] Ladanyi, M., Fusions of the SYT and SSX genes in synovial sarcoma, Oncogene, 20, 5755-5762 (2001)
[19] Lahoz, A.; Hall, A., DLC1: a significant GAP in the cancer genome, Genes Dev., 22, 1724-1730 (2008)
[20] Lai, J. P.; Robbins, P. F.; Raffeld, M.; Aung, P. P.; Tsokos, M.; Rosenberg, S. A.; Miettinen, M. M.; Lee, C. C., NY-ESO-1 expression in synovial sarcoma and other mesenchymal tumors: significance for NY-ESO-1-based targeted therapy and differential diagnosis, Mod. Pathol., 25, 854-858 (2012)
[21] Langlois, R. E.; Lu, H., Boosting the prediction and understanding of DNA-binding domains from sequence, Nucl. Acids Res., 38, 3149-3158 (2010)
[22] Lee, E.; Chuang, H. Y.; Kim, J. W.; Ideker, T.; Lee, D., Inferring pathway activity toward precise disease classification, PLoS Comput. Biol., 4, e1000217 (2008)
[23] Li, G. Z.; Bu, H. L.; Yang, M. Q.; Zeng, X. Q.; Yang, J. Y., Selecting subsets of newly extracted features from PCA and PLS in microarray data analysis, BMC Genomics, 9, Suppl 2, S24 (2008)
[24] Nakayama, R.; Nemoto, T.; Takahashi, H.; Ohta, T.; Kawai, A.; Seki, K.; Yoshida, T.; Toyama, Y.; Ichikawa, H.; Hasegawa, T., Gene expression analysis of soft tissue sarcomas: characterization and reclassification of malignant fibrous histiocytoma, Mod. Pathol., 20, 749-759 (2007)
[25] Osuna, D.; de Alava, E., Molecular pathology of sarcomas, Rev. Recent Clin. Trials, 4, 12-26 (2009)
[26] Pitak, S.; Santitham, P.-O.; Asawin, M.; Jonathan, H. C., Pathway-based microarray analysis for robust disease classification, Neural Comput. Appl. (2011)
[27] Rajapakse, J. C.; Mundra, P. A., Multiclass gene selection using Pareto-fronts, IEEE/ACM Trans. Comput. Biol. Bioinform., 10, 87-97 (2013)
[28] Ross, D. T.; Scherf, U.; Eisen, M. B.; Perou, C. M.; Rees, C.; Spellman, P.; Iyer, V.; Jeffrey, S. S.; Van de Rijn, M.; Waltham, M.; Pergamenschikov, A.; Lee, J. C.; Lashkari, D.; Shalon, D.; Myers, T. G.; Weinstein, J. N.; Botstein, D.; Brown, P. O., Systematic variation in gene expression patterns in human cancer cell lines, Nat. Genet., 24, 227-235 (2000)
[29] Saeys, Y.; Inza, I.; Larranaga, P., A review of feature selection techniques in bioinformatics, Bioinformatics, 23, 2507-2517 (2007)
[30] Skafidas, E.; Testa, R.; Zantomio, D.; Chana, G.; Everall, I. P.; Pantelis, C., Predicting the diagnosis of autism spectrum disorder using gene pathway analysis, Mol. Psychiatry (2012)
[31] Smyth, G. K., Linear models and empirical Bayes methods for assessing differential expression in microarray experiments, Stat. Appl. Genet. Mol. Biol., 3, Article 3 (2004) · Zbl 1038.62110
[32] Somorjai, R. L.; Dolenko, B.; Baumgartner, R., Class prediction and discovery using gene microarray and proteomics mass spectroscopy data: curses, caveats, cautions, Bioinformatics, 19, 1484-1491 (2003)
[33] Staiger, C.; Cadot, S.; Kooter, R.; Dittrich, M.; Muller, T.; Klau, G. W.; Wessels, L. F., A critical evaluation of network and pathway-based classifiers for outcome prediction in breast cancer, PLoS One, 7, e34796 (2012)
[34] Xiong, M.; Fang, X.; Zhao, J., Biomarker identification by feature wrappers, Genome Res., 11, 1878-1887 (2001)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.