Sparse optimal scoring for multiclass cancer diagnosis and biomarker detection using microarray data. (English) Zbl 1158.92316

Summary: Gene expression data sets hold the promise of cancer diagnosis at the molecular level. However, using all the gene profiles for diagnosis may be suboptimal. Detecting molecular signatures not only reduces the number of genes needed for discrimination but may also elucidate the roles these genes play in biological processes. Therefore, a central part of diagnosis is to detect a small set of tumor biomarkers that can be used for accurate multiclass cancer classification. This task calls for effective multiclass classifiers with a built-in biomarker selection mechanism.
We propose the sparse optimal scoring (SOS) method for multiclass cancer characterization. SOS is a simple prototype classifier based on linear discriminant analysis, in which predictive biomarkers are determined automatically as part of the classification itself. This distinguishes SOS from many other commonly used classifiers, which require gene preselection before classification. We obtain satisfactory performance when applying SOS to several public data sets.
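The summary does not spell out the SOS criterion. As a rough illustration only, the sketch below follows one common formulation of sparse optimal scoring, which is an assumption and not necessarily the authors' exact algorithm. Optimal scoring (Hastie, Tibshirani and Buja, 1994) recasts linear discriminant analysis as a regression: the class indicator matrix Y is turned into numeric scores θ, and the scored response Yθ is regressed on the centered expression matrix X. Adding a lasso penalty λ‖β‖₁ shrinks most gene coefficients to exactly zero, so the genes with nonzero coefficients act as the selected biomarkers. The sketch alternates a lasso fit for β with a closed-form update of θ, normalized so that θᵀDθ = 1 with D the diagonal matrix of class proportions. The function names (`lasso_cd`, `sparse_optimal_scoring`, `predict`), the deterministic starting scores, the penalty value and the synthetic data are all hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Lasso shrinkage operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=100, tol=1e-8):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1.

    Assumes the columns of X and y are already centered (no intercept).
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()  # running residual y - X b
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            r += X[:, j] * (b[j] - new)
            max_step = max(max_step, abs(new - b[j]))
            b[j] = new
        if max_step < tol:
            break
    return b


def sparse_optimal_scoring(X, labels, lam, n_outer=10):
    """Fit ONE sparse discriminant direction by alternating optimal scoring
    (hypothetical sketch, not the paper's exact procedure)."""
    mu = X.mean(axis=0)
    Xc = X - mu  # center each gene
    classes = np.unique(labels)
    Y = (labels[:, None] == classes[None, :]).astype(float)  # n x K indicators
    n, K = Y.shape
    D = Y.T @ Y / n  # diagonal matrix of class proportions
    ones = np.ones(K)

    def normalize(t):
        # Remove the trivial constant score, then scale so t' D t = 1.
        t = t - (ones @ D @ t) / (ones @ D @ ones) * ones
        return t / np.sqrt(t @ D @ t)

    theta = normalize(np.arange(K, dtype=float))  # assumed deterministic start
    beta = np.zeros(X.shape[1])
    for _ in range(n_outer):
        beta = lasso_cd(Xc, Y @ theta, lam)  # sparse regression step
        if not beta.any():  # penalty too strong: no gene selected
            break
        # Closed-form score update: class means of X beta, renormalized.
        theta = normalize(np.linalg.solve(D, Y.T @ (Xc @ beta) / n))
    return classes, mu, beta


def predict(classes, mu, beta, X_train, labels, X_new):
    """Nearest class centroid along the fitted sparse direction."""
    z_train = (X_train - mu) @ beta
    centroids = np.array([z_train[labels == c].mean() for c in classes])
    z_new = (X_new - mu) @ beta
    return classes[np.argmin(np.abs(z_new[:, None] - centroids[None, :]), axis=1)]


# Synthetic example: 3 classes, 50 "genes", only gene 0 is informative.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 30)
X = rng.standard_normal((90, 50))
X[:, 0] += 5.0 * labels
classes, mu, beta = sparse_optimal_scoring(X, labels, lam=0.5)
selected = np.flatnonzero(beta)  # indices of genes kept by the lasso penalty
acc = (predict(classes, mu, beta, X, labels, X) == labels).mean()
print("selected genes:", selected, "training accuracy:", acc)
```

Only one discriminant direction is fitted here to keep the sketch short; with K classes, optimal scoring generally yields up to K − 1 directions, and in practice the penalty λ would need to be tuned, for example by cross-validation.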


92C50 Medical applications (general)
92C40 Biochemistry, molecular biology
62P10 Applications of statistics to biology and medical sciences; meta analysis



