zbMATH — the first resource for mathematics

SGL-SVM: a novel method for tumor classification via support vector machine with sparse group lasso. (English) Zbl 1429.92057
Summary: With the in-depth study of gene expression data, the significant role of tumor classification in clinical medicine has become increasingly apparent. In particular, gene expression data are sparse both within and between groups. This paper therefore studies tumor classification based on the sparsity characteristics of genes and, on this basis, proposes a new tumor classification method that combines the sparse group lasso (least absolute shrinkage and selection operator) with a support vector machine (SGL-SVM). First, a primary selection of feature genes is performed on the normalized tumor datasets using the Kruskal-Wallis rank sum test. Second, the sparse group lasso is used for further selection. Finally, a support vector machine serves as the classifier. We validate the proposed method on both microarray and NGS datasets. It is first tested by 10-fold cross-validation on three two-class and five multi-class microarray datasets and compared with three other classifiers. SGL-SVM is then applied to the BRCA and GBM datasets and tested by 5-fold cross-validation. Satisfactory accuracy is obtained in these experiments and compared with that of other proposed methods. The experimental results show that the proposed method achieves higher classification accuracy than the compared methods while selecting fewer feature genes, and that it can be widely applied to the classification of high-dimensional, small-sample tumor datasets. The source code and all datasets are available at https://github.com/QUST-AIBBDRC/SGL-SVM/.
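The first two stages of the pipeline described in the summary (a Kruskal-Wallis prefilter followed by sparse group lasso selection) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' code (which is at the GitHub link above). All function names are hypothetical. The penalty is assumed to take the form lam*(alpha*||b||_1 + (1 - alpha)*sum_g sqrt(p_g)*||b_g||_2) used by Simon, Friedman, Hastie and Tibshirani (2013). It is fitted here by proximal gradient descent on a squared-error loss with the class label treated as a numeric response; this is a simplification, since the summary does not state which loss SGL-SVM uses. The gene groups are assumed to be given.

```python
import math
from collections import defaultdict


def average_ranks(values):
    """1-based ranks of `values`; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(values, labels):
    """Kruskal-Wallis H statistic for one gene, with the usual tie correction."""
    n = len(values)
    ranks = average_ranks(values)
    rank_sums, counts = defaultdict(float), defaultdict(int)
    for r, y in zip(ranks, labels):
        rank_sums[y] += r
        counts[y] += 1
    h = 12.0 / (n * (n + 1)) * sum(rank_sums[c] ** 2 / counts[c] for c in counts) - 3 * (n + 1)
    ties = defaultdict(int)
    for v in values:
        ties[v] += 1
    correction = 1 - sum(t ** 3 - t for t in ties.values()) / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0  # constant gene -> 0


def kw_prefilter(X, labels, k):
    """Hypothetical helper: indices of the k columns of X with the largest H."""
    p = len(X[0])
    scores = [kruskal_wallis_h([row[j] for row in X], labels) for j in range(p)]
    return sorted(range(p), key=lambda j: scores[j], reverse=True)[:k]


def soft_threshold(z, t):
    return math.copysign(max(abs(z) - t, 0.0), z)


def sgl_prox(beta, groups, lam, alpha, step=1.0):
    """Prox of step*lam*(alpha*||b||_1 + (1-alpha)*sum_g sqrt(p_g)*||b_g||_2).

    `groups` is a list of non-overlapping index lists covering every coefficient.
    """
    out = list(beta)
    for idx in groups:
        # Lasso part: elementwise soft-thresholding.
        u = [soft_threshold(beta[j], step * lam * alpha) for j in idx]
        # Group part: shrink the whole group towards zero (possibly to exactly zero).
        norm = math.sqrt(sum(v * v for v in u))
        thr = step * lam * (1 - alpha) * math.sqrt(len(idx))
        scale = max(0.0, 1 - thr / norm) if norm > 0 else 0.0
        for j, v in zip(idx, u):
            out[j] = scale * v
    return out


def sgl_least_squares(X, y, groups, lam, alpha, step, n_iter=500):
    """Proximal gradient for (1/2n)||y - Xb||^2 + SGL penalty (assumed loss)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        resid = [sum(X[i][j] * b[j] for j in range(p)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        b = sgl_prox([b[j] - step * grad[j] for j in range(p)], groups, lam, alpha, step)
    return b
```

Genes whose coefficients are nonzero after the fit would then be passed to an SVM classifier (for example scikit-learn's `SVC`), which is left out here to keep the sketch dependency-free. The step size is assumed to be at most 1/L, where L is the largest eigenvalue of X^T X / n; this is the standard condition under which these proximal gradient iterations are guaranteed to converge.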
92C32 Pathology, pathophysiology
62P10 Applications of statistics to biology and medical sciences; meta analysis
92C40 Biochemistry, molecular biology