
Fully Bayesian logistic regression with hyper-Lasso priors for high-dimensional feature selection. (English) Zbl 07192690
Summary: Feature selection arises in many areas of modern science. For example, in genomic research, we want to find the genes that can be used to separate tissues of different classes (e.g. cancerous and normal). One approach is to fit regression/classification models with some form of penalization. Over the past decade, hyper-LASSO penalization (priors) has received increasing attention in the literature. However, fully Bayesian methods that use Markov chain Monte Carlo (MCMC) for regression/classification with hyper-LASSO priors remain underdeveloped. In this paper, we introduce an MCMC method for learning multinomial logistic regression with hyper-LASSO priors. Our MCMC algorithm uses Hamiltonian Monte Carlo within a restricted Gibbs sampling framework. We use simulation studies and real data to demonstrate the superior performance of hyper-LASSO priors compared to LASSO, and to investigate the issues of choosing the heaviness and scale of hyper-LASSO priors.
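The scheme described in the summary alternates Hamiltonian Monte Carlo updates of the regression coefficients with conditional updates of their local prior variances. The sketch below is only a minimal illustration of such an HMC-within-Gibbs sampler for binary (rather than multinomial) logistic regression, assuming the hyper-LASSO prior is represented as a Student-t scale mixture of normals; it is not the authors' implementation, and all variable names, tuning constants, and the simulated data are illustrative assumptions.

```python
# Minimal illustrative sketch (not the authors' code) of HMC-within-Gibbs for
# binary logistic regression with a heavy-tailed "hyper-LASSO"-type prior,
# written here as a Student-t prior, i.e. a normal/inverse-gamma scale mixture.
# All names, tuning constants, and the simulated data are assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: only the first 3 of 20 features carry signal.
n, p = 200, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -2.0, 1.5]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))

nu, s2 = 1.0, 0.01   # heaviness (degrees of freedom) and squared scale of the t prior
eps, L = 0.05, 20    # leapfrog step size and number of leapfrog steps

def log_post(beta, lam2):
    """log p(y | beta) + log p(beta | lam2), up to terms constant in beta."""
    eta = X @ beta
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    return loglik - 0.5 * np.sum(beta ** 2 / lam2)

def grad_log_post(beta, lam2):
    eta = X @ beta
    p_hat = np.exp(eta - np.logaddexp(0.0, eta))     # numerically stable sigmoid
    return X.T @ (y - p_hat) - beta / lam2

def hmc_step(beta, lam2):
    """One Hamiltonian Monte Carlo update of beta, holding the local variances fixed."""
    mom = rng.standard_normal(p)
    b, m = beta.copy(), mom.copy()
    m = m + 0.5 * eps * grad_log_post(b, lam2)       # leapfrog integration
    for _ in range(L):
        b = b + eps * m
        m = m + eps * grad_log_post(b, lam2)
    m = m - 0.5 * eps * grad_log_post(b, lam2)
    log_accept = (log_post(b, lam2) - 0.5 * m @ m) - (log_post(beta, lam2) - 0.5 * mom @ mom)
    return b if np.log(rng.uniform()) < log_accept else beta

# Gibbs sampler: alternate beta | lam2 (HMC) and lam2 | beta (conjugate inverse-gamma).
beta, lam2 = np.zeros(p), np.full(p, s2)
draws = []
for it in range(2000):
    beta = hmc_step(beta, lam2)
    # lam2_j | beta_j ~ Inv-Gamma((nu + 1)/2, (nu*s2 + beta_j^2)/2)
    lam2 = 1.0 / rng.gamma((nu + 1.0) / 2.0, 2.0 / (nu * s2 + beta ** 2))
    if it >= 1000:
        draws.append(beta.copy())

print("posterior means:", np.round(np.mean(draws, axis=0), 2))
```

In this scale-mixture representation, the degrees of freedom nu govern the heaviness of the prior tails and s2 its scale, the two quantities whose choice the paper investigates: heavier tails leave large coefficients nearly unshrunk, while a small scale pulls the remaining coefficients strongly toward zero.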
MSC:
62 Statistics
Software:
BhGLM; horserule