Empirical priors for prediction in sparse high-dimensional linear regression. (English) Zbl 07255175

Summary: In this paper, we adopt the familiar sparse, high-dimensional linear regression model and focus on the important but often overlooked task of prediction. In particular, we consider a new empirical Bayes framework that incorporates data into the prior in two ways: one centers the prior for the non-zero regression coefficients, and the other provides some additional regularization. We show that, in certain settings, the proposed empirical Bayes posterior predictive distribution concentrates asymptotically at a very fast rate, and we establish a Bernstein-von Mises theorem, which ensures that the derived empirical Bayes prediction intervals achieve the targeted frequentist coverage probability. The empirical prior has a convenient conjugate form, so posterior computations are relatively simple and fast. Finally, our numerical results demonstrate the proposed method’s strong finite-sample performance in terms of prediction accuracy, uncertainty quantification, and computation time compared to existing Bayesian methods.
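
To illustrate why a data-centered conjugate prior keeps computation simple, here is a minimal sketch. It is not the authors' implementation, and the summary does not spell out the prior. The sketch makes several assumptions: the set S of active predictors is fixed, the error variance `sigma2` is known, the Gaussian prior for the active coefficients is centered at their least-squares estimate with precision scaled by a hypothetical `gamma`, and the likelihood is tempered by a hypothetical power `alpha`. Under those assumptions the posterior is Gaussian in closed form. Averaging over candidate sets S, which a full method would need, is omitted.

```python
import numpy as np


def conjugate_posterior(X_S, y, sigma2=1.0, alpha=0.99, gamma=0.01):
    """Closed-form Gaussian posterior for the coefficients of a fixed active set S.

    Assumed setup (illustrative, not taken from the summary):
      prior:      beta_S ~ N(beta_hat, (sigma2 / gamma) * (X_S' X_S)^{-1})
      likelihood: N(y | X_S beta_S, sigma2 * I) raised to the power alpha
    Both exponents are quadratic forms around beta_hat, so they simply add.
    """
    gram = X_S.T @ X_S
    beta_hat = np.linalg.solve(gram, X_S.T @ y)  # least-squares center of the prior
    cov = sigma2 / (alpha + gamma) * np.linalg.inv(gram)
    return beta_hat, cov


def predictive(x_new_S, beta_hat, cov, sigma2=1.0):
    """Mean and variance of a new response at x_new_S, given the fixed set S."""
    mean = float(x_new_S @ beta_hat)
    var = float(sigma2 + x_new_S @ cov @ x_new_S)  # noise plus coefficient uncertainty
    return mean, var


# Simulated data, purely for illustration.
rng = np.random.default_rng(0)
n, s = 50, 3
X_S = rng.normal(size=(n, s))
beta_true = np.array([2.0, -1.0, 0.5])
y = X_S @ beta_true + rng.normal(size=n)

beta_hat, cov = conjugate_posterior(X_S, y)
mean, var = predictive(np.ones(s), beta_hat, cov)
half_width = 1.96 * np.sqrt(var)  # approximate 95% prediction interval
print(f"prediction: {mean:.3f} +/- {half_width:.3f}")
```

Because the prior and the tempered likelihood are both quadratic around the same least-squares estimate, no sampling is needed for a fixed S: the posterior mean is the least-squares fit and only the covariance scale changes.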


68T05 Learning and adaptive systems in artificial intelligence
Full Text: arXiv Link

