
A general framework for Bayes structured linear models. (English) Zbl 1471.62241

The paper under review develops a unified methodology and theory for Bayesian high-dimensional statistics and Bayesian nonparametric statistics within a general framework of structured linear models. The authors first give a unified view of various high-dimensional and nonparametric models, and then propose a single prior distribution that applies to every model in the framework. Optimal rates of convergence of the posterior distributions are established under appropriate conditions. The results lead directly to exact minimax posterior contraction rates in the stochastic block model, biclustering, sparse linear regression, regression with group sparsity, multitask learning and dictionary learning. Moreover, a general posterior oracle inequality, which allows arbitrary model misspecification, is derived. The main results are illustrated by examples ranging from nonparametric estimation to high-dimensional statistics.
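To make the "unified view" concrete, here is a minimal sketch of one plausible reading of the term "structured linear model": a discrete structure selects a linear map, and the continuous parameter enters the mean linearly once that structure is fixed. The function names, the toy numbers and this reading are illustrative assumptions and are not taken from the paper. Two of the listed examples, the stochastic block model and sparse linear regression, are written in this form and their linearity in the parameter is checked numerically.

```python
# Illustrative sketch only: the function names and toy numbers are assumptions,
# not notation or code from the paper under review.


def sbm_mean(z, Q):
    """Stochastic block model: the mean of entry (i, j) is Q[z[i]][z[j]].

    z is the structure (one block label per node); Q is the parameter.
    """
    n = len(z)
    return [[Q[z[i]][z[j]] for j in range(n)] for i in range(n)]


def sparse_regression_mean(X, support, beta_s):
    """Sparse linear regression: the mean is X restricted to `support` times beta_s.

    support is the structure (indices of active columns); beta_s is the parameter.
    """
    return [sum(row[j] * b for j, b in zip(support, beta_s)) for row in X]


def add(a, b):
    """Entrywise sum of two equally shaped (possibly nested) lists."""
    if isinstance(a, list):
        return [add(x, y) for x, y in zip(a, b)]
    return a + b


# With the structure held fixed, the mean is additive in the parameter.
z = [0, 0, 1]
Q1 = [[0.5, 0.25], [0.25, 0.75]]
Q2 = [[1.0, 2.0], [2.0, 4.0]]
sbm_is_additive = sbm_mean(z, add(Q1, Q2)) == add(sbm_mean(z, Q1), sbm_mean(z, Q2))

X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
support = [0, 2]
b1, b2 = [1.0, 2.0], [0.5, 0.25]
reg_is_additive = sparse_regression_mean(X, support, add(b1, b2)) == add(
    sparse_regression_mean(X, support, b1),
    sparse_regression_mean(X, support, b2),
)
```

In both toy models the unknown splits into a discrete structure (block labels, or a support set) and a real-valued parameter, which is what would let a single prior of the form "prior on structure, then prior on parameter given structure" cover both cases.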

MSC:

62C10 Bayesian problems; characterization of Bayes procedures
62F15 Bayesian inference
62G05 Nonparametric estimation
62J05 Linear regression; mixed models
