High-dimensional regression with unknown variance. (English) Zbl 1331.62346

Summary: We review recent results for high-dimensional sparse linear regression in the practical case of unknown variance. Several sparsity settings are covered, including coordinate-sparsity, group-sparsity and variation-sparsity. The emphasis is on nonasymptotic analyses and feasible procedures. In addition, a small numerical study compares the practical performance of three schemes for tuning the lasso estimator, and references are collected for more general models, including multivariate regression and nonparametric regression.
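The three tuning schemes compared in the study are not named in this summary. As a rough illustration of what variance-free tuning of the lasso looks like, the Python sketch below (assuming NumPy and scikit-learn are available; the simulated data and both rules are illustrative choices, not the paper's exact protocol) contrasts two standard approaches: K-fold cross-validation, and a scaled (square-root) lasso iteration in the spirit of Sun and Zhang, which estimates the noise level and the penalty jointly.

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoCV

# Illustrative coordinate-sparse design: n = 100 observations, p = 500
# predictors, 5 nonzero coefficients; the noise level sigma is used to
# generate the data but is treated as unknown by both tuning rules.
rng = np.random.default_rng(0)
n, p, s, sigma = 100, 500, 5, 1.0
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 3.0
y = X @ beta + sigma * rng.standard_normal(n)

# Rule 1: 10-fold cross-validation, which tunes the penalty by
# out-of-sample prediction error and never needs a variance estimate.
cv_fit = LassoCV(cv=10, max_iter=5000).fit(X, y)
print("CV-selected penalty:", cv_fit.alpha_)

# Rule 2: scaled (square-root) lasso sketch -- alternate between
# estimating sigma from the residuals and refitting the lasso with a
# penalty proportional to the current sigma estimate.
lam0 = np.sqrt(2.0 * np.log(p) / n)   # universal penalty level
sigma_hat = np.std(y)                 # crude initial noise estimate
for _ in range(50):
    fit = Lasso(alpha=sigma_hat * lam0, max_iter=5000).fit(X, y)
    resid = y - fit.predict(X)
    new_sigma = np.sqrt(np.mean(resid ** 2))
    if abs(new_sigma - sigma_hat) < 1e-6:
        break
    sigma_hat = new_sigma
print("scaled-lasso noise estimate:", sigma_hat)
print("scaled-lasso support size:", int(np.sum(fit.coef_ != 0)))
```

Both rules avoid plugging in a variance estimate up front: cross-validation sidesteps the noise level entirely, while the scaled lasso exploits the pivotal form of its penalty so that the noise level and the regularization adapt to each other.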

MSC:

62J07 Ridge regression; shrinkage estimators (Lasso)
62G08 Nonparametric regression and quantile regression
