Tight conditions for consistency of variable selection in the context of high dimensionality. (English) Zbl 1373.62154

Summary: We address the issue of variable selection in the regression model with very high ambient dimension, that is, when the number of variables is very large. The main focus is on the situation where the number of relevant variables, called the intrinsic dimension, is much smaller than the ambient dimension $$d$$. Without assuming any parametric form of the underlying regression function, we obtain tight conditions under which the set of relevant variables can be consistently estimated. These conditions relate the intrinsic dimension to the ambient dimension and to the sample size. The procedure that is provably consistent under these tight conditions is based on comparing quadratic functionals of the empirical Fourier coefficients with appropriately chosen threshold values. The asymptotic analysis reveals two quite different regimes. The first regime is when the intrinsic dimension is fixed. In this case the situation in nonparametric regression is the same as in linear regression, that is, consistent variable selection is possible if and only if $$\log d$$ is small compared to the sample size $$n$$. The picture is different in the second regime, that is, when the number of relevant variables, denoted by $$s$$, tends to infinity as $$n\to\infty$$. Then we prove that consistent variable selection in the nonparametric set-up is possible only if $$s+\log\log d$$ is small compared to $$\log n$$. We apply these results to derive minimax separation rates for the problem of variable selection.
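
The thresholding idea mentioned in the summary can be illustrated with a deliberately simplified sketch. It is not the procedure analysed in the paper. The assumptions here are ours: covariates are uniform on $$[0,1]$$, the signal is additive so each variable can be screened on its own, the basis is the orthonormal trigonometric one, and the threshold `0.5` and the helper names `quadratic_functional`, `select_variables` and `simulate` are hypothetical, chosen by hand for this toy example.

```python
import math
import random


def quadratic_functional(x_col, y, n_freq):
    """Sum of squared empirical Fourier coefficients of y against one covariate."""
    n = len(y)
    total = 0.0
    for k in range(1, n_freq + 1):
        for trig in (math.sin, math.cos):
            # Empirical coefficient for the basis function sqrt(2) * trig(2*pi*k*x),
            # which is orthonormal on [0, 1] (assumes a uniform design).
            coef = sum(
                yi * math.sqrt(2.0) * trig(2.0 * math.pi * k * xi)
                for xi, yi in zip(x_col, y)
            ) / n
            total += coef ** 2
    return total


def select_variables(X, y, n_freq=2, threshold=0.5):
    """Keep variable j when its quadratic functional exceeds the threshold.

    Assumption: one-variable-at-a-time screening with a hand-picked threshold,
    a toy simplification rather than the paper's calibrated procedure.
    """
    d = len(X[0])
    selected = []
    for j in range(d):
        column = [row[j] for row in X]
        if quadratic_functional(column, y, n_freq) > threshold:
            selected.append(j)
    return selected


def simulate(n=1000, d=20, seed=0):
    """Toy data: only variables 0 and 3 enter the regression function."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(d)] for _ in range(n)]
    y = [
        2.0 * math.sin(2.0 * math.pi * row[0])
        + 2.0 * math.cos(2.0 * math.pi * row[3])
        + rng.gauss(0.0, 1.0)
        for row in X
    ]
    return X, y


X, y = simulate()
selected = select_variables(X, y)
print(selected)
```

In this toy setting the quadratic functional of a relevant variable concentrates near the squared norm of its Fourier coefficients (about 2 here), while that of an irrelevant variable is of order $$1/n$$, so a fixed threshold separates the two groups. How the threshold must scale with $$d$$, $$s$$ and $$n$$ is precisely what the paper's conditions address and is not captured by this sketch.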

MSC:

62G08 Nonparametric regression and quantile regression
62G20 Asymptotic properties of nonparametric inference
62H12 Estimation in multivariate analysis
62J05 Linear regression; mixed models