## A significance test for the lasso. (English) Zbl 1305.62254

A linear regression model is considered,
$$y=X\beta^*+\varepsilon,\quad \varepsilon\sim N(0,\sigma^2 I),$$
where $y\in\mathbb{R}^n$ is an outcome vector, $X$ is a design matrix, and $\beta^*\in\mathbb{R}^p$ are unknown coefficients to be estimated. The lasso estimator $\hat{\beta}=\hat{\beta}(\lambda)$ minimizes the objective function
$$Q(\beta;\lambda)=\frac{1}{2}\|y-X\beta\|_2^2+\lambda\|\beta\|_1,\quad \beta\in\mathbb{R}^p,$$
where $\lambda\ge 0$ is a tuning parameter controlling the level of sparsity in $\hat{\beta}$. It is assumed that the columns of $X$ are in general position, which ensures uniqueness of the lasso solution; see [R. J. Tibshirani, Electron. J. Stat. 7, 1456–1490 (2013; Zbl 1337.62173)].

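
For a single fixed $\lambda$, the objective $Q(\beta;\lambda)$ can be minimized by cyclic coordinate descent, where each coordinate update is a soft-thresholding step (the approach of Friedman, Hastie, Höfling and Tibshirani (2007) in the references, rather than a path algorithm such as least angle regression). The sketch below is a minimal, dependency-free illustration under stated assumptions: the helper names `soft_threshold`, `lasso_cd` and `kkt_violation`, the simulated data, and the choice $\lambda=20$ are all hypothetical, and the columns of $X$ are assumed nonzero. The final check verifies the subgradient optimality conditions $X^\top(y-X\hat\beta)=\lambda\gamma$, with $\gamma_j=\operatorname{sign}(\hat\beta_j)$ on the active set and $|\gamma_j|\le 1$ elsewhere.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator S_t(z) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def objective(X_cols, y, beta, lam):
    """Q(beta; lam) = 0.5 * ||y - X beta||_2^2 + lam * ||beta||_1."""
    resid = [y[i] - sum(col[i] * b for col, b in zip(X_cols, beta)) for i in range(len(y))]
    return 0.5 * sum(r * r for r in resid) + lam * sum(abs(b) for b in beta)


def lasso_cd(X_cols, y, lam, n_sweeps=1000, tol=1e-12):
    """Minimize Q(beta; lam) by cyclic coordinate descent (hypothetical helper).

    X_cols[j] holds column j of X as a list of length n; columns must be nonzero.
    """
    n = len(y)
    beta = [0.0] * len(X_cols)
    resid = list(y)  # current residual y - X beta, updated in place
    col_sq = [sum(v * v for v in col) for col in X_cols]
    for _ in range(n_sweeps):
        max_step = 0.0
        for j, col in enumerate(X_cols):
            # X_j^T (partial residual with coordinate j removed)
            rho = sum(c * r for c, r in zip(col, resid)) + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            step = new - beta[j]
            if step != 0.0:
                for i in range(n):
                    resid[i] -= col[i] * step
                beta[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return beta


def kkt_violation(X_cols, y, beta, lam):
    """Largest violation of the lasso subgradient (KKT) conditions."""
    resid = [y[i] - sum(col[i] * b for col, b in zip(X_cols, beta)) for i in range(len(y))]
    worst = 0.0
    for col, b in zip(X_cols, beta):
        g = sum(c * r for c, r in zip(col, resid))  # X_j^T (y - X beta)
        if b != 0.0:
            worst = max(worst, abs(g - lam * (1.0 if b > 0 else -1.0)))
        else:
            worst = max(worst, abs(g) - lam, 0.0)
    return worst


# Hypothetical simulated data: two true signals among p = 10 predictors.
rng = random.Random(0)
n, p = 50, 10
X_cols = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
beta_star = [3.0, -2.0] + [0.0] * (p - 2)
y = [sum(X_cols[j][i] * beta_star[j] for j in range(p)) + rng.gauss(0.0, 1.0) for i in range(n)]

beta_hat = lasso_cd(X_cols, y, lam=20.0)
print("beta_hat:", [round(b, 3) for b in beta_hat])
print("KKT violation:", kkt_violation(X_cols, y, beta_hat, 20.0))
```

Plain lists keep the example self-contained; on real data a vectorized solver such as glmnet (listed under Software) is the practical choice.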
The path $\hat{\beta}(\lambda)$ is a piecewise linear function of $\lambda$, with knots at values $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_r\ge 0$. At $\lambda=\infty$ the solution $\hat{\beta}(\infty)$ has no active variables, and as $\lambda$ decreases, each knot $\lambda_k$ marks the entry or removal of some variable from the current active set. At any $\lambda\ge 0$, the corresponding active set $A=\operatorname{supp}(\hat{\beta}(\lambda))$ indexes a linearly independent set of predictor variables, that is, $\operatorname{rank}(X_A)=|A|$, where $X_A$ denotes the columns of $X$ indexed by $A$.

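
As a worked illustration of why the path has this form (standard lasso algebra, not quoted from the review; $X_j$ denotes the $j$-th column of $X$), the subgradient optimality conditions for $Q(\beta;\lambda)$ read

$$X^\top\bigl(y-X\hat\beta(\lambda)\bigr)=\lambda\gamma,\qquad \gamma_j=\operatorname{sign}\bigl(\hat\beta_j(\lambda)\bigr)\ \text{if } \hat\beta_j(\lambda)\neq 0,\qquad \gamma_j\in[-1,1]\ \text{otherwise}.$$

Taking $\hat\beta(\lambda)=0$, these hold if and only if $|X_j^\top y|\le\lambda$ for every $j$, so the first knot is

$$\lambda_1=\max_{1\le j\le p}|X_j^\top y|=\|X^\top y\|_\infty .$$

Between consecutive knots the active set $A$ and the signs $s=\operatorname{sign}(\hat\beta_A(\lambda))$ stay fixed, and since $\operatorname{rank}(X_A)=|A|$ the conditions for the active coordinates can be solved explicitly:

$$\hat\beta_A(\lambda)=(X_A^\top X_A)^{-1}\bigl(X_A^\top y-\lambda s\bigr),\qquad \hat\beta_j(\lambda)=0\quad (j\notin A),$$

which is affine in $\lambda$ on each such interval; this is what makes the path piecewise linear.
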
Let $A$ be the active set just before $\lambda_k$, and suppose that predictor $j$ enters at $\lambda_k$. Denote by $\hat{\beta}(\lambda_{k+1})$ the solution at $\lambda=\lambda_{k+1}$ using predictors $A$ and $j$, and let $\tilde{\beta}_A(\lambda_{k+1})$ be the lasso solution at $\lambda=\lambda_{k+1}$ using only the active predictors $X_A$.

In the paper under review, the authors propose the covariance test statistic
$$T_k=\frac{1}{\sigma^2}\bigl\langle y,\ X\hat{\beta}(\lambda_{k+1})-X_A\tilde{\beta}_A(\lambda_{k+1})\bigr\rangle.$$
The main result, Theorem 3, states the following: under the null hypothesis that the current lasso model contains all truly active variables, $\operatorname{supp}(\beta^*)\subseteq A$, the statistic $T_k$ is asymptotically distributed as a standard exponential random variable, given reasonable assumptions on $X$ and on the magnitudes of the nonzero true coefficients. The statistic can therefore be used to test the significance of an additional variable between two nested models, even when this variable is not fixed in advance but has been chosen adaptively.

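
To make $T_k$ concrete, consider the simplest case, derived here rather than quoted from the review: the first step ($k=1$, so $A=\emptyset$ and the $X_A\tilde\beta_A$ term vanishes) with a hypothetical orthonormal design $X=I_n$ ($n=p$), known $\sigma^2=1$, and the global null $\beta^*=0$. The lasso solution is then coordinatewise soft-thresholding, $\hat\beta_j(\lambda)=\operatorname{sign}(y_j)(|y_j|-\lambda)_+$, the first two knots are the two largest values of $|y_j|$, and only the first entering predictor is nonzero at $\lambda_2$, so $T_1=\lambda_1(\lambda_1-\lambda_2)/\sigma^2$. The Monte Carlo sketch below, with hypothetical settings $p=100$ and 5000 replications, computes $T_1$ from the definition and compares it with the Exp(1) limit, using the Exp(1) upper tail $e^{-T_1}$ as the p-value. The function names are illustrative only.

```python
import math
import random


def soft_threshold(z, t):
    """S_t(z) = sign(z) * max(|z| - t, 0): the lasso solution when X = I."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def covariance_stat_first_step(y, sigma2=1.0):
    """T_1 for the hypothetical design X = I, computed from the definition.

    The active set before lambda_1 is empty, so T_1 = <y, X beta_hat(lambda_2)> / sigma^2.
    Returns (T_1, lambda_1, lambda_2).
    """
    mags = sorted((abs(v) for v in y), reverse=True)
    lam1, lam2 = mags[0], mags[1]  # first two knots of the lasso path
    beta_hat = [soft_threshold(v, lam2) for v in y]  # full lasso fit at lambda_2
    t1 = sum(v * b for v, b in zip(y, beta_hat)) / sigma2
    return t1, lam1, lam2


def exp1_p_value(t):
    """Upper-tail p-value under the asymptotic Exp(1) null distribution."""
    return math.exp(-t)


# Hypothetical simulation settings: global null, sigma^2 = 1, p = 100.
rng = random.Random(1)
p, reps = 100, 5000
stats = []
for _ in range(reps):
    y = [rng.gauss(0.0, 1.0) for _ in range(p)]  # beta* = 0
    t1, lam1, lam2 = covariance_stat_first_step(y)
    stats.append(t1)

mean_t1 = sum(stats) / reps
frac_rejected = sum(exp1_p_value(t) < 0.05 for t in stats) / reps
print(f"mean T_1 = {mean_t1:.3f} (Exp(1) mean is 1)")
print(f"fraction of p-values < 0.05 = {frac_rejected:.3f}")
```

Under Exp(1), a p-value below 0.05 corresponds to $T_1>\log 20\approx 3.0$, so the rejection fraction should sit near 0.05 if the asymptotic approximation is adequate at this $p$.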
In Section 6, this result is modified for the case of unknown $\sigma^2$. Section 8 discusses extensions to the elastic net, generalized linear models, and the Cox proportional hazards model; these proposals are supported by simulations, but no theory is offered.

### MSC:

- 62J07 Ridge regression; shrinkage estimators (Lasso)
- 62F03 Parametric hypothesis testing
- 62J05 Linear regression; mixed models
- 62J12 Generalized linear models (logistic models)

### Keywords:

lasso; least angle regression; $p$-value; significance test

### Citations:

Zbl 1337.62173

### Software:

TFOCS; NESTA; covTest; ElemStatLearn; glmnet; PDCO

### References:

- Beck, A. and Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2 183-202. · Zbl 1175.94009
- Becker, S., Bobin, J. and Candès, E. J. (2011). NESTA: A fast and accurate first-order method for sparse recovery. SIAM J. Imaging Sci. 4 1-39. · Zbl 1209.90265
- Becker, S. R., Candès, E. J. and Grant, M. C. (2011). Templates for convex cone problems with applications to sparse signal recovery. Math. Program. Comput. 3 165-218. · Zbl 1257.90042
- Boyd, S., Parikh, N., Chu, E., Peleato, B. and Eckstein, J. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3 1-122. · Zbl 1229.90122
- Bühlmann, P. (2013). Statistical significance in high-dimensional linear models. Bernoulli 19 1212-1242. · Zbl 1273.62173
- Candès, E. J. and Plan, Y. (2009). Near-ideal model selection by $\ell_1$ minimization. Ann. Statist. 37 2145-2177. · Zbl 1173.62053
- Candès, E. J. and Tao, T. (2006). Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE Trans. Inform. Theory 52 5406-5425. · Zbl 1309.94033
- Chen, S. S., Donoho, D. L. and Saunders, M. A. (1998). Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20 33-61. · Zbl 0919.94002
- de Haan, L. and Ferreira, A. (2006). Extreme Value Theory: An Introduction. Springer, New York. · Zbl 1101.62002
- Donoho, D. L. (2006). Compressed sensing. IEEE Trans. Inform. Theory 52 1289-1306. · Zbl 1288.94016
- Efron, B. (1986). How biased is the apparent error rate of a prediction rule? J. Amer. Statist. Assoc. 81 461-470. · Zbl 0621.62073
- Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist. 32 407-499. · Zbl 1091.62054
- Fan, J., Guo, S. and Hao, N. (2012). Variance estimation using refitted cross-validation in ultrahigh-dimensional regression. J. R. Stat. Soc. Ser. B Stat. Methodol. 74 37-65.
- Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33 1-22.
- Friedman, J., Hastie, T., Höfling, H. and Tibshirani, R. (2007). Pathwise coordinate optimization. Ann. Appl. Stat. 1 302-332. · Zbl 1378.90064
- Fuchs, J. J. (2005). Recovery of exact sparse representations in the presence of bounded noise. IEEE Trans. Inform. Theory 51 3601-3608. · Zbl 1286.94031
- Grazier G’Sell, M., Taylor, J. and Tibshirani, R. (2013). Adaptive testing for the graphical lasso. Preprint. Available at .
- Grazier G’Sell, M., Wager, S., Chouldechova, A. and Tibshirani, R. (2013). False discovery rate control for sequential selection procedures, with application to the lasso. Preprint. Available at .
- Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10 971-988. · Zbl 1055.62078
- Hastie, T., Tibshirani, R. and Friedman, J. (2008). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer, New York. · Zbl 1273.62005
- Javanmard, A. and Montanari, A. (2013a). Confidence intervals and hypothesis testing for high-dimensional regression. Preprint. Available at . · Zbl 1319.62145
- Javanmard, A. and Montanari, A. (2013b). Hypothesis testing in high-dimensional regression under the Gaussian random design model: Asymptotic theory. Preprint. Available at . · Zbl 1360.62074
- Meinshausen, N. and Bühlmann, P. (2010). Stability selection. J. R. Stat. Soc. Ser. B Stat. Methodol. 72 417-473.
- Meinshausen, N., Meier, L. and Bühlmann, P. (2009). $p$-values for high-dimensional regression. J. Amer. Statist. Assoc. 104 1671-1681. · Zbl 1205.62089
- Minnier, J., Tian, L. and Cai, T. (2011). A perturbation method for inference on regularized regression estimates. J. Amer. Statist. Assoc. 106 1371-1382. · Zbl 1323.62076
- Osborne, M. R., Presnell, B. and Turlach, B. A. (2000a). A new approach to variable selection in least squares problems. IMA J. Numer. Anal. 20 389-403. · Zbl 0962.65036
- Osborne, M. R., Presnell, B. and Turlach, B. A. (2000b). On the LASSO and its dual. J. Comput. Graph. Statist. 9 319-337.
- Park, M. Y. and Hastie, T. (2007). $L_1$-regularization path algorithm for generalized linear models. J. R. Stat. Soc. Ser. B Stat. Methodol. 69 659-677.
- Rhee, S.-Y., Gonzales, M. J., Kantor, R., Betts, B. J., Ravela, J. and Shafer, R. W. (2003). Human immunodeficiency virus reverse transcriptase and protease sequence database. Nucleic Acids Res. 31 298-303.
- Sun, T. and Zhang, C.-H. (2012). Scaled sparse linear regression. Biometrika 99 879-898. · Zbl 1452.62515
- Taylor, J., Loftus, J. and Tibshirani, R. J. (2013). Tests in adaptive regression via the Kac-Rice formula. Preprint. Available at . · Zbl 1337.62304
- Taylor, J., Takemura, A. and Adler, R. J. (2005). Validity of the expected Euler characteristic heuristic. Ann. Probab. 33 1362-1396. · Zbl 1083.60031
- Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267-288. · Zbl 0850.62538
- Tibshirani, R. J. (2013). The lasso problem and uniqueness. Electron. J. Stat. 7 1456-1490. · Zbl 1337.62173
- Tibshirani, R. J. and Taylor, J. (2012). Degrees of freedom in lasso problems. Ann. Statist. 40 1198-1232. · Zbl 1274.62469
- van de Geer, S. and Bühlmann, P. (2013). On asymptotically optimal confidence regions and tests for high-dimensional models. Preprint. Available at . · Zbl 1432.62112
- Wainwright, M. J. (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using $\ell_1$-constrained quadratic programming (Lasso). IEEE Trans. Inform. Theory 55 2183-2202. · Zbl 1367.62220
- Wasserman, L. and Roeder, K. (2009). High-dimensional variable selection. Ann. Statist. 37 2178-2201. · Zbl 1173.62054
- Weissman, I. (1978). Estimation of parameters and large quantiles based on the $k$ largest observations. J. Amer. Statist. Assoc. 73 812-815. · Zbl 0397.62034
- Zhang, C.-H. and Zhang, S. (2014). Confidence intervals for low dimensional parameters in high dimensional linear models. J. R. Stat. Soc. Ser. B Stat. Methodol. 76 217-242.
- Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541-2563. · Zbl 1222.62008
- Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 67 301-320. · Zbl 1069.62054
- Zou, H., Hastie, T. and Tibshirani, R. (2007). On the “degrees of freedom” of the lasso. Ann. Statist. 35 2173-2192. · Zbl 1126.62061