A knockoff filter for high-dimensional selective inference. (English) Zbl 1444.62034

The authors develop a framework for testing associations in a possibly high-dimensional linear model, where the number of variables may exceed the number of observational units. The observations are split into two groups: the first group is used to screen for a set of potentially relevant variables, and the second is used for inference on this reduced set of variables.
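The split-sample idea underlying the procedure can be illustrated with a minimal sketch. This is not the authors' knockoff construction; it is a generic screen-then-infer toy in which half the data screens variables by marginal correlation and the other half runs ordinary least squares on the surviving set. All names and tuning choices (the sample sizes, the size of the screened set) are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p, k = 200, 500, 5                  # p >> n is allowed at the screening stage
beta = np.zeros(p)
beta[:k] = 2.0                         # first k variables carry signal
X = rng.standard_normal((n, p))
y = X @ beta + rng.standard_normal(n)

# Split the observational units into two disjoint halves.
X1, y1 = X[: n // 2], y[: n // 2]
X2, y2 = X[n // 2 :], y[n // 2 :]

# Stage 1 (screening, half 1): keep the 10 variables most correlated with y.
corr = np.abs(X1.T @ (y1 - y1.mean()))
S = np.argsort(corr)[-10:]             # reduced set of candidate variables

# Stage 2 (inference, half 2): OLS restricted to the screened set only.
Xs = X2[:, S]
coef, *_ = np.linalg.lstsq(Xs, y2, rcond=None)
resid = y2 - Xs @ coef
df = Xs.shape[0] - Xs.shape[1]
sigma2 = resid @ resid / df
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xs.T @ Xs)))
pvals = 2 * stats.t.sf(np.abs(coef / se), df)
```

Because the screening and inference stages see disjoint observations, the t-tests in stage 2 are valid conditional on the selected set; the knockoff filter of the paper refines this template to control the false discovery rate over the selection itself.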


62F03 Parametric hypothesis testing
62J05 Linear regression; mixed models
62H20 Measures of association (correlation, canonical correlation, etc.)
Full Text: DOI arXiv Euclid

