×

Adapting to unknown sparsity by controlling the false discovery rate. (English) Zbl 1092.62005

Summary: We attempt to recover an \(n\)-dimensional vector observed in white noise, where \(n\) is large and the vector is known to be sparse, but the degree of sparsity is unknown. We consider three different ways of defining sparsity of a vector: using the fraction of nonzero terms; imposing power-law decay bounds on the ordered entries; and controlling the \(\ell_p\) norm for \(p\) small. We obtain a procedure which is asymptotically minimax for \(\ell^r\) loss, simultaneously throughout a range of such sparsity classes.
The optimal procedure is a data-adaptive thresholding scheme, driven by control of the false discovery rate (FDR). FDR control is a relatively recent innovation in simultaneous testing, ensuring that at most a certain expected fraction of the rejected null hypotheses will correspond to false rejections.
In our treatment, the FDR control parameter \(q_n\) also plays a determining role in asymptotic minimaxity. If \(q=\lim q_n\in [0,1/2]\) and also \(q_n> \gamma/\log(n)\), we get sharp asymptotic minimaxity, simultaneously, over a wide range of sparse parameter spaces and loss functions. On the other hand, \(q=\lim q_n\in (1/2,1]\) forces the risk to exceed the minimax risk by a factor growing with \(q\). To our knowledge, this relation between ideas in simultaneous inference and asymptotic decision theory is new.
Our work provides a new perspective on a class of model selection rules which has been introduced recently by several authors. These new rules impose complexity penalization of the form \(2\cdot\log\)(potential model size/actual model sizes). We exhibit a close connection with FDR-controlling procedures under stringent control of the false discovery rate.

MSC:

62C20 Minimax procedures in statistical decision theory
62J15 Paired and multiple comparisons; multiple testing
62G05 Nonparametric estimation
62G32 Statistics of extreme values; tail inference

Software:

EBayesThresh

References:

[1] Abramovich, F. and Benjamini, Y. (1995). Thresholding of wavelet coefficients as multiple hypotheses testing procedure. Wavelets and Statistics . Lecture Notes in Statist. 103 5–14. Springer, Berlin. · Zbl 0875.62081
[2] Abramovich, F. and Benjamini, Y. (1996). Adaptive thresholding of wavelet coefficients. Comput. Statist. Data Anal. 22 351–361.
[3] Abramovich, F., Benjamini, Y., Donoho, D. and Johnstone, I. M. (2005). Adapting to unknown sparsity by controlling the false discovery rate. (Extended version.) Available at www.arXiv.org/abs/math.st/0505374. · Zbl 1092.62005 · doi:10.1214/009053606000000074
[4] Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory (B. N. Petrov and F. Czáki, eds.) 267–281. Akadémiai Kiadó, Budapest. · Zbl 0283.62006
[5] Benedetto, J. and Anthony, T. (1993). A wavelet auditory model and data compression. Appl. Comput. Harmon. Anal. 1 3–28. · Zbl 0794.92006 · doi:10.1006/acha.1993.1002
[6] Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 289–300. JSTOR: · Zbl 0809.62014
[7] Benjamini, Y. and Hochberg, Y. (2000). On the adaptive control of the false discovery rate in multiple testing with independent statistics. J. Educational and Behavioral Statistics 25 60–83.
[8] Benjamini, Y., Krieger, A. M. and Yekutieli, D. (2006). Adaptive linear step-up procedures that control the false discovery rate. Biometrika . · Zbl 1108.62069 · doi:10.1093/biomet/93.3.491
[9] Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Ann. Statist. 29 1165–1188. · Zbl 1041.62061 · doi:10.1214/aos/1013699998
[10] Birgé, L. and Massart, P. (2001). Gaussian model selection. J. Eur. Math. Soc. 3 203–268. · Zbl 1037.62001 · doi:10.1007/s100970100031
[11] Breiman, L. and Freedman, D. (1983). How many variables should be entered in a regression equation? J. Amer. Statist. Assoc. 78 131–136. JSTOR: · Zbl 0513.62068 · doi:10.2307/2287119
[12] DeVore, R. A., Jawerth, B. and Lucier, B. (1992). Image compression through wavelet transform coding. IEEE Trans. Inform. Theory 38 719–746. · Zbl 0754.68118 · doi:10.1109/18.119733
[13] Donoho, D. L. (1995). De-noising by soft-thresholding. IEEE Trans. Inform. Theory 41 613–627. · Zbl 0820.62002 · doi:10.1109/18.382009
[14] Donoho, D. L. and Johnstone, I. M. (1994). Ideal denoising in an orthonormal basis chosen from a library of bases. C. R. Acad. Sci. Paris Sér. I Math. 319 1317–1322. · Zbl 0819.94007
[15] Donoho, D. L. and Johnstone, I. M. (1994). Minimax risk over \(\ell_p\)-balls for \(\ell_q\)-error. Probab. Theory Related Fields 99 277–303. · Zbl 0802.62006 · doi:10.1007/BF01199026
[16] Donoho, D. L. and Johnstone, I. M. (1995). Adapting to unknown smoothness via wavelet shrinkage. J. Amer. Statist. Assoc. 90 1200–1224. JSTOR: · Zbl 0869.62024 · doi:10.2307/2291512
[17] Donoho, D. L., Johnstone, I. M., Hoch, J. C. and Stern, A. S. (1992). Maximum entropy and the nearly black object (with discussion). J. Roy. Statist. Soc. Ser. B. 54 41–81. JSTOR: · Zbl 0788.62103
[18] Donoho, D. L., Johnstone, I. M., Kerkyacharian, G. and Picard, D. (1995). Wavelet shrinkage: Asymptopia? (with discussion). J. Roy. Statist. Soc. Ser. B 57 301–369. JSTOR: · Zbl 0827.62035
[19] Efron, B. (1986). How biased is the apparent error rate of a prediction rule? J. Amer. Statist. Assoc. 81 461–170. JSTOR: · Zbl 0621.62073 · doi:10.2307/2289236
[20] Efron, B., Tibshirani, R. J., Storey, J. D. and Tusher, V. (2001). Empirical Bayes analysis of a microarray experiment. J. Amer. Statist. Assoc. 96 1151–1160. JSTOR: · Zbl 1073.62511 · doi:10.1198/016214501753382129
[21] Field, D. J. (1987). Relation between the statistics of natural images and the response properties of cortical cells. J. Opt. Soc. Amer. A 4 2379–2394.
[22] Foster, D. P. and George, E. I. (1994). The risk inflation criterion for multiple regression. Ann. Statist. 22 1947–1975. · Zbl 0829.62066 · doi:10.1214/aos/1176325766
[23] Foster, D. P. and Stine, R. A. (1999). Local asymptotic coding and the minimum description length. IEEE Trans. Inform. Theory 45 1289–1293. · Zbl 0959.62006 · doi:10.1109/18.761287
[24] Genovese, C. and Wasserman, L. (2004). A stochastic process approach to false discovery control. Ann. Statist. 32 1035–1061. · Zbl 1092.62065 · doi:10.1214/009053604000000283
[25] George, E. I. and Foster, D. P. (2000). Calibration and empirical Bayes variable selection. Biometrika 87 731–747. JSTOR: · Zbl 1029.62008 · doi:10.1093/biomet/87.4.731
[26] Hochberg, Y. and Tamhane, A. (1987). Multiple Comparison Procedures . Wiley, New York. · Zbl 0731.62125
[27] Hommel, G. (1988). A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika 75 383–386. · Zbl 0639.62025 · doi:10.1093/biomet/75.2.383
[28] Huang, J. and Mumford, D. (1999). Statistics of natural images and models. In Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition 1 541–547.
[29] Johnstone, I. M. (1994). Minimax Bayes, asymptotic minimax and sparse wavelet priors. In Statistical Decision Theory and Related Topics V (S. Gupta and J. Berger, eds.) 303–326. Springer, New York. · Zbl 0815.62017
[30] Johnstone, I. M. and Silverman, B. W. (2004). Needles and straw in haystacks: Empirical Bayes estimates of possibly sparse sequences. Ann. Statist. 32 1594–1649. · Zbl 1047.62008 · doi:10.1214/009053604000000030
[31] Mallows, C. L. (1973). Some comments on \(C_P\). Technometrics 15 661–675. · Zbl 0269.62061 · doi:10.2307/1267380
[32] Pollard, D. (1984). Convergence of Stochastic Processes . Springer, New York. · Zbl 0544.60045
[33] Ruderman, D. L. (1994). The statistics of natural images. Network : Computation in Neural Systems 5 517–548. · Zbl 0824.92030 · doi:10.1088/0954-898X/5/4/006
[34] Sarkar, S. K. (2002). Some results on false discovery rate in stepwise multiple testing procedures. Ann. Statist. 30 239–257. · Zbl 1101.62349 · doi:10.1214/aos/1015362192
[35] Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6 461–464. · Zbl 0379.62005 · doi:10.1214/aos/1176344136
[36] Seeger, P. (1968). A note on a method for the analysis of significances en masse. Technometrics 10 586–593.
[37] Shaffer, J. (1995). Multiple hypothesis testing. Ann. Rev. Psychology 46 561–584.
[38] Simes, R. (1986). An improved Bonferroni procedure for multiple tests of significance. Biometrika 73 751–754. JSTOR: · Zbl 0613.62067 · doi:10.1093/biomet/73.3.751
[39] Simoncelli, E. P. (1999). Bayesian denoising of visual images in the wavelet domain. Bayesian Inference in Wavelet Based Models . Lecture Notes in Statist. 141 292–308. Springer, New York. · Zbl 1115.62314
[40] Storey, J. D. (2002). A direct approach to false discovery rates. J. R. Stat. Soc. Ser. B Stat. Methodol. 64 479–498. JSTOR: · Zbl 1090.62073 · doi:10.1111/1467-9868.00346
[41] Storey, J. D., Taylor, J. E. and Siegmund, D. (2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: A unified approach. J. R. Stat. Soc. Ser. B Stat. Methodol. 66 187–205. · Zbl 1061.62110 · doi:10.1111/j.1467-9868.2004.00439.x
[42] Tibshirani, R. and Knight, K. (1999). The covariance inflation criterion for adaptive model selection. J. R. Stat. Soc. Ser. B Stat. Methodol. 61 529–546. JSTOR: · Zbl 0924.62031 · doi:10.1111/1467-9868.00191
[43] van de Geer, S. (1990). Estimating a regression function. Ann. Statist. 18 907–924. · Zbl 0709.62040 · doi:10.1214/aos/1176347632
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.