
Multidimensional multiscale scanning in exponential families: limit theory and statistical consequences. (English) Zbl 1450.60028

Summary: We consider the problem of finding anomalies in a \(d\)-dimensional field of independent random variables \(\{Y_i\}_{i\in \{1,\ldots,n\}^d}\), each distributed according to a one-dimensional natural exponential family \(\mathcal{F}=\{F_{\theta }\}_{\theta \in \Theta } \). Given some baseline parameter \(\theta_0\in \Theta \), the field is scanned using local likelihood ratio tests to detect, from a (large) given system of regions \(\mathcal{R}\), those regions \(R\subset \{1,\ldots,n\}^d\) with \(\theta_i\neq \theta_0\) for some \(i\in R\). We provide a unified methodology which controls the overall familywise error rate (FWER), i.e., the probability of making at least one wrong detection, at a given level.
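
As an illustration of a local likelihood ratio statistic in the simplest setting, the following minimal sketch computes \(T_R\), the log-likelihood ratio for \(H_0:\theta_i=\theta_0\) for all \(i\in R\), for two natural exponential families: Poisson, and Gaussian with known variance. The restriction of the alternative to a single common parameter on \(R\), the function names `poisson_llr` and `gaussian_llr`, and the choice of families are assumptions made for illustration, not the authors' implementation.

```python
import math
from typing import Sequence


def poisson_llr(ys: Sequence[int], lam0: float) -> float:
    """Log-likelihood ratio T_R for Y_i ~ Poisson(lam0) on R versus an
    (assumed) common unknown rate on R, whose MLE is the sample mean."""
    m = len(ys)
    s = sum(ys)
    # S * log(S / (m * lam0)), using the convention 0 * log 0 = 0
    term = s * math.log(s / (m * lam0)) if s > 0 else 0.0
    return term - s + m * lam0


def gaussian_llr(ys: Sequence[float], mu0: float, sigma: float = 1.0) -> float:
    """Log-likelihood ratio T_R for Y_i ~ N(mu0, sigma^2) on R versus an
    (assumed) common unknown mean on R, with sigma known."""
    m = len(ys)
    s = sum(y - mu0 for y in ys)
    return s * s / (2.0 * m * sigma ** 2)


# Both statistics are nonnegative and vanish when the region's sample mean
# equals the baseline mean.
print(poisson_llr([2, 2], 2.0))       # 0.0
print(gaussian_llr([1.0, 1.0], 0.0))  # 1.0
```

In both cases \(T_R = |R|\,\mathrm{KL}(F_{\hat\theta_R}\,\|\,F_{\theta_0})\), where \(\hat\theta_R\) is the maximum likelihood estimate on \(R\); this standard identity is how the same construction carries over to other members of a natural exponential family.
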
Fundamental to our method is a Gaussian approximation of the distribution of the underlying multiscale test statistic with an explicit rate of convergence. From this, we obtain a weak limit theorem which can be seen as a generalization of the weak invariance principle to non-identically distributed data and is of independent interest. Furthermore, we give an asymptotic expansion of the procedure's power, which yields minimax optimality in the case of Gaussian observations.
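
The FWER calibration can be sketched for Gaussian observations with known variance \(\sigma=1\) on a two-dimensional grid (\(d=2\)). The sketch rests on several assumptions not taken from the summary: \(\mathcal{R}\) is the set of all axis-aligned \(h\times h\) squares for a few side lengths \(h\); each region's statistic \(\sqrt{2T_R}=|\sum_{i\in R}(Y_i-\mu_0)|/\sqrt{|R|}\) is offset by a scale penalty \(\sqrt{2\log(e\,n^d/|R|)}\) of the kind used in the multiscale literature (e.g. Dümbgen and Spokoiny [17]); and the \((1-\alpha)\)-quantile of the maximal penalized statistic under the null is estimated by Monte Carlo simulation of pure noise fields. Flagging every region whose penalized statistic exceeds this quantile then controls the FWER at level \(\alpha\), up to Monte Carlo error, in this Gaussian case; for other exponential families the same threshold is only justified approximately, through a Gaussian approximation of the kind described above. All function names are hypothetical.

```python
import math
import random
from typing import Iterator, List, Sequence, Tuple

Window = Tuple[int, int, int]  # (row, col, side length h) of an h x h square


def prefix_sums(field: List[List[float]]) -> List[List[float]]:
    """2D cumulative sums so that any square sum is an O(1) lookup."""
    n = len(field)
    p = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            p[i + 1][j + 1] = field[i][j] + p[i][j + 1] + p[i + 1][j] - p[i][j]
    return p


def squares(n: int, sizes: Sequence[int]) -> Iterator[Window]:
    """Hypothetical region system: all h x h squares inside an n x n grid."""
    for h in sizes:
        for i in range(n - h + 1):
            for j in range(n - h + 1):
                yield i, j, h


def penalized_stats(field, sizes, mu0=0.0) -> List[Tuple[float, Window]]:
    """sqrt(2 T_R) minus an assumed scale penalty, for every square R."""
    n = len(field)
    p = prefix_sums([[y - mu0 for y in row] for row in field])
    out = []
    for i, j, h in squares(n, sizes):
        area = h * h
        s = p[i + h][j + h] - p[i][j + h] - p[i + h][j] + p[i][j]
        z = abs(s) / math.sqrt(area)  # equals sqrt(2 T_R) for N(mu0, 1) data
        pen = math.sqrt(2.0 * math.log(math.e * n * n / area))  # assumed penalty
        out.append((z - pen, (i, j, h)))
    return out


def mc_threshold(n, sizes, alpha, reps, rng) -> float:
    """Empirical (1 - alpha)-quantile of the maximal penalized statistic
    over pure N(0, 1) noise fields (the global null)."""
    maxima = []
    for _ in range(reps):
        noise = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
        maxima.append(max(t for t, _ in penalized_stats(noise, sizes)))
    maxima.sort()
    k = min(reps - 1, max(0, math.ceil((1 - alpha) * reps) - 1))
    return maxima[k]


def detect(field, sizes, threshold, mu0=0.0) -> List[Window]:
    """Squares whose penalized statistic exceeds the calibrated threshold."""
    return [w for t, w in penalized_stats(field, sizes, mu0) if t > threshold]


if __name__ == "__main__":
    rng = random.Random(0)
    n, sizes = 8, [1, 2, 4]
    q = mc_threshold(n, sizes, alpha=0.1, reps=200, rng=rng)
    # Noise field with an elevated 4 x 4 block in rows/cols 2..5
    field = [[rng.gauss(0.0, 1.0) + (1.5 if 2 <= i < 6 and 2 <= j < 6 else 0.0)
              for j in range(n)] for i in range(n)]
    print(f"threshold = {q:.3f}, detections = {detect(field, sizes, q)}")
```

The prefix-sum array keeps one pass over \(\mathcal{R}\) linear in the number of regions after \(O(n^2)\) preprocessing; the fixed seed only makes the demonstration reproducible.
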

MSC:

60F17 Functional limit theorems; invariance principles
60G60 Random fields
62F03 Parametric hypothesis testing
62H10 Multivariate distribution of statistics
62J15 Paired and multiple comparisons; multiple testing

Software:

FDRSeg
Full Text: DOI arXiv Euclid

References:

[1] Adler, R. J. (2000). On excursion sets, tube formulas and maxima of random fields. Ann. Appl. Probab. 10 1-74. · Zbl 1171.60338 · doi:10.1214/aoap/1019737664
[2] Alm, S. E. (1998). Approximation and simulation of the distributions of scan statistics for Poisson processes in higher dimensions. Extremes 1 111-126. · Zbl 0933.60053 · doi:10.1023/A:1009965918058
[3] Arias-Castro, E., Candès, E. J. and Durand, A. (2011). Detection of an anomalous cluster in a network. Ann. Statist. 39 278-304. · Zbl 1209.62097 · doi:10.1214/10-AOS839
[4] Arias-Castro, E., Donoho, D. L. and Huo, X. (2005). Near-optimal detection of geometric objects by fast multiscale methods. IEEE Trans. Inform. Theory 51 2402-2425. · Zbl 1282.94014 · doi:10.1109/TIT.2005.850056
[5] Arias-Castro, E., Castro, R. M., Tánczos, E. and Wang, M. (2018). Distribution-free detection of structured anomalies: Permutation and rank-based scans. J. Amer. Statist. Assoc. 113 789-801. · Zbl 1398.62105 · doi:10.1080/01621459.2017.1286240
[6] Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 289-300. · Zbl 0809.62014 · doi:10.1111/j.2517-6161.1995.tb02031.x
[7] Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Ann. Statist. 29 1165-1188. · Zbl 1041.62061 · doi:10.1214/aos/1013699998
[8] Brown, L. D. (1986). Fundamentals of Statistical Exponential Families with Applications in Statistical Decision Theory. Institute of Mathematical Statistics Lecture Notes—Monograph Series 9. IMS, Hayward, CA. · Zbl 0685.62002
[9] Butucea, C. and Ingster, Y. I. (2013). Detection of a sparse submatrix of a high-dimensional noisy matrix. Bernoulli 19 2652-2688. · Zbl 1457.62072 · doi:10.3150/12-BEJ470
[10] Casella, G. and Berger, R. L. (1990). Statistical Inference. The Wadsworth & Brooks/Cole Statistics/Probability Series. Wadsworth & Brooks/Cole Advanced Books & Software, Pacific Grove, CA. · Zbl 0699.62001
[11] Cheng, D. and Schwartzman, A. (2017). Multiple testing of local maxima for detection of peaks in random fields. Ann. Statist. 45 529-556. · Zbl 1369.62144 · doi:10.1214/16-AOS1458
[12] Chernozhukov, V., Chetverikov, D. and Kato, K. (2014). Gaussian approximation of suprema of empirical processes. Ann. Statist. 42 1564-1597. · Zbl 1317.60038 · doi:10.1214/14-AOS1230
[13] Datta, P. and Sen, B. (2018). Optimal inference with a multidimensional multiscale statistic. Preprint. Available at arXiv:1806.02194.
[14] Despres, C. J. (2014). The Vapnik-Chervonenkis dimension of norms on \(\mathbb{R}^d \). Preprint. Available at arXiv:1412.6612.
[15] Devroye, L. and Lugosi, G. (2001). Combinatorial Methods in Density Estimation. Springer Series in Statistics. Springer, New York. · Zbl 0964.62025
[16] Dickhaus, T. (2014). Simultaneous Statistical Inference. Springer, Heidelberg. · Zbl 1296.62062
[17] Dümbgen, L. and Spokoiny, V. G. (2001). Multiscale testing of qualitative hypotheses. Ann. Statist. 29 124-152. · Zbl 1029.62070 · doi:10.1214/aos/996986504
[18] Dümbgen, L. and Walther, G. (2008). Multiscale inference about a density. Ann. Statist. 36 1758-1785. · Zbl 1142.62336 · doi:10.1214/07-AOS521
[19] Fang, X. and Siegmund, D. (2016). Poisson approximation for two scan statistics with rates of convergence. Ann. Appl. Probab. 26 2384-2418. · Zbl 1352.60032 · doi:10.1214/15-AAP1150
[20] Farnum, N. R. and Booth, P. (1997). Uniqueness of maximum likelihood estimators of the 2-parameter Weibull distribution. IEEE Trans. Reliab. 46 523-525. · doi:10.1109/24.693786
[21] Frick, K., Munk, A. and Sieling, H. (2014). Multiscale change point inference. J. R. Stat. Soc. Ser. B. Stat. Methodol. 76 495-580. · Zbl 1411.62065 · doi:10.1111/rssb.12047
[22] Friedenberg, D. A. and Genovese, C. R. (2013). Straight to the source: Detecting aggregate objects in astronomical images with proper error control. J. Amer. Statist. Assoc. 108 456-468. · Zbl 06195952 · doi:10.1080/01621459.2013.779829
[23] Haiman, G. and Preda, C. (2006). Estimation for the distribution of two-dimensional discrete scan statistics. Methodol. Comput. Appl. Probab. 8 373-381. · Zbl 1108.62099 · doi:10.1007/s11009-006-9752-1
[24] Jiang, T. (2002). Maxima of partial sums indexed by geometrical structures. Ann. Probab. 30 1854-1892. · Zbl 1014.60024 · doi:10.1214/aop/1039548374
[25] Jiang, Y., Qiu, Y., Minn, A. J. and Zhang, N. R. (2016). Assessing intratumor heterogeneity and tracking longitudinal and spatial clonal evolutionary history by next-generation sequencing. Proc. Natl. Acad. Sci. USA 113 E5528-E5537.
[26] Kabluchko, Z. (2011). Extremes of the standardized Gaussian noise. Stochastic Process. Appl. 121 515-533. · Zbl 1225.60084 · doi:10.1016/j.spa.2010.11.007
[27] Kabluchko, Z. and Munk, A. (2009). Shao’s theorem on the maximum of standardized random walk increments for multidimensional arrays. ESAIM Probab. Stat. 13 409-416. · Zbl 1188.60014 · doi:10.1051/ps:2008020
[28] Kazantsev, I. G., Lemahieu, I., Salov, G. I. and Denys, R. (2002). Statistical detection of defects in radiographic images in nondestructive testing. Signal Process. 82 791-801. · Zbl 0995.94003 · doi:10.1016/S0165-1684(02)00158-5
[29] Komlós, J., Major, P. and Tusnády, G. (1976). An approximation of partial sums of independent RV’s, and the sample DF. II. Z. Wahrsch. Verw. Gebiete 34 33-58. · Zbl 0307.60045
[30] König, C., Munk, A. and Werner, F. (2020). Supplement to “Multidimensional multiscale scanning in exponential families: Limit theory and statistical consequences.” · doi:10.1214/18-AOS1806SUPP
[31] Kou, J. (2017). Identifying the support of rectangular signals in Gaussian noise. Preprint. Available at arXiv:1703.06226.
[32] Kulldorff, M., Heffernan, R., Hartman, J., Assunção, R. and Mostashari, F. (2005). A space-time permutation scan statistic for disease outbreak detection. PLoS Med. 2. · doi:10.1371/journal.pmed.0020059
[33] Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces: Isoperimetry and Processes. Ergebnisse der Mathematik und Ihrer Grenzgebiete (3) [Results in Mathematics and Related Areas (3)] 23. Springer, Berlin. · Zbl 0748.60004
[34] Lehmann, E. L. and Romano, J. P. (2005). Testing Statistical Hypotheses, 3rd ed. Springer Texts in Statistics. Springer, New York. · Zbl 1076.62018
[35] Li, H., Munk, A. and Sieling, H. (2016). FDR-control in multiscale change-point segmentation. Electron. J. Stat. 10 918-959. · Zbl 1338.62117 · doi:10.1214/16-EJS1131
[36] Massart, P. (1989). Strong approximation for multivariate empirical and related processes, via KMT constructions. Ann. Probab. 17 266-291. · Zbl 0675.60026 · doi:10.1214/aop/1176991508
[37] Naus, J. I. and Wallenstein, S. (2004). Multiple window and cluster size scan procedures. Methodol. Comput. Appl. Probab. 6 389-400. · Zbl 1056.62081 · doi:10.1023/B:MCAP.0000045087.33227.8c
[38] Pozdnyakov, V., Glaz, J., Kulldorff, M. and Steele, J. M. (2005). A martingale approach to scan statistics. Ann. Inst. Statist. Math. 57 21-37. · Zbl 1082.62015 · doi:10.1007/BF02506876
[39] Proksch, K., Werner, F. and Munk, A. (2018). Multiscale scanning in inverse problems. Ann. Statist. 46 3569-3602. · Zbl 1410.62064 · doi:10.1214/17-AOS1669
[40] Rio, E. (1993). Strong approximation for set-indexed partial-sum processes, via KMT constructions. II. Ann. Probab. 21 1706-1727. · Zbl 0779.60030 · doi:10.1214/aop/1176989138
[41] Rivera, C. and Walther, G. (2013). Optimal detection of a jump in the intensity of a Poisson process or in a density with likelihood ratio statistics. Scand. J. Stat. 40 752-769. · Zbl 1283.62179 · doi:10.1111/sjos.12027
[42] Schmidt-Hieber, J., Munk, A. and Dümbgen, L. (2013). Multiscale methods for shape constraints in deconvolution: Confidence statements for qualitative features. Ann. Statist. 41 1299-1328. · Zbl 1293.62104 · doi:10.1214/13-AOS1089
[43] Schwartzman, A., Gavrilov, Y. and Adler, R. J. (2011). Multiple testing of local maxima for detection of peaks in 1D. Ann. Statist. 39 3290-3319. · Zbl 1246.62173 · doi:10.1214/11-AOS943
[44] Sharpnack, J. and Arias-Castro, E. (2016). Exact asymptotics for the scan statistic and fast alternatives. Electron. J. Stat. 10 2641-2684. · Zbl 1345.62078 · doi:10.1214/16-EJS1188
[45] Šidák, Z. (1967). Rectangular confidence regions for the means of multivariate normal distributions. J. Amer. Statist. Assoc. 62 626-633. · Zbl 0158.17705
[46] Siegmund, D. and Venkatraman, E. S. (1995). Using the generalized likelihood ratio statistic for sequential detection of a change-point. Ann. Statist. 23 255-271. · Zbl 0821.62044 · doi:10.1214/aos/1176324466
[47] Siegmund, D. and Yakir, B. (2000). Tail probabilities for the null distribution of scanning statistics. Bernoulli 6 191-213. · Zbl 0976.62048 · doi:10.2307/3318574
[48] Smith, R. L. (1985). Maximum likelihood estimation in a class of nonregular cases. Biometrika 72 67-90. · Zbl 0583.62026 · doi:10.1093/biomet/72.1.67
[49] Taylor, J. E. and Worsley, K. J. (2007). Detecting sparse signals in random fields, with an application to brain mapping. J. Amer. Statist. Assoc. 102 913-928. · Zbl 1469.62353 · doi:10.1198/016214507000000815
[50] Tu, I. (2013). The maximum of a ratchet scanning process over a Poisson random field. Statist. Sinica 23 1541-1551. · Zbl 1417.62271
[51] van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes. Springer Series in Statistics. Springer, New York. · Zbl 0862.60002
[52] Walther, G. (2010). Optimal and fast detection of spatial clusters with scan statistics. Ann. Statist. 38 1010-1033. · Zbl 1183.62076 · doi:10.1214/09-AOS732
[53] Zhang, N. · Zbl 1400.62300 · doi:10.1214/15-AOAS892