
Optimal detection of multi-sample aligned sparse signals. (English) Zbl 1327.62250

Summary: For the detection of multi-sample aligned sparse signals, we describe the critical boundary separating detectable from nondetectable signals, and we construct tests that achieve optimal detectability: penalized versions of the Berk-Jones and higher-criticism test statistics evaluated over pooled scans, and an average likelihood ratio over the critical boundary. Our results show an interplay between the scale of the ratio of sequence length to signal length and the sparsity of the signals. In particular, the difficulty of the detection problem is not noticeably affected unless this ratio grows exponentially with the number of sequences. We also recover the multiscale and sparse mixture testing problems as illustrative special cases.

MSC:

62G08 Nonparametric regression and quantile regression
62G10 Nonparametric hypothesis testing
