Properties of higher criticism under strong dependence. (English) Zbl 1139.62049

Summary: The problem of signal detection using sparse, faint information is closely related to a variety of contemporary statistical problems, including the control of false-discovery rate, and classification using very high-dimensional data. Each problem can be solved by conducting a large number of simultaneous hypothesis tests, the properties of which are readily accessed under the assumption of independence. We address the case of dependent data in the context of higher criticism methods for signal detection. Short-range dependence has no first-order impact on performance, but the situation changes dramatically under strong dependence. There, although higher criticism can continue to perform well, it can be bettered using methods based on differences of signal values or on the maximum of the data. The relatively inferior performance of higher criticism in such cases can be explained in terms of the fact that, under strong dependence, the higher criticism statistic behaves as though the data were partitioned into very large blocks, with all but a single representative of each block being eliminated from the dataset.
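As a rough illustration of the statistic the summary refers to, the following Python sketch computes higher criticism in the standard form given by Donoho and Jin (2004) [7]: the maximum over the smallest ordered p-values of sqrt(n) (i/n - p_(i)) / sqrt(p_(i) (1 - p_(i))). It also computes the maximum of the data, one of the competing approaches the summary mentions. The function names, the cutoff `alpha0 = 0.5`, and the simulated independent data are assumptions made for illustration; they are not taken from the paper under review, which may use a different variant and which studies dependent data.

```python
import math
import random


def one_sided_p_values(z_scores):
    """Upper-tail standard normal p-values, P(Z > z), for each z-score."""
    return [0.5 * math.erfc(z / math.sqrt(2.0)) for z in z_scores]


def higher_criticism(p_values, alpha0=0.5):
    """Higher criticism in the Donoho-Jin (2004) form.

    Maximises sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i)))
    over the smallest alpha0 * n ordered p-values (at least one).
    The cutoff alpha0 = 0.5 is an illustrative assumption.
    """
    n = len(p_values)
    p_sorted = sorted(p_values)
    k_max = max(1, int(alpha0 * n))
    best = -math.inf
    for i in range(1, k_max + 1):
        p = p_sorted[i - 1]
        if p <= 0.0 or p >= 1.0:
            continue  # skip degenerate p-values to avoid dividing by zero
        hc_i = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, hc_i)
    return best


def demo(n=1000, n_signals=20, shift=3.0, seed=0):
    """Simulate independent N(0, 1) noise with a few shifted entries.

    Returns (higher criticism statistic, maximum of the data).
    All parameter values here are hypothetical.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for j in range(n_signals):
        z[j] += shift  # plant a sparse, moderately faint signal
    return higher_criticism(one_sided_p_values(z)), max(z)


if __name__ == "__main__":
    hc, z_max = demo()
    print("higher criticism:", hc)
    print("maximum of data:", z_max)
```

Large values of either statistic are evidence against the null hypothesis of pure noise. The sketch draws independent observations; reproducing the strongly dependent setting of the paper would require simulating a correlated Gaussian process instead.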

MSC:

62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
62G10 Nonparametric hypothesis testing
94A12 Signal theory (characterization, reconstruction, filtering, etc.)
60G15 Gaussian processes

References:

[1] Anon (2005). A new method for early detection of disease outbreaks. Science Daily 23rd February. Available at http://www.sciencedaily.com/releases/2005/02/050218130731.htm.
[2] Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 289-300. · Zbl 0809.62014
[3] Bernhard, G., Klein, M. and Hommel, G. (2004). Global and multiple test procedures using ordered p-values - a review. Statist. Papers 45 1-14. · Zbl 1085.62017 · doi:10.1007/BF02778266
[4] Cai, T., Jin, J. and Low, M. (2007). Estimation and confidence sets for sparse normal mixtures. Ann. Statist. · Zbl 1360.62113 · doi:10.1214/009053607000000334
[5] Cayon, L., Jin, J. and Treaster, A. (2005). Higher criticism statistic: Detecting and identifying non-Gaussianity in the WMAP first year data. Mon. Not. Roy. Astron. Soc. 362 826-832.
[6] Delaigle, A. and Hall, P. (2007). Using thresholding methods to extend higher criticism classification to non-normal, dependent vector components. In preparation.
[7] Donoho, D. and Jin, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures. Ann. Statist. 32 962-994. · Zbl 1092.62051 · doi:10.1214/009053604000000265
[8] Donoho, D. and Jin, J. (2006). Asymptotic minimaxity of false discovery rate thresholding for sparse exponential data. Ann. Statist. 34 2980-3018. · Zbl 1114.62010 · doi:10.1214/009053606000000920
[9] Dudoit, S., Shaffer, J. P. and Boldrick, J. C. (2003). Multiple hypothesis testing in microarray experiments. Statist. Sci. 18 73-103. · Zbl 1048.62099 · doi:10.1214/ss/1056397487
[10] Efron, B., Tibshirani, R., Storey, J. and Tusher, V. (2001). Empirical Bayes analysis of a microarray experiment. J. Amer. Statist. Assoc. 96 1151-1160. · Zbl 1073.62511 · doi:10.1198/016214501753382129
[11] Genovese, C. and Wasserman, L. (2004). A stochastic process approach to false discovery control. Ann. Statist. 32 1035-1061. · Zbl 1092.62065 · doi:10.1214/009053604000000283
[12] Hall, P., Pittelkow, Y. and Ghosh, M. (2007). Relative performance of classifiers for high-dimensional data and small sample sizes. J. Roy. Statist. Soc. Ser. B. · Zbl 1400.62094
[13] Hochberg, Y. and Tamhane, A. C. (1987). Multiple Comparison Procedures. Wiley, New York. · Zbl 0731.62125
[14] Ingster, Y. I. (1997). Some problems of hypothesis testing leading to infinitely divisible distribution. Math. Methods Statist. 6 47-69. · Zbl 0878.62005
[15] Ingster, Y. I. (1999). Minimax detection of a signal for $l^p_n$-balls. Math. Methods Statist. 7 401-428. · Zbl 1103.62312
[16] Jager, L. and Wellner, J. (2007). Goodness-of-fit tests via phi-divergences. Ann. Statist. · Zbl 1126.62030 · doi:10.1214/0009053607000000244
[17] Jin, J. (2004). Detecting a target in very noisy data from multiple looks. In A Festschrift to Honor Herman Rubin. IMS Lecture Notes Monogr. Ser. 45 255-286. Inst. Math. Statist., Beachwood, OH. · Zbl 1268.94013 · doi:10.1214/lnms/1196285396
[18] Jin, J. (2007). Proportion of nonzero normal means: Universal oracle equivalences and uniformly consistent estimations. J. Roy. Statist. Soc. Ser. B.
[19] Jin, J. and Cai, T. (2006). Estimating the null and the proportion of non-null effects in large-scale multiple comparisons. J. Amer. Statist. Assoc. 102 495-506. · Zbl 1172.62319 · doi:10.1198/016214507000000167
[20] Jin, J., Starck, J.-L., Donoho, D., Aghanim, N. and Forni, O. (2005). Cosmological non-Gaussian signature detection: Comparing performance of different statistical tests. EURASIP J. Appl. Sig. Proc. 15 2470-2485. · Zbl 1127.94335 · doi:10.1155/ASP.2005.2470
[21] Jin, J., Peng, J. and Wang, P. (2007). Estimating the proportion of non-null effects, with applications to CGH data. Manuscript.
[22] Knuth, D. E. (1969). The Art of Computer Programming. 1. Fundamental Algorithms. Addison-Wesley, Reading, MA. · Zbl 0191.18001
[23] Lehmann, E. L. and Romano, J. P. (2005). Testing Statistical Hypotheses, 3rd ed. Springer, New York. · Zbl 1076.62018 · doi:10.1007/0-387-27605-X
[24] Meinshausen, N. and Rice, J. (2006). Estimating the proportion of false null hypotheses among a large number of independent tested hypotheses. Ann. Statist. 34 373-393. · Zbl 1091.62059 · doi:10.1214/009053605000000741
[25] Pigeot, I. (2000). Basic concepts of multiple tests - A survey. Statist. Papers 41 3-36. · Zbl 0976.62002 · doi:10.1007/BF02925674
[26] Storey, J. D., Dai, J. Y. and Leek, J. T. (2005). The optimal discovery procedure for large-scale significance testing, with applications to comparative microarray experiments. Biostatistics 8 414-432. · Zbl 1213.62175 · doi:10.1093/biostatistics/kxl019
[27] Swanepoel, J. W. H. (1999). The limiting behavior of a modified maximal symmetric 2s-spacing with applications. Ann. Statist. 27 24-35. · Zbl 0937.62051 · doi:10.1214/aos/1018031099
[28] Tukey, J. W. (1976). T13 N: The higher criticism. Course Notes, Statistics 411, Princeton Univ.
[29] Wood, A. T. A. and Chan, G. (1994). Simulation of stationary Gaussian processes in $[0, 1]^d$. J. Comput. Graph. Statist. 3 409-432. · doi:10.2307/1390903