Innovated higher criticism for detecting sparse signals in correlated noise. (English) Zbl 1189.62080

Summary: Higher criticism is a method for detecting signals that are both sparse and weak. Although it was first proposed for independent noise variables, higher criticism also performs reasonably well when those variables are correlated. We show that performance can be improved by a modified approach that exploits the nature of the correlation. Indeed, it turns out that independent noise is the most difficult case of all from a statistical viewpoint, and that more accurate signal detection (for a given level of signal sparsity and strength) can be obtained when correlation is present. We characterize the advantages of correlation by showing how to incorporate them into the definition of an optimal detection boundary. The boundary has particularly attractive properties when correlation decays at a polynomial rate or the correlation matrix is Toeplitz.
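
For orientation, the classical higher criticism statistic for independent noise (in the form commonly attributed to Donoho and Jin) can be sketched as below. This is an illustrative sketch only: it does not reproduce the correlation-aware modification described in the summary. The function names `two_sided_p_values` and `higher_criticism` and the default fraction `alpha0=0.5` are assumptions chosen for this example, not taken from the paper.

```python
import math


def two_sided_p_values(z_scores):
    """Two-sided p-values for z-scores under a standard normal null."""
    # P(|Z| > |z|) = erfc(|z| / sqrt(2)) for Z ~ N(0, 1).
    return [math.erfc(abs(z) / math.sqrt(2.0)) for z in z_scores]


def higher_criticism(p_values, alpha0=0.5):
    """Classical higher criticism statistic (independent-noise form).

    Maximizes sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i)))
    over the smallest alpha0 fraction of the sorted p-values.
    alpha0 is an illustrative default, not a value from the paper.
    """
    n = len(p_values)
    sorted_p = sorted(p_values)
    limit = max(1, int(alpha0 * n))
    best = -math.inf
    for i in range(1, limit + 1):
        p = sorted_p[i - 1]
        if p <= 0.0 or p >= 1.0:
            continue  # skip degenerate p-values to avoid division by zero
        score = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, score)
    return best


# Usage: one strong observation among otherwise unremarkable noise.
z = [0.1, -0.3, 0.5, 1.0, -0.8, 0.2, -1.2, 0.7, 5.0, 0.4]
hc_value = higher_criticism(two_sided_p_values(z))
```

A large value of the statistic suggests that more small p-values are present than a pure-noise model would predict. The threshold used to declare detection depends on the sample size and is not fixed by this sketch.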

MSC:

62G10 Nonparametric hypothesis testing
62M99 Inference from stochastic processes
94A12 Signal theory (characterization, reconstruction, filtering, etc.)
65C60 Computational problems in statistics (MSC2010)
62H15 Hypothesis testing in multivariate analysis
62B10 Statistical aspects of information-theoretic topics
62H20 Measures of association (correlation, canonical correlation, etc.)
