
Beyond HC: more sensitive tests for rare/weak alternatives. (English) Zbl 1471.62264

The paper under review discusses Higher Criticism (HC), a popular method for large-scale inference problems. The authors argue that parametric methods, including the generalized likelihood ratio test (GLRT) and a related test based on the empirical moment-generating function (EMGF), should be considered as complements to HC, and that these methods are in fact closer to HC in nature than they might appear at first glance. In particular, they show that there is another, similarly ‘narrow’ model to which HC is ‘tied’ in precisely the same way as the GLRT and EMGF are ‘tied’ to the normal location mixture model. The authors’ intention is not to discredit HC but to point out that being ‘tied to a narrowly specified model’ is a misplaced criticism of statistics of this type.

The framework in which the theoretical properties of HC were originally developed was not fine enough to discern any difference in performance between HC and the GLRT under the normal location mixture model. The main technical contribution of the paper is a framework for higher-order power comparisons between these and related statistics. It reveals that each statistic has an edge in power under the model to which it is ‘tied’, but that none is ‘uniformly better’ across all scenarios. The main technical results, on power under sparse local alternatives, are presented for two important examples of contamination models. A summary of simulation experiments illustrating the theoretical results is also provided.
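The statistics being compared can be made concrete with a small simulation. The Python sketch below is an illustration and not the authors' code. It draws data from the normal location mixture (1 - ε)N(0, 1) + εN(μ, 1), computes HC in the HC* form of Donoho and Jin (2004) from one-sided p-values, and computes a standardized EMGF-type statistic: the sample mean of exp(tX_i - t²/2), centred and scaled under N(0, 1). The exact form of the EMGF statistic, the choice α0 = 1/2, and all function names are assumptions made for illustration, and the GLRT is not implemented. One observation worth noting: with t = μ, exp(μX - μ²/2) is exactly the likelihood ratio of N(μ, 1) to N(0, 1), which suggests why a statistic of this kind is ‘tied’ to the normal location mixture.

```python
import math
import random


def normal_sf(x):
    """Upper-tail probability P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def higher_criticism(xs, alpha0=0.5):
    """HC* in the form of Donoho and Jin (2004): the largest standardized
    gap between i/n and the i-th smallest one-sided p-value, taken over
    1 <= i <= alpha0 * n. alpha0 = 0.5 is an illustrative default."""
    n = len(xs)
    pvals = sorted(normal_sf(x) for x in xs)
    best = -math.inf
    for i in range(1, int(alpha0 * n) + 1):
        p = pvals[i - 1]
        if 0.0 < p < 1.0:  # skip p-values that would make the denominator zero
            best = max(best, math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p)))
    return best


def emgf_statistic(xs, t):
    """Standardized empirical MGF statistic (hypothetical illustrative form).

    Under H0, X ~ N(0, 1), so W = exp(t*X - t**2/2) has mean 1 and variance
    exp(t**2) - 1; the return value is the z-score of the sample mean of W.
    For t = mu, W is the likelihood ratio of N(mu, 1) against N(0, 1).
    t must be nonzero."""
    n = len(xs)
    mean_w = sum(math.exp(t * x - t * t / 2.0) for x in xs) / n
    return math.sqrt(n) * (mean_w - 1.0) / math.sqrt(math.exp(t * t) - 1.0)


def sample_mixture(n, eps, mu, rng):
    """Draw n values from the normal location mixture (1 - eps) N(0, 1) + eps N(mu, 1)."""
    return [rng.gauss(mu if rng.random() < eps else 0.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(2020)
    null = sample_mixture(10_000, 0.0, 3.0, rng)   # pure N(0, 1) data
    alt = sample_mixture(10_000, 0.01, 3.0, rng)   # 1% of entries shifted to mean 3
    print("HC:  ", higher_criticism(null), higher_criticism(alt))
    print("EMGF:", emgf_statistic(null, 3.0), emgf_statistic(alt, 3.0))
```

Sweeping ε and μ over a grid, or replacing N(μ, 1) by a different alternative, is one way to explore the point that neither statistic is uniformly better; this sketch does not reproduce the paper's higher-order power analysis.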

MSC:

62F03 Parametric hypothesis testing
62F05 Asymptotic properties of parametric tests
62G30 Order statistics; empirical distribution functions
62G32 Statistics of extreme values; tail inference
62J15 Paired and multiple comparisons; multiple testing

References:

[1] Arias-Castro, E. and Wang, M. (2017). Distribution-free tests for sparse heterogeneous mixtures. TEST 26 71-94. · Zbl 1422.62159 · doi:10.1007/s11749-016-0499-x
[2] Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 289-300. · Zbl 0809.62014 · doi:10.1111/j.2517-6161.1995.tb02031.x
[3] Berk, R. H. and Jones, D. H. (1979). Goodness-of-fit test statistics that dominate the Kolmogorov statistics. Z. Wahrsch. Verw. Gebiete 47 47-59. · Zbl 0379.62026 · doi:10.1007/BF00533250
[4] Bickel, P. and Chernoff, H. (1993). Asymptotic distribution of the likelihood ratio statistic in a prototypical non regular problem. In Statistics and Probability: A Raghu Raj Bahadur Festschrift 83-96. Wiley, New York.
[5] Bickel, P. J. and Levina, E. (2004). Some theory of Fisher’s linear discriminant function, ‘naive Bayes’, and some alternatives when there are many more variables than observations. Bernoulli 10 989-1010. · Zbl 1064.62073 · doi:10.3150/bj/1106314847 · Project Euclid: euclid.bj/1106314847
[6] Cai, T. T., Jeng, X. J. and Jin, J. (2011). Optimal detection of heterogeneous and heteroscedastic mixtures. J. R. Stat. Soc. Ser. B. Stat. Methodol. 73 629-662. · Zbl 1228.62020 · doi:10.1111/j.1467-9868.2011.00778.x
[7] Cai, T. T. and Wu, Y. (2014). Optimal detection of sparse mixtures against a given null distribution. IEEE Trans. Inform. Theory 60 2217-2232. · Zbl 1360.94108 · doi:10.1109/TIT.2014.2304295
[8] Chernoff, H. (1954). On the distribution of the likelihood ratio. Ann. Math. Stat. 25 573-578. · Zbl 0056.37102 · doi:10.1214/aoms/1177728725 · Project Euclid: euclid.aoms/1177728725
[9] Csörgő, M., Csörgő, S., Horváth, L. and Mason, D. M. (1986). Weighted empirical and quantile processes. Ann. Probab. 14 31-85. · Zbl 0589.60029
[10] Darling, D. A. and Erdős, P. (1956). A limit theorem for the maximum of normalized sums of independent random variables. Duke Math. J. 23 143-155. · Zbl 0070.13806
[11] Donoho, D. (2017). 50 years of data science. J. Comput. Graph. Statist. 26 745-766.
[12] Donoho, D. and Jin, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures. Ann. Statist. 32 962-994. · Zbl 1092.62051 · doi:10.1214/009053604000000265 · Project Euclid: euclid.aos/1085408492
[13] Donoho, D. and Jin, J. (2008). Higher criticism thresholding: Optimal feature selection when useful features are rare and weak. Proc. Natl. Acad. Sci. USA 105 14790-14795. · Zbl 1357.62212 · doi:10.1073/pnas.0807471105
[14] Donoho, D. and Jin, J. (2015). Higher criticism for large-scale inference, especially for rare and weak effects. Statist. Sci. 30 1-25. · Zbl 1332.62019 · doi:10.1214/14-STS506 · Project Euclid: euclid.ss/1425492437
[15] Gontscharuk, V. and Finner, H. (2017). Asymptotics of goodness-of-fit tests based on minimum \(p\)-value statistics. Comm. Statist. Theory Methods 46 2332-2342. · Zbl 1364.62106 · doi:10.1080/03610926.2015.1041985
[16] Gontscharuk, V., Landwehr, S. and Finner, H. (2015). The intermediates take it all: Asymptotics of higher criticism statistics and a powerful alternative based on equal local levels. Biom. J. 57 159-180. · Zbl 1309.62082 · doi:10.1002/bimj.201300255
[17] Hall, P. (1991). On convergence rates of suprema. Probab. Theory Related Fields 89 447-455. · Zbl 0725.60024 · doi:10.1007/BF01199788
[18] Hall, P. and Jin, J. (2010). Innovated higher criticism for detecting sparse signals in correlated noise. Ann. Statist. 38 1686-1732. · Zbl 1189.62080 · doi:10.1214/09-AOS764 · Project Euclid: euclid.aos/1269452652
[19] Hartigan, J. A. (1985). A failure of likelihood asymptotics for normal mixtures. In Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, Vol. II (Berkeley, Calif., 1983). Wadsworth Statist./Probab. Ser. 807-810. Wadsworth, Belmont, CA. · Zbl 1373.62070
[20] Ingster, Y. I. (1997). Some problems of hypothesis testing leading to infinitely divisible distributions. Math. Methods Statist. 6 47-69. · Zbl 0878.62005
[21] Ingster, Y. I. (2001). Adaptive detection of a signal of growing dimension. I. Math. Methods Statist. 10 395-421. · Zbl 1005.62051
[22] Ingster, Y. I. (2002). Adaptive detection of a signal of growing dimension. II. Math. Methods Statist. 11 37-68. · Zbl 1005.62052
[23] Jaeschke, D. (1979). The asymptotic distribution of the supremum of the standardized empirical distribution function on subintervals. Ann. Statist. 7 108-115. · Zbl 0398.62013 · doi:10.1214/aos/1176344558 · Project Euclid: euclid.aos/1176344558
[24] Jager, L. and Wellner, J. A. (2007). Goodness-of-fit tests via phi-divergences. Ann. Statist. 35 2018-2053. · Zbl 1126.62030 · doi:10.1214/0009053607000000244 · Project Euclid: euclid.aos/1194461721
[25] Jin, J. and Ke, Z. T. (2016). Rare and weak effects in large-scale inference: Methods and phase diagrams. Statist. Sinica 26 1-34. · Zbl 1419.62152
[26] Laurent, B., Marteau, C. and Maugis-Rabusseau, C. (2016). Non-asymptotic detection of two-component mixtures with unknown means. Bernoulli 22 242-274. · Zbl 1388.62131 · doi:10.3150/14-BEJ657 · Project Euclid: euclid.bj/1443620849
[27] Leadbetter, M. R. and Rootzén, H. (1988). Extremal theory for stochastic processes. Ann. Probab. 16 431-478. · Zbl 0648.60039
[28] Li, J. and Siegmund, D. (2015). Higher criticism: \(p\)-values and criticism. Ann. Statist. 43 1323-1350. · Zbl 1320.62039 · doi:10.1214/15-AOS1312 · Project Euclid: euclid.aos/1431695646
[29] Liu, X. and Shao, Y. (2004). Asymptotics for the likelihood ratio test in a two-component normal mixture model. J. Statist. Plann. Inference 123 61-81. · Zbl 1050.62025 · doi:10.1016/S0378-3758(03)00138-1
[30] Moscovich, A., Nadler, B. and Spiegelman, C. (2016). On the exact Berk-Jones statistics and their \(p\)-value calculation. Electron. J. Stat. 10 2329-2354. · Zbl 1346.62092 · doi:10.1214/16-EJS1172
[31] Porter, T. and Stewart, M. (2020). Supplement to “Beyond HC: More sensitive tests for rare/weak alternatives.” https://doi.org/10.1214/19-AOS1885SUPP.
[32] Shorack, G. R. and Wellner, J. A. (1986). Empirical Processes with Applications to Statistics. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. Wiley, New York. · Zbl 1170.62365
[33] Stepanova, N. and Pavlenko, T. (2018). Goodness-of-fit tests based on sup-functionals of weighted empirical processes. Teor. Veroyatn. Primen. 63 358-388. · Zbl 1404.62046 · doi:10.1137/S0040585X97T989052
[34] Tukey, J. W. (1976). T13 N: The higher criticism. Technical report.
[35] Tukey, J. W. (1989). Higher criticism for individual significances in several tables or parts of tables. Technical report.
[36] Tukey, J. W. (1994). The problem of multiple comparisons. In The Collected Works of John W. Tukey. Vol. VIII, lxii+475+i10. CRC Press, New York. · Zbl 0807.01035
[37] Walther, G. (2013). The average likelihood ratio for large-scale multiple testing and detecting sparse mixtures. In From Probability to Statistics and Back: High-Dimensional Models and Processes. Inst. Math. Stat. (IMS) Collect. 9 317-326. IMS, Beachwood, OH. · Zbl 1356.62095
[38] Wellner, J. · Zbl 1042.62009
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced with data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.