Hypothesis testing for densities and high-dimensional multinomials: sharp local minimax rates. (English) Zbl 1466.62307

Summary: We consider the goodness-of-fit testing problem of distinguishing whether the data are drawn from a specified distribution, versus a composite alternative separated from the null in the total variation metric. In the discrete case, we consider goodness-of-fit testing when the null distribution has a possibly growing or unbounded number of categories. In the continuous case, we consider testing a Hölder density with exponent \(0<s\leq 1\), with possibly unbounded support, in the low-smoothness regime where the Hölder parameter is not assumed to be constant. In contrast to existing results, we show that the minimax rate and critical testing radius in these settings depend strongly, and in a precise way, on the null distribution being tested and this motivates the study of the (local) minimax rate as a function of the null distribution. For multinomials, the local minimax rate has been established in recent work. We revisit and extend these results and develop two modifications to the \(\chi^{2}\)-test whose performance we characterize. For testing Hölder densities, we show that the usual binning tests are inadequate in the low-smoothness regime and we design a spatially adaptive partitioning scheme that forms the basis for our locally minimax optimal tests. Furthermore, we provide the first local minimax lower bounds for this problem which yield a sharp characterization of the dependence of the critical radius on the null hypothesis being tested. In the low-smoothness regime, we also provide adaptive tests that adapt to the unknown smoothness parameter. We illustrate our results with a variety of simulations that demonstrate the practical utility of our proposed tests.


62G10 Nonparametric hypothesis testing
62H15 Hypothesis testing in multivariate analysis
Full Text: DOI arXiv Euclid


[1] Addario-Berry, L., Broutin, N., Devroye, L. and Lugosi, G. (2010). On combinatorial testing problems. Ann. Statist.38 3063-3092. · Zbl 1200.62059
[2] Arias-Castro, E., Candès, E. J. and Durand, A. (2011). Detection of an anomalous cluster in a network. Ann. Statist.39 278-304. · Zbl 1209.62097
[3] Arias-Castro, E., Pelletier, B. and Saligrama, V. (2018). Remember the curse of dimensionality: The case of goodness-of-fit testing in arbitrary dimension. J. Nonparametr. Stat.30 448-471. · Zbl 1402.62077
[4] Balakrishnan, S. and Wasserman, L. (2019). Supplement to “Hypothesis testing for densities and high-dimensional multinomials: Sharp local minimax rates.” DOI:10.1214/18-AOS1729SUPP.
[5] Balakrishnan, S. and Wasserman, L. (2018). Hypothesis testing for high-dimensional multinomials: A selective review. Ann. Appl. Stat. To appear. · Zbl 1405.62061
[6] Barron, A. R. (1989). Uniformly powerful goodness of fit tests. Ann. Statist.17 107-124. · Zbl 0674.62032
[7] Batu, T., Fischer, E., Fortnow, L., Kumar, R., Rubinfeld, R. and White, P. (2001). Testing random variables for independence and identity. In 42nd IEEE Symposium on Foundations of Computer Science (Las Vegas, NV, 2001) 442-451. IEEE Computer Soc., Los Alamitos, CA.
[8] Berthet, Q. and Rigollet, P. (2013). Optimal detection of sparse principal components in high dimension. Ann. Statist.41 1780-1815. · Zbl 1277.62155
[9] Cai, T. T. and Low, M. G. (2015). A framework for estimation of convex functions. Statist. Sinica25 423-456. · Zbl 06503803
[10] Carpentier, A. (2015). Testing the regularity of a smooth signal. Bernoulli21 465-488. · Zbl 1320.94021
[11] Casella, G. and Berger, R. L. (2002). Statistical Inference. Duxbury, Pacific Grove, CA. · Zbl 0699.62001
[12] Chatterjee, S. (2014). A new perspective on least squares under convex constraint. Ann. Statist.42 2340-2381. · Zbl 1302.62053
[13] Cramér, H. (1928). On the composition of elementary errors. Scand. Actuar. J.1928 13-74.
[14] Devroye, L. and Györfi, L. (1985). Nonparametric Density Estimation: The \(L_{1}\) View. Wiley, New York.
[15] Diaconis, P. and Mosteller, F. (2006). Methods for studying coincidences. In Selected Papers of Frederick Mosteller (S. E. Fienberg and D. C. Hoaglin, eds.) 605-622. Springer, New York.
[16] Diakonikolas, I. and Kane, D. M. (2016). A new approach for testing properties of discrete distributions. In 57th Annual IEEE Symposium on Foundations of Computer Science—FOCS 2016 685-694. IEEE Computer Soc., Los Alamitos, CA.
[17] Fienberg, S. E. (1979). The use of chi-squared statistics for categorical data problems. J. Roy. Statist. Soc. Ser. B41 54-64. · Zbl 0427.62013
[18] Giné, E. and Nickl, R. (2016). Mathematical Foundations of Infinite-Dimensional Statistical Models. Cambridge Series in Statistical and Probabilistic Mathematics40. Cambridge Univ. Press, New York.
[19] Goldreich, O. and Ron, D. (2011). On testing expansion in bounded-degree graphs. In Studies in Complexity and Cryptography. Lecture Notes in Computer Science6650 68-75. Springer, Heidelberg. · Zbl 1343.68302
[20] Ingster, Y. I. (1990). Minimax detection of a signal in \(\ell_{p}\)-metrics. J. Math. Sci.68 503-515. · Zbl 0836.94010
[21] Ingster, Y. I. (1997). Adaptive chi-square tests. Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI) 244 150-166, 333. · Zbl 0949.62031
[22] Ingster, Y. I. and Suslina, I. A. (2003). Nonparametric Goodness-of-Fit Testing Under Gaussian Models. Lecture Notes in Statistics169. Springer, New York. · Zbl 1013.62049
[23] Ingster, Y. I., Tsybakov, A. B. and Verzelen, N. (2010). Detection boundary in sparse regression. Electron. J. Stat.4 1476-1526. · Zbl 1329.62314
[24] LeCam, L. (1973). Convergence of estimates under dimensionality restrictions. Ann. Statist.1 38-53.
[25] Lehmann, E. L. and Casella, G. (1998). Theory of Point Estimation, 2nd ed. Springer, New York. · Zbl 0916.62017
[26] Lehmann, E. L. and Romano, J. P. (2005). Testing Statistical Hypotheses, 3rd ed. Springer, New York. · Zbl 1076.62018
[27] Marriott, P., Sabolova, R., Van Bever, G. and Critchley, F. (2015). Geometry of goodness-of-fit testing in high dimensional low sample size modelling. In Geometric Science of Information. Lecture Notes in Computer Science9389 569-576. Springer, Cham. · Zbl 1406.62026
[28] Morris, C. (1975). Central limit theorems for multinomial sums. Ann. Statist.3 165-188. · Zbl 0305.62013
[29] Neyman, J. and Pearson, E. S. (1933). On the problem of the most efficient tests of statistical hypotheses. Philos. Trans. Roy. Soc. Lond. Ser. A231 289-337. · JFM 59.1163.02
[30] Nickl, R. and van de Geer, S. (2013). Confidence sets in sparse regression. Ann. Statist.41 2852-2876. · Zbl 1288.62108
[31] Paninski, L. (2008). A coincidence-based test for uniformity given very sparsely sampled discrete data. IEEE Trans. Inform. Theory54 4750-4755. · Zbl 1322.62082
[32] Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philos. Mag. Ser. 5 50 157-175. · JFM 31.0238.04
[33] Read, T. R. C. and Cressie, N. A. C. (1988). Goodness-of-Fit Statistics for Discrete Multivariate Data. Springer, New York. · Zbl 0663.62065
[34] Ron, D. (2008). Property testing: A learning theory perspective. Found. Trends Mach. Learn.1 307-402.
[35] Smirnoff, N. (1939). On the estimation of the discrepancy between empirical curves of distribution for two independent samples. Moscow Univ. Math. Bull.2 3-14. · JFM 65.1356.04
[36] Snedecor, G. W. and Cochran, W. G. (1980). Statistical Methods, 7th ed. Iowa State Univ. Press, Ames, IA. · Zbl 0727.62003
[37] Valiant, G. and Valiant, P. (2014). An automatic inequality prover and instance optimal identity testing. In 55th Annual IEEE Symposium on Foundations of Computer Science—FOCS 2014 51-60. IEEE Computer Soc., Los Alamitos, CA. · Zbl 1362.62107
[38] von Mises, R. (1951). Wahrscheinlichkeit, Statistik und Wahrheit. Springer, Vienna. · Zbl 0043.13002
[39] Wilks, S. S. (1938). The large-sample distribution of the likelihood ratio for testing composite hypotheses. Ann. Math. Stat.9 60-62. · JFM 64.1211.05
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.