Informative goodness-of-fit for multivariate distributions. (English) Zbl 07471512

Summary: This article discusses an informative goodness-of-fit (iGOF) approach for studying multivariate distributions. When the null model is rejected, iGOF identifies the underlying sources of mismodeling and gives practitioners additional insight into the nature of the deviations from the true distribution. The informative character of the procedure is achieved by exploiting smooth tests and random field theory to facilitate the analysis of multivariate data. Simulation studies show that iGOF enjoys high power against different types of alternatives. The methods presented here directly address the problem of background mismodeling arising in physics and astronomy, the areas that motivated this work.
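
To give a sense of why smooth tests are "informative", the following is a minimal sketch of the classical univariate Neyman smooth test, not the iGOF procedure of the article. It assumes a fully specified null distribution, applies the probability integral transform, and projects the transformed data onto orthonormal shifted Legendre polynomials. The function names (`legendre_orthonormal`, `smooth_test`, `normal_cdf`) and the simulated data are hypothetical and chosen only for illustration.

```python
import math
import random


def legendre_orthonormal(j, u):
    """Degree-j Legendre polynomial on [0, 1], scaled to unit variance under U(0, 1)."""
    x = 2.0 * u - 1.0  # map [0, 1] onto the Legendre domain [-1, 1]
    if j == 0:
        return 1.0
    p_prev, p_curr = 1.0, x
    for k in range(1, j):
        # Bonnet recurrence: (k + 1) P_{k+1} = (2k + 1) x P_k - k P_{k-1}
        p_prev, p_curr = p_curr, ((2 * k + 1) * x * p_curr - k * p_prev) / (k + 1)
    return math.sqrt(2 * j + 1) * p_curr


def smooth_test(u_values, m=4):
    """Return (statistic, p_value, components) for H0: u_values ~ Uniform(0, 1).

    Assumption of this sketch: m is even, so the chi-square tail
    probability has a closed form and no external library is needed.
    """
    if m % 2 != 0:
        raise ValueError("this sketch supports even m only")
    n = len(u_values)
    # Each component is asymptotically N(0, 1) under H0.
    components = [
        sum(legendre_orthonormal(j, u) for u in u_values) / math.sqrt(n)
        for j in range(1, m + 1)
    ]
    statistic = sum(c * c for c in components)  # asymptotically chi-square(m) under H0
    half = statistic / 2.0
    # Chi-square survival function for even degrees of freedom m = 2k
    p_value = math.exp(-half) * sum(half**i / math.factorial(i) for i in range(m // 2))
    return statistic, p_value, components


def normal_cdf(x):
    """Standard normal CDF, used here as the postulated null model."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


random.seed(0)
data = [random.gauss(0.5, 1.0) for _ in range(500)]  # simulated truth: N(0.5, 1)
u = [normal_cdf(x) for x in data]                    # transform under postulated N(0, 1)
statistic, p_value, components = smooth_test(u)
```

In this toy setting the null is rejected, and the individual components indicate where the postulated model fails: a large first component points to a location shift, whereas a large second component would point to a scale discrepancy.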


62H15 Hypothesis testing in multivariate analysis
62M40 Random fields; image analysis
62P35 Applications of statistics to physics


Software: ColliderBit; LPMode; TOHM