
Minimax confidence intervals for the sliced Wasserstein distance. (English) Zbl 1493.62236

Summary: Motivated by the growing popularity of variants of the Wasserstein distance in statistics and machine learning, we study statistical inference for the Sliced Wasserstein distance, an easily computable variant of the Wasserstein distance. Specifically, we construct confidence intervals for the Sliced Wasserstein distance which have finite-sample validity under no assumptions or under mild moment assumptions. These intervals are adaptive in length to the regularity of the underlying distributions. We also bound the minimax risk of estimating the Sliced Wasserstein distance, and as a consequence establish that the lengths of our proposed confidence intervals are minimax optimal over appropriate distribution classes. To motivate the choice of these classes, we also study minimax rates of estimating a distribution under the Sliced Wasserstein distance. These theoretical findings are complemented with a simulation study demonstrating the deficiencies of the classical bootstrap, and the advantages of our proposed methods. We also show strong correspondences between our theoretical predictions and the adaptivity of our confidence interval lengths in simulations. We conclude by demonstrating the use of our confidence intervals in the setting of simulator-based likelihood-free inference. In this setting, in contrast to popular approximate Bayesian computation methods, we develop uncertainty quantification methods with rigorous frequentist coverage guarantees.
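To illustrate why the Sliced Wasserstein distance is described as easily computable, here is a minimal sketch of the standard Monte Carlo approximation: project both samples onto random directions, compute the one-dimensional Wasserstein distance by matching sorted projections, and average over directions. This is not the paper's confidence interval construction. The function name `sliced_wasserstein`, its parameters, and the equal-sample-size restriction are assumptions made for this example only.

```python
import math
import random


def sliced_wasserstein(xs, ys, p=2, n_projections=200, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance between two
    equal-size samples xs and ys (lists of d-dimensional points).

    Hypothetical helper for illustration; it is not taken from the paper.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    rng = random.Random(seed)
    d = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        # Draw a direction uniformly on the unit sphere by normalising
        # a standard Gaussian vector.
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in g))
        theta = [v / norm for v in g]
        # Project both samples onto the direction and sort them: in one
        # dimension, the optimal coupling between equal-size empirical
        # measures pairs the order statistics.
        px = sorted(sum(t * c for t, c in zip(theta, x)) for x in xs)
        py = sorted(sum(t * c for t, c in zip(theta, y)) for y in ys)
        total += sum(abs(a - b) ** p for a, b in zip(px, py)) / len(px)
    # Average the p-th powers over directions, then take the p-th root.
    return (total / n_projections) ** (1.0 / p)


if __name__ == "__main__":
    sample = [[0.0, 0.0], [1.0, 2.0], [-1.0, 0.5]]
    shifted = [[x + 3.0, y] for x, y in sample]
    print(sliced_wasserstein(sample, shifted))
```

Each projection costs only a sort, so the whole estimate takes on the order of `n_projections * n * log(n)` operations for samples of size `n`, and no high-dimensional optimal transport problem has to be solved.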

MSC:

62G15 Nonparametric tolerance and confidence regions
62G05 Nonparametric estimation
62C20 Minimax procedures in statistical decision theory

Software:

Wasserstein GAN
