zbMATH — the first resource for mathematics

Wide consensus aggregation in the Wasserstein space. Application to location-scatter families. (English) Zbl 1419.62118
Summary: We introduce a general theory for a consensus-based combination of estimations of probability measures. Potential applications include parallelized or distributed sampling schemes as well as variations on aggregation from resampling techniques like boosting or bagging. Taking into account the possibility of very discrepant estimations, instead of a full consensus we consider a “wide consensus” procedure. The approach is based on the consideration of trimmed barycenters in the Wasserstein space of probability measures. We provide general existence and consistency results as well as suitable properties of these robustified Fréchet means. In order to get quick applicability, we also include characterizations of barycenters of probabilities that belong to (not necessarily elliptical) location and scatter families. For these families, we provide an iterative algorithm for the effective computation of trimmed barycenters, based on a consistent algorithm for computing barycenters, guaranteeing applicability in a wide setting of statistical problems.
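
The idea of a trimmed barycenter can be illustrated in the simplest setting. The following is a minimal sketch, not the algorithm of the paper: it assumes a one-dimensional location-scale family with a standardized base law (mean 0, variance 1), where the squared \(L^{2}\)-Wasserstein distance between two members with parameters \((m_1, s_1)\) and \((m_2, s_2)\) is \((m_1-m_2)^2 + (s_1-s_2)^2\) and the weighted barycenter has the weighted averages of the locations and scales as its parameters. The function names (`w2_squared`, `barycenter`, `trimmed_barycenter`) and the concentration-step iteration are illustrative assumptions.

```python
import numpy as np


def w2_squared(m1, s1, m2, s2):
    """Squared W2 distance in a 1-D location-scale family (standardized base law)."""
    return (m1 - m2) ** 2 + (s1 - s2) ** 2


def barycenter(means, sds, weights):
    """Weighted W2 barycenter in the same 1-D family: average locations and scales."""
    weights = weights / weights.sum()
    return float(weights @ means), float(weights @ sds)


def trimmed_barycenter(means, sds, alpha, n_iter=100):
    """Hypothetical concentration-step scheme for an alpha-trimmed barycenter.

    Each of the n input measures starts with weight 1/n. Trimming a fraction
    alpha and renormalizing caps every weight at 1 / (n * (1 - alpha)).
    """
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)
    n = len(means)
    cap = 1.0 / (n * (1.0 - alpha))
    weights = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        # Barycenter of the currently retained (weighted) measures.
        m, s = barycenter(means, sds, weights)
        dist = w2_squared(means, sds, m, s)
        # Give as much weight as allowed to the measures closest to it.
        new_weights = np.zeros(n)
        remaining = 1.0
        for i in np.argsort(dist):
            new_weights[i] = min(cap, remaining)
            remaining -= new_weights[i]
            if remaining <= 1e-12:
                break
        if np.allclose(new_weights, weights):
            break
        weights = new_weights
    m, s = barycenter(means, sds, weights)
    return m, s, weights
```

For instance, with four estimates close to a standard law and one very discrepant estimate, calling `trimmed_barycenter([0.0, 0.1, -0.1, 0.05, 10.0], [1.0, 1.1, 0.9, 1.0, 5.0], alpha=0.2)` assigns zero weight to the discrepant input, so the consensus is formed from the remaining four.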

MSC:
62G35 Nonparametric robustness
60B10 Convergence of probability measures
49Q20 Variational problems in a geometric measure-theoretic setting
Software:
TCLUST
References:
[1] Agueh, M. and Carlier, G. (2011). Barycenters in the Wasserstein space. SIAM J. Math. Anal. 43 904-924. · Zbl 1223.49045
[2] Álvarez-Esteban, P.C., del Barrio, E., Cuesta-Albertos, J.A. and Matrán, C. (2011). Uniqueness and approximate computation of optimal incomplete transportation plans. Ann. Inst. Henri Poincaré B, Probab. Stat. 47 358-375. · Zbl 1215.49042
[3] Álvarez-Esteban, P.C., del Barrio, E., Cuesta-Albertos, J.A. and Matrán, C. (2012). Similarity of samples and trimming. Bernoulli 18 606-634. · Zbl 1239.62005
[4] Álvarez-Esteban, P.C., del Barrio, E., Cuesta-Albertos, J.A. and Matrán, C. (2016). A fixed-point approach to barycenters in Wasserstein space. J. Math. Anal. Appl. 441 744-762. · Zbl 1383.49052
[5] Arsigny, V., Fillard, P., Pennec, X. and Ayache, N. (2006/2007). Geometric means in a novel vector space structure on symmetric positive-definite matrices. SIAM J. Matrix Anal. Appl. 29 328-347. · Zbl 1144.47015
[6] Benamou, J.-D., Carlier, G., Cuturi, M., Nenna, L. and Peyré, G. (2015). Iterative Bregman projections for regularized transportation problems. SIAM J. Sci. Comput. 37 A1111-A1138. · Zbl 1319.49073
[7] Bigot, J. and Klein, T. (2015). Consistent estimation of a population barycenter in the Wasserstein space. Preprint. Available at arXiv:1212.2562v5.
[8] Boissard, E., Le Gouic, T. and Loubes, J.-M. (2015). Distribution’s template estimate with Wasserstein metrics. Bernoulli 21 740-759. · Zbl 1320.62107
[9] Breiman, L. (1996). Bagging predictors. Mach. Learn. 24 123-140. · Zbl 0858.68080
[10] Brenier, Y. (1987). Polar decomposition and increasing rearrangement of vector fields. C. R. Acad. Sci. Paris Ser. I Math. 305 805-808. · Zbl 0652.26017
[11] Brenier, Y. (1991). Polar factorization and monotone rearrangement of vector-valued functions. Comm. Pure Appl. Math. 44 375-417. · Zbl 0738.46011
[12] Bühlmann, P. (2003). Bagging, subagging and bragging for improving some prediction algorithms. In Recent Advances and Trends in Nonparametric Statistics (M.G. Akritas and D.N. Politis, eds.) 19-34. Amsterdam: Elsevier.
[13] Bühlmann, P. and Yu, B. (2002). Analyzing bagging. Ann. Statist. 30 927-961. · Zbl 1029.62037
[14] Carlier, G., Oberman, A. and Oudet, E. (2015). Numerical methods for matching for teams and Wasserstein barycenters. ESAIM Math. Model. Numer. Anal. 49 1621-1642. · Zbl 1331.49042
[15] Chernozhukov, V., Galichon, A., Hallin, M. and Henry, M. (2017). Monge-Kantorovich depth, quantiles, ranks and signs. Ann. Statist. 45 223-256. · Zbl 1426.62163
[16] Croux, C. and Haesbroeck, G. (1997). An easy way to increase the finite-sample efficiency of the resampled minimum volume ellipsoid estimator. Comput. Statist. Data Anal. 25 125-141. · Zbl 0900.62278
[17] Cuesta-Albertos, J.A. and Matrán, C. (1988). The strong law of large numbers for \(k\)-means and best possible nets of Banach valued random variables. Probab. Theory Related Fields 78 523-534. · Zbl 0628.60010
[18] Cuesta, J.A. and Matrán, C. (1989). Notes on the Wasserstein metric in Hilbert spaces. Ann. Probab. 17 1264-1276. · Zbl 0688.60011
[19] Cuesta-Albertos, J.A., Matrán, C. and Mayo-Íscar, A. (2008). Trimming and likelihood: Robust location and dispersion estimation in the elliptical model. Ann. Statist. 36 2284-2318. · Zbl 1148.62038
[20] Cuesta-Albertos, J.A., Matrán-Bea, C. and Tuero-Díaz, A. (1996). On lower bounds for the \(L^{2}\)-Wasserstein metric in a Hilbert space. J. Theoret. Probab. 9 263-283. · Zbl 0870.60005
[21] Cuesta-Albertos, J.A., Matrán Bea, C. and Rodríguez Rodríguez, J.M. (2002). Shape of a distribution through the \(L_{2}\)-Wasserstein distance. In Distributions with Given Marginals and Statistical Modelling (C.M. Cuadras, J. Fortiana and J.A. Rodríguez-Lallena, eds.) 51-61. Dordrecht: Kluwer Academic. · Zbl 1135.62333
[22] Cuesta-Albertos, J.A., Rüschendorf, L. and Tuero-Díaz, A. (1993). Optimal coupling of multivariate distributions and stochastic processes. J. Multivariate Anal. 46 335-361. · Zbl 0788.60025
[23] Cuturi, M. and Doucet, A. (2014). Fast computation of Wasserstein barycenters. In Proceedings of the 31st International Conference on Machine Learning. JMLR: W&CP vol. 32.
[24] del Barrio, E., Cuesta-Albertos, J.A., Matrán, C. and Mayo-Íscar, A. (2016). Robust clustering tools based on optimal transportation. Preprint. Available at arXiv:1607.01179.
[25] Dudley, R.M. (1989). Real Analysis and Probability. Pacific Grove, CA: Wadsworth & Brooks. · Zbl 0686.60001
[26] Fritz, H., García-Escudero, L.A. and Mayo-Íscar, A. (2012). tclust: An R package for a trimming approach to cluster analysis. J. Stat. Softw. 47 1-26.
[27] García-Escudero, L.A., Gordaliza, A. and Matrán, C. (1999). A central limit theorem for multivariate generalized trimmed \(k\)-means. Ann. Statist. 27 1061-1079. · Zbl 0984.62042
[28] Gelbrich, M. (1990). On a formula for the \(L^{2}\) Wasserstein metric between measures on Euclidean and Hilbert spaces. Math. Nachr. 147 185-203. · Zbl 0711.60003
[29] Gordaliza, A. (1991). Best approximations to random variables based on trimming procedures. J. Approx. Theory 64 162-180. · Zbl 0745.41030
[30] Knott, M. and Smith, C.S. (1994). On a generalization of cyclic monotonicity and distances among random vectors. Linear Algebra Appl. 199 363-371. · Zbl 0796.60022
[31] Le Gouic, T. and Loubes, J.-M. (2015). Barycenter in Wasserstein spaces: Existence and consistency. Probab. Theory Related Fields. To appear. Available at hal-01163262v2. · Zbl 1376.60009
[32] Meinshausen, N. and Bühlmann, P. (2014). Magging: maximin aggregation for inhomogeneous large-scale data. Available at arXiv:1409.2638v1 .
[33] Munk, A. and Czado, C. (1998). Nonparametric validation of similar distributions and assessment of goodness of fit. J. R. Stat. Soc. Ser. B. Stat. Methodol. 60 223-241. · Zbl 0909.62047
[34] Pass, B. (2013). Optimal transportation with infinitely many marginals. J. Funct. Anal. 264 947-963. · Zbl 1258.49073
[35] Rippl, T., Munk, A. and Sturm, A. (2016). Limit laws of the empirical Wasserstein distance: Gaussian distributions. J. Multivariate Anal. 151 90-109. · Zbl 1351.62064
[36] Rousseeuw, P. (1985). Multivariate estimation with high breakdown point. In Mathematical Statistics and Applications, Vol. B (Bad Tatzmannsdorf, 1983) (W. Grossmann, G. Pflug, I. Vincze and W. Wertz, eds.) 283-297. Dordrecht: Reidel.
[37] Rousseeuw, P.J. (1984). Least median of squares regression. J. Amer. Statist. Assoc. 79 871-880. · Zbl 0547.62046
[38] Rousseeuw, P.J. and van Driessen, K. (1999). A fast algorithm for the minimum covariance determinant estimator. Technometrics 41 212-223.
[39] Rüschendorf, L. and Rachev, S.T. (1990). A characterization of random variables with minimum \(L^{2}\)-distance. J. Multivariate Anal. 32 48-54. · Zbl 0688.62034
[40] Rüschendorf, L. and Uckelmann, L. (2002). On the \(n\)-coupling problem. J. Multivariate Anal. 81 242-258. · Zbl 1011.62052
[41] Villani, C. (2003). Topics in Optimal Transportation. Graduate Studies in Mathematics 58. Providence, RI: Amer. Math. Soc. · Zbl 1106.90001
[42] Villani, C. (2009). Optimal Transport: Old and New. Berlin: Springer. · Zbl 1156.53003
[43] Woodruff, D.L. and Rocke, D.M. (1994). Computable robust estimation of multivariate location and shape in high dimension using compound estimators. J. Amer. Statist. Assoc. 89 888-896. · Zbl 0825.62485
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.