
Distribution and quantile functions, ranks and signs in dimension \(d\): a measure transportation approach. (English) Zbl 1468.62282

Summary: Unlike the real line, the real space \(\mathbb{R}^d\), for \(d\ge 2\), is not canonically ordered. As a consequence, such fundamental univariate concepts as quantile and distribution functions and their empirical counterparts, involving ranks and signs, do not canonically extend to the multivariate context. Palliating that lack of a canonical ordering has been an open problem for more than half a century, generating an abundant literature and motivating, among others, the development of statistical depth and copula-based methods. We show that, unlike the many definitions proposed in the literature, the measure transportation-based ranks and signs introduced in [V. Chernozhukov et al., Ann. Stat. 45, No. 1, 223–256 (2017; Zbl 1426.62163)] enjoy all the properties that make univariate ranks a successful tool for semiparametric inference. Related to those ranks, we propose a new center-outward definition of multivariate distribution and quantile functions, along with their empirical counterparts, for which we establish a Glivenko-Cantelli result. Our approach is based on [R. J. McCann, Duke Math. J. 80, No. 2, 309–323 (1995; Zbl 0873.28009)] and our results do not require any moment assumptions. The resulting ranks and signs are shown to be strictly distribution-free and essentially maximal ancillary in the sense of D. Basu [Sankhyā 21, 247–256 (1959; Zbl 0091.14803)] which, in semiparametric models involving noise with unspecified density, can be interpreted as a finite-sample form of semiparametric efficiency. Although constituting a sufficient summary of the sample, empirical center-outward distribution functions are defined at observed values only. A continuous extension to the entire \(d\)-dimensional space, yielding smooth empirical quantile contours and sign curves while preserving the essential monotonicity and Glivenko-Cantelli features of the concept, is provided. A numerical study of the resulting empirical quantile contours is conducted.
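
Illustration (not part of the original summary): the empirical center-outward distribution function can be sketched as an optimal assignment between the sample and a regular grid on the unit ball. The sketch below assumes \(d=2\), a sample size factorizing as \(n = n_R n_S\) with no grid points at the origin, and squared Euclidean cost. The function names (`regular_grid`, `center_outward_map`, `ranks_and_signs`) are hypothetical, and the brute-force search over permutations stands in for the assignment algorithms one would use in practice (see, e.g., [10]); it is only feasible for very small \(n\).

```python
import itertools
import math
import random


def regular_grid(n_r, n_s):
    """Grid of n_r * n_s points in the unit disk:
    n_s equispaced directions crossed with radii r / (n_r + 1), r = 1..n_r."""
    grid = []
    for r in range(1, n_r + 1):
        radius = r / (n_r + 1)
        for s in range(n_s):
            angle = 2 * math.pi * s / n_s
            grid.append((radius * math.cos(angle), radius * math.sin(angle)))
    return grid


def sq_dist(a, b):
    """Squared Euclidean distance between two points of the plane."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def center_outward_map(sample, grid):
    """Assign each sample point to a distinct grid point so that the sum of
    squared distances is minimal (brute force; illustration only)."""
    n = len(sample)
    best_cost, best_perm = math.inf, None
    for perm in itertools.permutations(range(n)):
        cost = sum(sq_dist(sample[i], grid[perm[i]]) for i in range(n))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return [grid[j] for j in best_perm]


def ranks_and_signs(images, n_r):
    """Rank = (n_r + 1) * norm of the image; sign = image divided by its norm."""
    ranks, signs = [], []
    for x, y in images:
        norm = math.hypot(x, y)
        ranks.append(round((n_r + 1) * norm))
        signs.append((x / norm, y / norm))
    return ranks, signs


# Assumed toy data: six bivariate Gaussian observations.
random.seed(0)
n_r, n_s = 2, 3
sample = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n_r * n_s)]
grid = regular_grid(n_r, n_s)
images = center_outward_map(sample, grid)
ranks, signs = ranks_and_signs(images, n_r)
```

Whatever continuous distribution generates the sample, the images form a permutation of the same fixed grid, so each rank value \(1,\dots,n_R\) occurs exactly \(n_S\) times. This is the mechanism behind the distribution-freeness mentioned in the summary.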

MSC:

62G30 Order statistics; empirical distribution functions
62G08 Nonparametric regression and quantile regression
62H11 Directional data; spatial statistics
62B05 Sufficient statistics and fields
28D05 Measure-preserving transformations

Software:

alphahull; MNM

References:

[1] Ahuja, R. K., Magnanti, T. L. and Orlin, J. B. (1993). Network Flows: Theory, Algorithms, and Applications. Prentice Hall, Englewood Cliffs, NJ. · Zbl 1201.90001
[2] Basu, D. (1955). On statistics independent of a complete sufficient statistic. Sankhyā 15 377-380. · Zbl 0068.13401 · doi:10.1007/978-1-4419-5825-9_14
[3] Basu, D. (1959). The family of ancillary statistics. Sankhyā 21 247-256. · Zbl 0091.14803 · doi:10.1007/978-1-4419-5825-9_18
[4] Beirlant, J., del Barrio, E., Buitendag, S., Hallin, M. and Kamper, Fr. (2020). Center-outward quantiles and the measurement of multivariate risk. Insurance Math. Econom. 95 79-100. · Zbl 1452.91074
[5] Belloni, A. and Winkler, R. L. (2011). On multivariate quantiles under partial orders. Ann. Statist. 39 1125-1179. · Zbl 1216.62082 · doi:10.1214/10-AOS863
[6] Bertsekas, D. P. (1991). Linear Network Optimization: Algorithms and Codes. MIT Press, Cambridge, MA. · Zbl 0754.90059
[7] Bickel, P. J. (1965). On some asymptotically nonparametric competitors of Hotelling’s \(T^2\). Ann. Math. Stat. 36 160-173; correction, ibid. 1583. · Zbl 0138.13205
[8] Biswas, M., Mukhopadhyay, M. and Ghosh, A. K. (2014). A distribution-free two-sample run test applicable to high-dimensional data. Biometrika 101 913-926. · Zbl 1306.62122 · doi:10.1093/biomet/asu045
[9] Boeckel, M., Spokoiny, V. and Suvorikova, A. (2018). Multivariate Brenier cumulative distribution functions and their application to nonparametric testing. Available at arXiv:1809.04090.
[10] Burkard, R., Dell’Amico, M. and Martello, S. (2009). Assignment Problems. SIAM, Philadelphia, PA. · Zbl 1196.90002 · doi:10.1137/1.9780898717754
[11] Chakraborty, B. and Chaudhuri, P. (1996). On a transformation and re-transformation technique for constructing an affine equivariant multivariate median. Proc. Amer. Math. Soc. 124 2539-2547. · Zbl 0856.62046 · doi:10.1090/S0002-9939-96-03657-X
[12] Chakraborty, B. and Chaudhuri, P. (1998). On an adaptive transformation-retransformation estimate of multivariate location. J. R. Stat. Soc. Ser. B. Stat. Methodol. 60 145-157. · Zbl 0909.62056 · doi:10.1111/1467-9868.00114
[13] Chakraborty, A. and Chaudhuri, P. (2014). The spatial distribution in infinite dimensional spaces and related quantiles and depths. Ann. Statist. 42 1203-1231. · Zbl 1305.62141 · doi:10.1214/14-AOS1226
[14] Carlier, G., Chernozhukov, V. and Galichon, A. (2016). Vector quantile regression: an optimal transport approach. Ann. Statist. 44 1165-1192. · Zbl 1381.62239
[15] Chakraborty, A. and Chaudhuri, P. (2017). Tests for high-dimensional data based on means, spatial signs and spatial ranks. Ann. Statist. 45 771-799. · Zbl 1368.62147 · doi:10.1214/16-AOS1467
[16] Chaudhuri, P. (1996). On a geometric notion of quantiles for multivariate data. J. Amer. Statist. Assoc. 91 862-872. · Zbl 0869.62040 · doi:10.2307/2291681
[17] Chaudhuri, P. and Sengupta, D. (1993). Sign tests in multidimension: Inference based on the geometry of the data cloud. J. Amer. Statist. Assoc. 88 1363-1370. · Zbl 0792.62047
[18] Chernozhukov, V., Galichon, A., Hallin, M. and Henry, M. (2017). Monge-Kantorovich depth, quantiles, ranks and signs. Ann. Statist. 45 223-256. · Zbl 1426.62163 · doi:10.1214/16-AOS1450
[19] Choi, K. and Marden, J. (1997). An approach to multivariate rank tests in multivariate analysis of variance. J. Amer. Statist. Assoc. 92 1581-1590. · Zbl 0912.62065 · doi:10.2307/2965429
[20] Cuesta-Albertos, J. A., Matrán, C. and Tuero-Díaz, A. (1997). Optimal transportation plans and convergence in distribution. J. Multivariate Anal. 60 72-83. · Zbl 0894.60012 · doi:10.1006/jmva.1996.1627
[21] De Valk, C. and Segers, J. (2018). Stability and tail limits of transport-based quantile contours. Available at arXiv:1811.12061.
[22] Deb, N. and Sen, B. (2019). Multivariate rank-based distribution-free nonparametric testing using measure transportation. Available at arXiv:1909.08733.
[23] del Barrio, E., González-Sanz, A. and Hallin, M. (2020). A note on the regularity of center-outward distribution and quantile functions. J. Multivariate Anal. 180 S0047259X20302529. · Zbl 1450.62047
[24] del Barrio, E. and Loubes, J.-M. (2019). Central limit theorems for empirical transportation cost in general dimension. Ann. Probab. 47 926-951. · Zbl 1466.60042 · doi:10.1214/18-AOP1275
[25] Dick, J. and Pillichshammer, F. (2014). Discrepancy theory and quasi-Monte Carlo integration. In A Panorama of Discrepancy Theory (W. Chen, A. Srivastava and G. Travaglini, eds.). Lecture Notes in Math. 2107 539-619. Springer, Cham. · Zbl 1358.11086 · doi:10.1007/978-3-319-04696-9_9
[26] Figalli, A. (2017). The Monge-Ampère Equation and Its Applications. Zurich Lectures in Advanced Mathematics. European Mathematical Society (EMS), Zürich. · Zbl 1435.35003 · doi:10.4171/170
[27] Figalli, A. (2018). On the continuity of center-outward distribution and quantile functions. Nonlinear Anal. 177 413-421. · Zbl 1433.62132 · doi:10.1016/j.na.2018.05.008
[28] Genest, C. and Rivest, L.-P. (2001). On the multivariate probability integral transformation. Statist. Probab. Lett. 53 391-399. · Zbl 0982.62056 · doi:10.1016/S0167-7152(01)00047-5
[29] Ghosal, P. and Sen, B. (2019). Multivariate ranks and quantiles using optimal transportation and applications to goodness-of-fit testing. Available at arXiv:1905.05340.
[30] Gushchin, A. A. and Borzykh, D. A. (2017). Integrated quantile functions: Properties and applications. Mod. Stoch. Theory Appl. 4 285-314. · Zbl 1383.60019 · doi:10.15559/17-vmsta88
[31] Hallin, M. (2017). On distribution and quantile functions, ranks, and signs in \(\mathbb{R}^d\). Available at https://ideas.repec.org/p/eca/wpaper/2013-258262.html.
[32] del Barrio, E., Cuesta-Albertos, J., Hallin, M. and Matrán, C. (2018). Smooth cyclically monotone interpolation and empirical center-outward distribution functions. Available at arXiv:1806.01238.
[33] Hallin, M., Hlubinka, D. and Hudecová, S. (2020). Fully distribution-free center-outward rank tests for multiple-output regression and MANOVA. Available at arXiv:2007.15496.
[34] Hallin, M., Ingenbleek, J.-F. and Puri, M. L. (1989). Asymptotically most powerful rank tests for multivariate randomness against serial dependence. J. Multivariate Anal. 30 34-71. · Zbl 0685.62047 · doi:10.1016/0047-259X(89)90087-0
[35] Hallin, M., Mordant, G. and Segers, J. (2019). Multivariate goodness-of-fit tests based on Wasserstein distance. Available at arXiv:2003.06684.
[36] Hallin, M., La Vecchia, D. and Liu, H. (2020). Center-outward R-estimation for semiparametric VARMA models. J. Amer. Statist. Assoc. · Zbl 1507.62286 · doi:10.1080/01621459.2020.1832501
[37] Hallin, M., La Vecchia, D. and Liu, H. (2020). Rank-based testing for semiparametric VAR models: a measure transportation approach. Available at arXiv:2011.06062.
[38] Hallin, M., Lu, Z., Paindaveine, D. and Šiman, M. (2015). Local bilinear multiple-output quantile regression. Bernoulli 21 1435-1466. · Zbl 1388.62109
[39] Hallin, M. and Mehta, C. (2015). \(R\)-estimation for asymmetric independent component analysis. J. Amer. Statist. Assoc. 110 218-232. · Zbl 1381.62145 · doi:10.1080/01621459.2014.909316
[40] Hallin, M., Oja, H. and Paindaveine, D. (2006). Semiparametrically efficient rank-based inference for shape. II. Optimal \(R\)-estimation of shape. Ann. Statist. 34 2757-2789. · Zbl 1115.62059 · doi:10.1214/009053606000000948
[41] Hallin, M. and Paindaveine, D. (2002a). Optimal tests for multivariate location based on interdirections and pseudo-Mahalanobis ranks. Ann. Statist. 30 1103-1133. · Zbl 1101.62348 · doi:10.1214/aos/1031689019
[42] Hallin, M. and Paindaveine, D. (2002b). Optimal procedures based on interdirections and pseudo-Mahalanobis ranks for testing multivariate elliptic white noise against ARMA dependence. Bernoulli 8 787-815. · Zbl 1018.62046
[43] Hallin, M. and Paindaveine, D. (2002c). Multivariate signed ranks: Randles’ interdirections or Tyler’s angles? In Statistical Data Analysis Based on the \(L_1\)-Norm and Related Methods (Neuchâtel, 2002). Stat. Ind. Technol. 271-282. Birkhäuser, Basel. · Zbl 1145.62339
[44] Hallin, M. and Paindaveine, D. (2004a). Rank-based optimal tests of the adequacy of an elliptic VARMA model. Ann. Statist. 32 2642-2678. · Zbl 1076.62044 · doi:10.1214/009053604000000724
[45] Hallin, M. and Paindaveine, D. (2004b). Multivariate signed-rank tests in vector autoregressive order identification. Statist. Sci. 19 697-711. · Zbl 1100.62577 · doi:10.1214/088342304000000602
[46] Hallin, M. and Paindaveine, D. (2005). Affine-invariant aligned rank tests for the multivariate general linear model with VARMA errors. J. Multivariate Anal. 93 122-163. · Zbl 1087.62098 · doi:10.1016/j.jmva.2004.01.005
[47] Hallin, M. and Paindaveine, D. (2006a). Semiparametrically efficient rank-based inference for shape. I. Optimal rank-based tests for sphericity. Ann. Statist. 34 2707-2756. · Zbl 1114.62066 · doi:10.1214/009053606000000731
[48] Hallin, M. and Paindaveine, D. (2006b). Parametric and semiparametric inference for shape: The role of the scale functional. Statist. Decisions 24 327-350. · Zbl 1111.62002 · doi:10.1524/stnd.2006.24.3.327
[49] Hallin, M. and Paindaveine, D. (2008). Optimal rank-based tests for homogeneity of scatter. Ann. Statist. 36 1261-1298. · Zbl 1360.62288 · doi:10.1214/07-AOS508
[50] Hallin, M., Paindaveine, D. and Šiman, M. (2010). Multivariate quantiles and multiple-output regression quantiles: From \(L_1\) optimization to halfspace depth. Ann. Statist. 38 635-669. · Zbl 1183.62088 · doi:10.1214/09-AOS723
[51] Hallin, M., Paindaveine, D. and Verdebout, T. (2010). Optimal rank-based testing for principal components. Ann. Statist. 38 3245-3299. · Zbl 1373.62295 · doi:10.1214/10-AOS810
[52] Hallin, M., Paindaveine, D. and Verdebout, T. (2013). Optimal rank-based tests for common principal components. Bernoulli 19 2524-2556. · Zbl 1457.62182 · doi:10.3150/12-BEJ461
[53] Hallin, M., Paindaveine, D. and Verdebout, T. (2014). Efficient R-estimation of principal and common principal components. J. Amer. Statist. Assoc. 109 1071-1083. · Zbl 1368.62160 · doi:10.1080/01621459.2014.880057
[54] Hallin, M. and Werker, B. J. M. (1999). Optimal testing for semi-parametric AR models—From Gaussian Lagrange multipliers to autoregression rank scores and adaptive tests. In Asymptotics, Nonparametrics, and Time Series (S. Ghosh, ed.). Statist. Textbooks Monogr. 158 295-350. Dekker, New York. · Zbl 1069.62541
[55] Hallin, M. and Werker, B. J. M. (2003). Semi-parametric efficiency, distribution-freeness and invariance. Bernoulli 9 137-165. · Zbl 1020.62042 · doi:10.3150/bj/1068129013
[56] Hallin, M., del Barrio, E., Cuesta-Albertos, J. A. and Matrán, C. (2021). Supplement to “Distribution and quantile functions, ranks and signs in dimension \(d\): A measure transportation approach.” https://doi.org/10.1214/20-AOS1996SUPP
[57] Hamel, A. H. and Kostner, D. (2018). Cone distribution functions and quantiles for multivariate random variables. J. Multivariate Anal. 167 97-113. · Zbl 1490.62123 · doi:10.1016/j.jmva.2018.04.004
[58] He, X. and Wang, G. (1997). Convergence of depth contours for multivariate datasets. Ann. Statist. 25 495-504. · Zbl 0873.62053 · doi:10.1214/aos/1031833661
[59] Hodges, J. L. Jr. (1955). A bivariate sign test. Ann. Math. Stat. 26 523-527. · Zbl 0065.12401 · doi:10.1214/aoms/1177728498
[60] Ilmonen, P. and Paindaveine, D. (2011). Semiparametrically efficient inference based on signed ranks in symmetric independent component models. Ann. Statist. 39 2448-2476. · Zbl 1231.62043 · doi:10.1214/11-AOS906
[61] Judd, K. L. (1998). Numerical Methods in Economics. MIT Press, Cambridge, MA. · Zbl 0924.65001
[62] Karp, R. M. (1978). A characterization of the minimum cycle mean in a digraph. Discrete Math. 23 309-311. · Zbl 0386.05032 · doi:10.1016/0012-365X(78)90011-0
[63] Koltchinskii, V. I. (1997). \(M\)-estimation, convexity and quantiles. Ann. Statist. 25 435-477. · Zbl 0878.62037 · doi:10.1214/aos/1031833659
[64] Lehmann, E. L. and Romano, J. P. (2005). Testing Statistical Hypotheses, 3rd ed. Springer Texts in Statistics. Springer, New York. · Zbl 1076.62018
[65] Lehmann, E. L. and Scholz, F.-W. (1992). Ancillarity. In Current Issues in Statistical Inference: Essays in Honor of D. Basu. Institute of Mathematical Statistics Lecture Notes—Monograph Series 17 32-51. IMS, Hayward, CA. · Zbl 0760.00006 · doi:10.1214/lnms/1215458837
[66] Liu, R. Y. (1992). Data depth and multivariate rank tests. In \(L_1\)-Statistical Analysis and Related Methods (Neuchâtel, 1992) 279-294. North-Holland, Amsterdam.
[67] Liu, R. Y. and Singh, K. (1993). A quality index based on data depth and multivariate rank tests. J. Amer. Statist. Assoc. 88 252-260. · Zbl 0772.62031
[68] López-Pintado, S. and Romo, J. (2009). On the concept of depth for functional data. J. Amer. Statist. Assoc. 104 718-734. · Zbl 1388.62139 · doi:10.1198/jasa.2009.0108
[69] Marden, J. I. (1999). Multivariate rank tests. In Multivariate Analysis, Design of Experiments, and Survey Sampling. Statist. Textbooks Monogr. 159 401-432. Dekker, New York. · Zbl 0946.62060
[70] McCann, R. J. (1995). Existence and uniqueness of monotone measure-preserving maps. Duke Math. J. 80 309-323. · Zbl 0873.28009 · doi:10.1215/S0012-7094-95-08013-2
[71] McKeague, I. W., López-Pintado, S., Hallin, M. and Šiman, M. (2011). Analyzing growth trajectories. J. Dev. Orig. Health Dis. 2 322-329.
[72] Möttönen, J. and Oja, H. (1995). Multivariate spatial sign and rank methods. J. Nonparametr. Stat. 5 201-213. · Zbl 0857.62056 · doi:10.1080/10485259508832643
[73] Möttönen, J., Oja, H. and Tienari, J. (1997). On the efficiency of multivariate spatial sign and rank tests. Ann. Statist. 25 542-552. · Zbl 0873.62048 · doi:10.1214/aos/1031833663
[74] Niederreiter, H. (1992). Random Number Generation and Quasi-Monte Carlo Methods. CBMS-NSF Regional Conference Series in Applied Mathematics 63. SIAM, Philadelphia, PA. · doi:10.1137/1.9781611970081
[75] Nordhausen, K., Oja, H. and Paindaveine, D. (2009). Signed-rank tests for location in the symmetric independent component model. J. Multivariate Anal. 100 821-834. · Zbl 1157.62025 · doi:10.1016/j.jmva.2008.08.004
[76] Oja, H. (1999). Affine invariant multivariate sign and rank tests and corresponding estimates: A review. Scand. J. Stat. 26 319-343. · Zbl 0938.62063 · doi:10.1111/1467-9469.00152
[77] Oja, H. (2010). Multivariate Nonparametric Methods with R: An Approach Based on Spatial Signs and Ranks. Lecture Notes in Statistics 199. Springer, New York. · Zbl 1269.62036 · doi:10.1007/978-1-4419-0468-3
[78] Oja, H. and Paindaveine, D. (2005). Optimal signed-rank tests based on hyperplanes. J. Statist. Plann. Inference 135 300-323. · Zbl 1162.62353 · doi:10.1016/j.jspi.2004.04.022
[79] Oja, H. and Randles, R. H. (2004). Multivariate nonparametric tests. Statist. Sci. 19 598-605. · Zbl 1100.62567 · doi:10.1214/088342304000000558
[80] Pateiro-López, B. and Rodríguez-Casal, A. (2010). Generalizing the convex hull of a sample: The R package \(α\)-hull. J. Stat. Softw. 34.
[81] Peyré, G. and Cuturi, M. (2019). Computational optimal transport. Found. Trends Mach. Learn. 11 355-607.
[82] Puri, M. L. and Sen, P. K. (1966). On a class of multivariate multisample rank-order tests. Sankhyā Ser. A 28 353-376. · Zbl 0156.39902
[83] Puri, M. L. and Sen, P. K. (1969). A class of rank order tests for a general linear hypothesis. Ann. Math. Stat. 40 1325-1343. · Zbl 0193.16902 · doi:10.1214/aoms/1177697505
[84] Puri, M. L. and Sen, P. K. (1971). Nonparametric Methods in Multivariate Analysis. Wiley, New York. · Zbl 0237.62033
[85] Randles, R. H. (1989). A distribution-free multivariate sign test based on interdirections. J. Amer. Statist. Assoc. 84 1045-1050. · Zbl 0702.62039
[86] Rockafellar, R. T. (1966). Characterization of the subdifferentials of convex functions. Pacific J. Math. 17 497-510. · Zbl 0145.15901
[87] Rockafellar, R. T. and Wets, R. J.-B. (1998). Variational Analysis. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences] 317. Springer, Berlin. · Zbl 0888.49001 · doi:10.1007/978-3-642-02431-3
[88] Santner, T. J., Williams, B. J. and Notz, W. I. (2003). The Design and Analysis of Computer Experiments. Springer Series in Statistics. Springer, New York. · Zbl 1041.62068 · doi:10.1007/978-1-4757-3799-8
[89] Segers, J., van den Akker, R. and Werker, B. J. M. (2014). Semiparametric Gaussian copula models: Geometry and efficient rank-based estimation. Ann. Statist. 42 1911-1940. · Zbl 1305.62115 · doi:10.1214/14-AOS1244
[90] Sen, P. K. and Puri, M. L. (1967). On the theory of rank order tests for location in the multivariate one sample problem. Ann. Math. Stat. 38 1216-1228. · Zbl 0155.26404 · doi:10.1214/aoms/1177698790
[91] Serfling, R. (2002). Quantile functions for multivariate analysis: Approaches and applications. Stat. Neerl. 56 214-232. · Zbl 1076.62054 · doi:10.1111/1467-9574.00195
[92] Shi, H., Drton, M. and Han, F. (2019). Distribution-free consistent independence tests via Hallin’s multivariate ranks. Available at arXiv:1909.10024.
[93] Um, Y. and Randles, R. H. (1998). Nonparametric tests for the multivariate multi-sample location problem. Statist. Sinica 8 801-812. · Zbl 0905.62048
[94] Yosida, K. (1965). Functional Analysis. Die Grundlehren der Mathematischen Wissenschaften 123. Academic Press, New York; Springer, Berlin.
[95] Zuo, Y. (2018). On general notions of depth for regression. Available at arXiv:1805.02046v1.
[96] Zuo, Y. and He, X. (2006). On the limiting distributions of multivariate depth-based rank sum statistics and related tests. Ann. Statist. 34 2879-2896. · Zbl 1114.62020 · doi:10.1214/009053606000000876
[97] Zuo, Y. and Serfling, R. (2000). General notions of statistical depth function. Ann. Statist. 28 461-482. · Zbl 1106.62334 · doi:10.1214/aos/1016218226
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases, these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or perfect matching.