Multivariate dispersion, central regions and depth. The lift zonoid approach.

*(English)* Zbl 1027.62033
Lecture Notes in Statistics. 165. New York, NY: Springer. 291 p. EUR 74.95 (net); sFr 124.50; £52.50; $ 69.95 (2002).

The book presents a new characteristic of probability measures, called the lift zonoid representation, which provides an informative description of a set of multivariate data in a general geometric language. Given a set \(S\) of \(n\) pairs \((a_i,p_i)\), \(i=1,\dots ,n\), where the \(a_i\) are \(d\)-dimensional vectors carrying probabilities \(p_i\), the lift zonoid \(Z_n\) is defined as the convex hull of all segments joining the origin to the points \(\bigl(P(S'),\sum_{(a_i,p_i)\in S'} p_i a_i\bigr)\in \mathbb R^{d+1}\) formed from all subsets \(S'\subseteq S\) with accumulated probabilities \(P(S')\). The lift zonoid of a distribution is calculated as a limit of \(Z_n\) as \(n\to\infty\). Theorems are proved stating that every zonoid is convex and bounded and admits monotone approximation by zonoids of empirical distributions. It is proved that lift zonoids obey the law of large numbers and the central limit theorem, with convergence in the Hausdorff distance. It is also proved that the lift zonoid distance is continuous, and that the volume of a lift zonoid is the expectation of certain random determinants.
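
In symbols, one common way of writing this construction is sketched below. The notation \(I\), \(\mu\), \(g\) and \(Z(\mu)\) is introduced here for illustration only and is not taken from the review; the second formula assumes that \(\mu\) has a finite first moment.

\[
Z_n \;=\; \sum_{i=1}^{n} \bigl[\,0,\ p_i\,(1,a_i)\,\bigr]
\;=\; \operatorname{conv}\Bigl\{ \Bigl(\sum_{i\in I} p_i,\ \sum_{i\in I} p_i a_i\Bigr) : I\subseteq\{1,\dots,n\} \Bigr\},
\]

\[
Z(\mu) \;=\; \Bigl\{ \Bigl(\int g\,d\mu,\ \int x\,g(x)\,d\mu(x)\Bigr) : g\colon \mathbb R^d\to[0,1] \text{ measurable} \Bigr\} \subset \mathbb R^{d+1}.
\]

The first identity expresses \(Z_n\) as a Minkowski sum of \(n\) segments, each index set \(I\) corresponding to a subset \(S'\subseteq S\). The second reduces to the first when \(\mu\) is the empirical distribution placing mass \(p_i\) at \(a_i\).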

The stochastic order between multivariate probability distributions is defined using the partial order of their lift zonoids. This order proves to be antisymmetric, and is preserved under marginalization, and scale and location transformations. To measure the variables’ dependence, the partial central regions dependence order is defined. This order is transitive, reflexive and antisymmetric.

Next, zonoid \(\alpha\)-trimmed regions and central regions are defined as regions surrounding the set gravity center characterizing the location and general shape of a data set. In the one-dimensional case they reduce to the distance between two symmetric quantiles. These regions are proved to be monotone in \(\alpha\), and to possess the properties of convexity, compactness, symmetry, and affine invariance. Also it is proved that the zonoid trimmed regions obey the law of large numbers. Further, the notion of median is generalized to multivariate central regions.

The zonoid depth is introduced as a data characteristic measuring how central a point is: it equals 1 at the center of the data set and 0 at all points outside the convex hull. This function is proved to be affine invariant, continuous, and increasing in dilation. It is also proved that, for two empirical distributions of the same combinatorial structure, the lift zonoids have isomorphic face lattices. Zonoid depths can be used in constructing multivariate rank tests, in clustering and in data classification.

The zonoid depths thus defined are then applied in well-known settings: two-sample depth tests for scale and location, the Friedman-Rafsky test, the Hotelling \(T^2\)-test, the Puri-Sen test, the Wilcoxon distance, and power comparisons. The depth of hyperplanes is then defined and investigated in a similar way. The volume of a lift zonoid is characterized by the Gini mean difference and the multivariate volume Gini-index, which is the average volume of the lift zonoids corresponding to all \(2^d-1\) marginal distributions.
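
The link between lift zonoid volume and the Gini mean difference can be checked numerically in the simplest case \(d=1\), where the lift zonoid is a polygon in \(\mathbb R^2\). The following is a minimal sketch, not code from the book: all function names are hypothetical, the Gini mean difference is taken under the convention \(E|X-X'|\) (other normalizations exist), and the brute-force enumeration of subset sums is only feasible for small \(n\). It compares the area of the convex hull of all subset sums with the pairwise-determinant formula for the area of a planar zonotope.

```python
from itertools import combinations

def lift_generators(points, probs):
    """Segments' endpoints p_i * (1, a_i) generating the lift zonoid (d = 1)."""
    return [(p, p * a) for a, p in zip(points, probs)]

def subset_sums(gens):
    """All 2^n sums of generators; their convex hull is the zonotope."""
    sums = [(0.0, 0.0)]
    for gx, gy in gens:
        sums += [(x + gx, y + gy) for x, y in sums]
    return sums

def convex_hull(pts):
    """Andrew's monotone chain; returns hull vertices in counterclockwise order."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(seq):
        chain = []
        for p in seq:
            # Drop points that would make a clockwise or collinear turn.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain

    lower = half_hull(pts)
    upper = half_hull(reversed(pts))
    return lower[:-1] + upper[:-1]

def polygon_area(poly):
    """Shoelace formula for a simple polygon given by its vertices in order."""
    shifted = poly[1:] + poly[:1]
    return 0.5 * abs(sum(x1 * y2 - x2 * y1
                         for (x1, y1), (x2, y2) in zip(poly, shifted)))

def zonotope_area(gens):
    """Area of a planar zonotope: sum of |det| over all pairs of generators."""
    return sum(abs(u[0] * v[1] - u[1] * v[0]) for u, v in combinations(gens, 2))

def gini_mean_difference(points, probs):
    """E|X - X'| for independent X, X' drawn from the discrete distribution."""
    return sum(p * q * abs(a - b)
               for a, p in zip(points, probs)
               for b, q in zip(points, probs))

# Illustrative data (an assumption, not from the book): four equally likely points.
points = [1, 2, 4, 7]
probs = [0.25] * 4
gens = lift_generators(points, probs)

hull_area = polygon_area(convex_hull(subset_sums(gens)))
print(hull_area, zonotope_area(gens), gini_mean_difference(points, probs) / 2)
```

For this data the three printed quantities agree: in one dimension the area of the lift zonoid is half of \(E|X-X'|\), because each pairwise determinant equals \(p_ip_j(a_j-a_i)\).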

Some relations between these characteristics are stated. The volume of the lift zonoid is a parameter reflecting the dispersion of a distribution. It is proved that the expected volume of the random lift zonoid equals, up to a constant, the volume of the lift zonoid of the random vector. The lift zonoid order is defined as a partial stochastic order in \(\mathbb R^d\) that is invariant under affine transformations. It preserves the same order for all marginal distributions and is invariant under convolutions. A theorem is proved stating that the distance-Gini mean difference and the volume-Gini mean difference are increasing with respect to the convex order, the lift zonoid order, and similar orderings.

In the last section, some applications to measuring economic disparity and concentration are discussed. The multivariate inverse Lorenz function is introduced and investigated. It is proved that the Lorenz price order and the inverse Lorenz function order coincide. Distributions of absolute endowments and their order are discussed, and weak price majorization is defined to compare distributions with unequal totals. An order of multivariate concentration functions is defined and investigated, and multivariate concentration indices, such as the Rosenbluth index and concentration rates, are treated using concentration functions.

Reviewer: Vadim I. Serdobol’skij (Moskva)

##### MSC:

62H05 | Characterization and structure theory for multivariate probability distributions; copulas |

62-02 | Research exposition (monographs, survey articles) pertaining to statistics |

60E15 | Inequalities; stochastic orderings |

60A99 | Foundations of probability theory |

62E10 | Characterization and structure theory of statistical distributions |