## Invariant co-ordinate selection (with discussion). (English) Zbl 1250.62032

Summary: A general method for exploring multivariate data by comparing different estimates of multivariate scatter is presented. The method is based on the eigenvalue-eigenvector decomposition of one scatter matrix relative to another. In particular, it is shown that the eigenvectors can be used to generate an affine invariant co-ordinate system for the multivariate data. Consequently, we view this method as a method for invariant co-ordinate selection. By plotting the data with respect to this new invariant co-ordinate system, various data structures can be revealed. For example, under certain independent components models, it is shown that the invariant co-ordinates correspond to the independent components. Another example pertains to mixtures of elliptical distributions. In this case, it is shown that a subset of the invariant co-ordinates corresponds to Fisher’s linear discriminant subspace, even though the class identifications of the data points are unknown. Some illustrative examples are given.
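The decomposition described in the summary can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes the two scatter estimates are the ordinary covariance matrix and a fourth-moment scatter matrix (the summary does not say which pair is used), and the names `cov4` and `ics` are hypothetical helpers introduced here.

```python
import numpy as np
from scipy.linalg import eigh


def cov4(X):
    """Fourth-moment scatter: covariance weighted by squared Mahalanobis distance.

    Assumed choice of second scatter matrix; any affine equivariant
    scatter estimate could be used in its place.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S1_inv = np.linalg.inv(np.cov(X, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", Xc, S1_inv, Xc)  # squared Mahalanobis distances
    return (Xc * d2[:, None]).T @ Xc / (n * (p + 2))


def ics(X):
    """Eigen-decomposition of one scatter matrix (S2) relative to another (S1)."""
    S1 = np.cov(X, rowvar=False)
    S2 = cov4(X)
    # Generalised symmetric eigenproblem S2 b = rho * S1 b,
    # with eigenvectors normalised so that B.T @ S1 @ B = I.
    rho, B = eigh(S2, S1)
    order = np.argsort(rho)[::-1]  # largest relative eigenvalue first
    rho, B = rho[order], B[:, order]
    Z = (X - X.mean(axis=0)) @ B  # data in the invariant co-ordinate system
    return rho, B, Z


# Toy data (an assumption for illustration): a two-component normal mixture.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
X[:100, 0] += 4.0
rho, B, Z = ics(X)
```

Applying `ics` to an affinely transformed copy of `X` should give the same eigenvalues `rho`, and the same co-ordinates `Z` up to the sign of each column, provided the eigenvalues are distinct. Plotting pairs of columns of `Z` is then one way to look for the structure the summary mentions.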

### MSC:

- 62H30 Classification and discrimination; cluster analysis (statistical aspects)
- 62H12 Estimation in multivariate analysis
- 62H25 Factor analysis and principal components; correspondence analysis
- 15A18 Eigenvalues, singular values, and eigenvectors

Software: R; ICS
