# zbMATH — the first resource for mathematics

Testing for principal component directions under weak identifiability. (English) Zbl 1439.62075
Given data from a multivariate normal distribution whose covariance matrix has eigenvalues $$\lambda_1\geq\lambda_2\geq\cdots\geq\lambda_p$$, consider testing the null hypothesis $$\mbox{H}_0:\boldsymbol{\theta}_1=\boldsymbol{\theta}_1^0$$ against the alternative hypothesis $$\mbox{H}_1:\boldsymbol{\theta}_1\not=\boldsymbol{\theta}_1^0$$, where $$\boldsymbol{\theta}_1$$ is the eigenvector associated with $$\lambda_1$$ and $$\boldsymbol{\theta}_1^0$$ is a fixed vector. The authors compare two tests in this setting: a classical likelihood ratio test and the Le Cam optimal test due to M. Hallin et al. [Ann. Stat. 38, No. 6, 3245–3299 (2010; Zbl 1373.62295)]. When the eigenvalues $$\lambda_i$$ are fixed, these two tests are known to be asymptotically equivalent under the null hypothesis and sequences of contiguous alternatives. In this paper, the authors show that this asymptotic equivalence breaks down in the setting where the eigenvalues may depend on the sample size $$n$$ and $$\lambda_1/\lambda_2=1+O(r_n)$$, with $$r_n=O(1/\sqrt{n})$$. In this setting, the likelihood ratio test is shown to over-reject the null hypothesis, so that the Le Cam optimal test is preferable here. Further properties of this latter test are investigated to show that this gain over the likelihood ratio test does not come at the expense of power. The more general setting of elliptical data is also considered, and numerical examples (based on both simulations and real data) are presented to illustrate the findings of the paper.
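
As a rough illustration of the setting in the review, the Python sketch below estimates by simulation how often a likelihood ratio test for the first principal direction rejects a true null hypothesis. It assumes the statistic has the form usually attributed to Anderson (1963), $$Q=n\,(\hat\lambda_1\,\boldsymbol{\theta}_1^{0\prime}\mathbf{S}^{-1}\boldsymbol{\theta}_1^0+\hat\lambda_1^{-1}\,\boldsymbol{\theta}_1^{0\prime}\mathbf{S}\,\boldsymbol{\theta}_1^0-2)$$, compared with a $$\chi^2_{p-1}$$ critical value. This formula, the function names `anderson_lrt_stat` and `rejection_rate`, and the diagonal covariance used to generate the data are assumptions made for illustration; none of them is taken from the review. The Le Cam optimal test is not implemented here.

```python
import numpy as np
from scipy.stats import chi2


def anderson_lrt_stat(X, theta0):
    """LRT-type statistic for H0: first eigenvector equals theta0.

    Assumed form (Anderson, 1963):
        n * (lam1 * theta0' S^{-1} theta0 + theta0' S theta0 / lam1 - 2)
    where S is the sample covariance and lam1 its largest eigenvalue.
    """
    n, _ = X.shape
    S = np.cov(X, rowvar=False)              # sample covariance (n - 1 denominator)
    lam1 = np.linalg.eigvalsh(S)[-1]         # eigvalsh returns ascending eigenvalues
    quad_inv = theta0 @ np.linalg.solve(S, theta0)
    quad = theta0 @ S @ theta0
    return n * (lam1 * quad_inv + quad / lam1 - 2.0)


def rejection_rate(n, p, ratio, reps=500, alpha=0.05, seed=0):
    """Empirical null rejection frequency under N(0, diag(ratio, 1, ..., 1)).

    Hypothetical simulation design: the true first eigenvector is e_1 and
    lambda_1 / lambda_2 = ratio, so H0: theta_1 = e_1 holds for ratio > 1.
    """
    rng = np.random.default_rng(seed)
    theta0 = np.zeros(p)
    theta0[0] = 1.0
    sd = np.ones(p)
    sd[0] = np.sqrt(ratio)                   # standard deviation along e_1
    crit = chi2.ppf(1.0 - alpha, df=p - 1)   # asymptotic chi-square critical value
    rejections = 0
    for _ in range(reps):
        X = rng.standard_normal((n, p)) * sd
        rejections += anderson_lrt_stat(X, theta0) > crit
    return rejections / reps


if __name__ == "__main__":
    n, p = 400, 3
    print("well separated   :", rejection_rate(n, p, ratio=4.0))
    print("weakly identified:", rejection_rate(n, p, ratio=1.0 + 1.0 / np.sqrt(n)))
```

Running `rejection_rate` for the same `n` and `p` with a fixed ratio such as `4.0` and with `1 + 1/np.sqrt(n)` gives a simple way to compare the empirical null rejection frequency for well-separated leading eigenvalues with the weakly identifiable regime described above.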

##### MSC:
 62F05 Asymptotic properties of parametric tests
 62F03 Parametric hypothesis testing
 62H25 Factor analysis and principal components; correspondence analysis
 62E20 Asymptotic distribution theory in statistics
##### Software:
ROBPCA; TCLUST; uskewFactors
##### References:
 [1] Anderson, T. W. (1963). Asymptotic theory for principal component analysis. Ann. Math. Stat. 34 122-148. · Zbl 0202.49504
 [2] Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis, 3rd ed. Wiley Series in Probability and Statistics. Wiley, Hoboken, NJ. · Zbl 1039.62044
 [3] Atkinson, A. C., Riani, M. and Cerioli, A. (2004). Exploring Multivariate Data with the Forward Search. Springer Series in Statistics. Springer, New York. · Zbl 1049.62057
 [4] Baik, J., Ben Arous, G. and Péché, S. (2005). Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. Ann. Probab. 33 1643-1697. · Zbl 1086.15022
 [5] Bali, J. L., Boente, G., Tyler, D. E. and Wang, J.-L. (2011). Robust functional principal components: A projection-pursuit approach. Ann. Statist. 39 2852-2882. · Zbl 1246.62145
 [6] Berthet, Q. and Rigollet, P. (2013). Optimal detection of sparse principal components in high dimension. Ann. Statist. 41 1780-1815. · Zbl 1277.62155
 [7] Boente, G. and Fraiman, R. (2000). Kernel-based functional principal components. Statist. Probab. Lett. 48 335-345. · Zbl 0997.62024
 [8] Burman, P. and Polonik, W. (2009). Multivariate mode hunting: Data analytic tools with measures of significance. J. Multivariate Anal. 100 1198-1218. · Zbl 1159.62032
 [9] Croux, C. and Haesbroeck, G. (2000). Principal component analysis based on robust estimators of the covariance or correlation matrix: Influence functions and efficiencies. Biometrika 87 603-618. · Zbl 0956.62047
 [10] Cuevas, A. (2014). A partial overview of the theory of statistics with functional data. J. Statist. Plann. Inference 147 1-23. · Zbl 1278.62012
 [11] Dufour, J.-M. (1997). Some impossibility theorems in econometrics with applications to structural and dynamic models. Econometrica 65 1365-1387. · Zbl 0886.62116
 [12] Dufour, J.-M. (2006). Monte Carlo tests with nuisance parameters: A general approach to finite-sample inference and nonstandard asymptotics. J. Econometrics 133 443-477. · Zbl 1345.62037
 [13] Flury, B. (1988). Common Principal Components and Related Multivariate Models. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. Wiley, New York. · Zbl 1081.62535
 [14] Flury, B. and Riedwyl, H. (1988). Multivariate Statistics: A Practical Approach. CRC Press, London. · Zbl 0495.62057
 [15] Forchini, G. and Hillier, G. (2003). Conditional inference for possibly unidentified structural equations. Econometric Theory 19 707-743. · Zbl 1441.62688
 [16] Fritz, H., García-Escudero, L. A. and Mayo-Iscar, A. (2012). tclust: An R package for a trimming approach to cluster analysis. J. Stat. Softw. 47.
 [17] Girolami, M. (1999). Self-Organizing Neural Networks. Independent Component Analysis and Blind Source Separation. Springer, London.
 [18] Hallin, M. and Paindaveine, D. (2006). Semiparametrically efficient rank-based inference for shape. I. Optimal rank-based tests for sphericity. Ann. Statist. 34 2707-2756. · Zbl 1114.62066
 [19] Hallin, M., Paindaveine, D. and Verdebout, T. (2010). Optimal rank-based testing for principal components. Ann. Statist. 38 3245-3299. · Zbl 1373.62295
 [20] Hallin, M., Paindaveine, D. and Verdebout, T. (2014). Efficient R-estimation of principal and common principal components. J. Amer. Statist. Assoc. 109 1071-1083. · Zbl 1368.62160
 [21] Han, F. and Liu, H. (2014). Scale-invariant sparse PCA on high-dimensional meta-elliptical data. J. Amer. Statist. Assoc. 109 275-287. · Zbl 1367.62185
 [22] Härdle, W. and Simar, L. (2007). Applied Multivariate Statistical Analysis, 2nd ed. Springer, Berlin. · Zbl 1115.62057
 [23] He, R., Hu, B.-G., Zheng, W.-S. and Kong, X.-W. (2011). Robust principal component analysis based on maximum correntropy criterion. IEEE Trans. Image Process. 20 1485-1494. · Zbl 1372.94369
 [24] Hubert, M., Rousseeuw, P. J. and Vanden Branden, K. (2005). ROBPCA: A new approach to robust principal component analysis. Technometrics 47 64-79.
 [25] Jackson, J. E. (2005). A User’s Guide to Principal Components. Wiley Series in Probability and Statistics. Wiley, Hoboken, NJ. · Zbl 0743.62047
 [26] Jeganathan, P. (1995). Some aspects of asymptotic theory with applications to time series models. Econometric Theory 11 818-887.
 [27] Johnstone, I. M. and Lu, A. Y. (2009). On consistency and sparsity for principal components analysis in high dimensions. J. Amer. Statist. Assoc. 104 682-693. · Zbl 1388.62174
 [28] Jolicoeur, P. (1984). Principal components, factor analysis, and multivariate allometry: A small-sample direction test. Biometrics 40 685-690.
 [29] Koch, I. (2014). Analysis of Multivariate and High-Dimensional Data. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge Univ. Press, New York. · Zbl 1307.62003
 [30] Magnus, J. R. and Neudecker, H. (2007). Matrix Differential Calculus with Applications in Statistics and Econometrics, 3rd ed. Wiley, Chichester. · Zbl 0651.15001
 [31] Murray, P. M., Browne, R. P. and McNicholas, P. D. (2016). uskewFactors: Model-based clustering via mixtures of unrestricted skew-t factor analyzer models. R package. Available at https://cran.r-project.org/web/packages/uskewFactors/index.html.
 [32] Paindaveine, D., Remy, J. and Verdebout, T. (2019). Supplement to “Testing for principal component directions under weak identifiability.” https://doi.org/10.1214/18-AOS1805SUPP.
 [33] Paindaveine, D. and Verdebout, T. (2017). Inference on the mode of weak directional signals: A Le Cam perspective on hypothesis testing near singularities. Ann. Statist. 45 800-832. · Zbl 1371.62043
 [34] Pötscher, B. M. (2002). Lower risk bounds and properties of confidence sets for ill-posed estimation problems with applications to spectral density and persistence estimation, unit roots, and estimation of long memory parameters. Econometrica 70 1035-1065. · Zbl 1121.62559
 [35] Roussas, G. G. and Bhattacharya, D. (2011). Revisiting local asymptotic normality (LAN) and passing on to local asymptotic mixed normality (LAMN) and local asymptotic quadratic (LAQ) experiments. In Advances in Directional and Linear Statistics (M. T. Wells and A. Sengupta, eds.) 253-280. Physica-Verlag/Springer, Heidelberg.
 [36] Salibián-Barrera, M., Van Aelst, S. and Willems, G. (2006). Principal components analysis based on multivariate MM estimators with fast and robust bootstrap. J. Amer. Statist. Assoc. 101 1198-1211. · Zbl 1120.62319
 [37] Schwartzman, A., Mascarenhas, W. F. and Taylor, J. E. (2008). Inference for eigenvalues and eigenvectors of Gaussian symmetric matrices. Ann. Statist. 36 2886-2919. · Zbl 1196.62067
 [38] Shinmura, S. (2016). New Theory of Discriminant Analysis After R. Fisher. Springer, Singapore. · Zbl 1362.62008
 [39] Sylvester, A. D., Kramer, P. A. and Jungers, W. L. (2008). Modern humans are not (quite) isometric. Amer. J. Phys. Anthropol. 137 371-383.
 [40] Tyler, D. E. (1981). Asymptotic inference for eigenvectors. Ann. Statist. 9 725-736. · Zbl 0474.62051
 [41] Tyler, D. E. (1983). A class of asymptotic tests for principal component vectors. Ann. Statist. 11 1243-1250. · Zbl 0544.62053
 [42] van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics 3. Cambridge Univ. Press, Cambridge.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.