# zbMATH — the first resource for mathematics

Efficient estimation of linear functionals of principal components. (English) Zbl 1440.62232
In the setting of principal component analysis for $$n$$ i.i.d. mean-zero Gaussian observations in a separable Hilbert space, the authors consider the problem of estimating linear functionals of the eigenvectors (principal components) of the unknown covariance operator $$\Sigma$$. The effective rank $$r(\Sigma)=\mbox{tr}(\Sigma)/\|\Sigma\|$$, where $$\mbox{tr}(\Sigma)$$ is the trace of $$\Sigma$$ and $$\|\Sigma\|$$ is its operator norm, is used to quantify the complexity of the problem. No structural assumptions on $$\Sigma$$ are made, though the eigenvalues corresponding to the eigenvectors being estimated are assumed to be simple (i.e., to have multiplicity 1). It is known that naive plug-in estimators can suffer from substantial bias when the effective rank is large relative to $$n$$. For the regime $$r(\Sigma)=o(n)$$, the authors propose a bias-reduction technique and prove asymptotic normality of the resulting estimator. Their upper bounds are complemented by lower bounds demonstrating that the estimator is semiparametrically optimal in this regime.
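The effective rank can be computed directly from the definition above. The following sketch (an illustrative spiked-covariance example, not a construction from the paper) computes $$r(\Sigma)$$ for a covariance with one large eigenvalue and a long flat tail, and shows the kind of bias a naive plug-in estimate of the top eigenvalue incurs when $$r(\Sigma)$$ is not negligible compared to $$n$$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative spiked covariance: one large eigenvalue (10) plus a
# long flat tail of ones, so the effective rank is moderately large.
p, n = 200, 100
eigvals = np.array([10.0] + [1.0] * (p - 1))
Sigma = np.diag(eigvals)

# Effective rank r(Sigma) = tr(Sigma) / ||Sigma||, with ||Sigma|| the
# operator (spectral) norm, i.e., the largest eigenvalue here.
eff_rank = np.trace(Sigma) / np.linalg.norm(Sigma, 2)

# Naive plug-in: spectrum of the sample covariance of n Gaussian draws.
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
Sigma_hat = X.T @ X / n
top_sample_eig = np.linalg.eigvalsh(Sigma_hat)[-1]

print(f"r(Sigma) = {eff_rank:.1f}, n = {n}")          # r(Sigma) = 20.9
print(f"true top eigenvalue 10.0, sample {top_sample_eig:.2f}")
```

With $$p=200$$ and $$n=100$$ the sample top eigenvalue systematically overshoots the true value 10, illustrating why bias corrections are needed once $$r(\Sigma)$$ is comparable to $$n$$; the paper's bias-reduction technique itself is more involved than this plug-in diagnostic.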

##### MSC:
- 62H25 Factor analysis and principal components; correspondence analysis
- 62E17 Approximations to statistical distributions (nonasymptotic)
- 62E20 Asymptotic distribution theory in statistics
- 60F05 Central limit and other weak theorems
##### Software:
fda (R)
##### References:
- Anderson, T. W. (1963). Asymptotic theory for principal component analysis. Ann. Math. Stat. 34 122-148. · Zbl 0202.49504
- Benaych-Georges, F. and Nadakuditi, R. R. (2011). The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Adv. Math. 227 494-521. · Zbl 1226.15023
- Berthet, Q. and Rigollet, P. (2013). Optimal detection of sparse principal components in high dimension. Ann. Statist. 41 1780-1815. · Zbl 1277.62155
- Blanchard, G., Bousquet, O. and Zwald, L. (2007). Statistical properties of kernel principal component analysis. Mach. Learn. 66 259-294. · Zbl 1078.68133
- Bloemendal, A., Knowles, A., Yau, H.-T. and Yin, J. (2016). On the principal components of sample covariance matrices. Probab. Theory Related Fields 164 459-552. · Zbl 1339.15023
- Cai, T. T. and Guo, Z. (2017). Confidence intervals for high-dimensional linear regression: Minimax rates and adaptivity. Ann. Statist. 45 615-646. · Zbl 1371.62045
- Cai, T. T., Ma, Z. and Wu, Y. (2013). Sparse PCA: Optimal rates and adaptive estimation. Ann. Statist. 41 3074-3110. · Zbl 1288.62099
- Dauxois, J., Pousse, A. and Romain, Y. (1982). Asymptotic theory for the principal component analysis of a vector random function: Some applications to statistical inference. J. Multivariate Anal. 12 136-154. · Zbl 0539.62064
- Eaton, M. L. (1983). Multivariate Statistics: A Vector Space Approach. Wiley Series in Probability and Mathematical Statistics. Wiley, New York. · Zbl 0587.62097
- Fan, J., Rigollet, P. and Wang, W. (2015). Estimation of functionals of sparse covariance matrices. Ann. Statist. 43 2706-2737. · Zbl 1327.62338
- Gao, C. and Zhou, H. H. (2015). Rate-optimal posterior contraction for sparse PCA. Ann. Statist. 43 785-818. · Zbl 1312.62078
- Gao, C. and Zhou, H. H. (2016). Bernstein-von Mises theorems for functionals of the covariance matrix. Electron. J. Stat. 10 1751-1806. · Zbl 1346.62059
- Gill, R. D. and Levit, B. Y. (1995). Applications of the Van Trees inequality: A Bayesian Cramér-Rao bound. Bernoulli 1 59-79. · Zbl 0830.62035
- Giné, E. and Nickl, R. (2016). Mathematical Foundations of Infinite-Dimensional Statistical Models. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge Univ. Press, New York. · Zbl 1358.62014
- Janková, J. and van de Geer, S. (2018). Semiparametric efficiency bounds for high-dimensional models. Ann. Statist. 46 2336-2359. · Zbl 1420.62308
- Javanmard, A. and Montanari, A. (2014). Confidence intervals and hypothesis testing for high-dimensional regression. J. Mach. Learn. Res. 15 2869-2909. · Zbl 1319.62145
- Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist. 29 295-327. · Zbl 1016.62078
- Koltchinskii, V. (2017). Asymptotically efficient estimation of smooth functionals of covariance operators. Available at https://arxiv.org/abs/1710.09072.
- Koltchinskii, V., Löffler, M. and Nickl, R. (2019). Supplement to "Efficient estimation of linear functionals of principal components." https://doi.org/10.1214/19-AOS1816SUPP.
- Koltchinskii, V. and Lounici, K. (2016). Asymptotics and concentration bounds for bilinear forms of spectral projectors of sample covariance. Ann. Inst. Henri Poincaré Probab. Stat. 52 1976-2013. · Zbl 1353.62053
- Koltchinskii, V. and Lounici, K. (2017). Concentration inequalities and moment bounds for sample covariance operators. Bernoulli 23 110-133. · Zbl 1366.60057
- Koltchinskii, V. and Lounici, K. (2017). Normal approximation and concentration of spectral projectors of sample covariance. Ann. Statist. 45 121-157. · Zbl 1367.62175
- Koltchinskii, V. and Lounici, K. (2017). New asymptotic results in principal component analysis. Sankhya A 79 254-297. · Zbl 06822893
- Lila, E., Aston, J. A. D. and Sangalli, L. M. (2016). Smooth principal component analysis over two-dimensional manifolds with an application to neuroimaging. Ann. Appl. Stat. 10 1854-1879. · Zbl 1454.62187
- Nadler, B. (2008). Finite sample approximation results for principal component analysis: A matrix perturbation approach. Ann. Statist. 36 2791-2817. · Zbl 1168.62058
- Naumov, A., Spokoiny, V. and Ulyanov, V. (2018). Confidence sets for spectral projectors of covariance matrices. Dokl. Math. 98 511-514. · Zbl 1409.62148
- Ning, Y. and Liu, H. (2017). A general theory of hypothesis tests and confidence regions for sparse high dimensional models. Ann. Statist. 45 158-195. · Zbl 1364.62128
- Paul, D. (2007). Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statist. Sinica 17 1617-1642. · Zbl 1134.62029
- Ramsay, J. O. and Silverman, B. W. (2005). Functional Data Analysis, 2nd ed. Springer Series in Statistics. Springer, New York. · Zbl 1079.62006
- Reiss, M. and Wahl, M. (2016). Non-asymptotic upper bounds for the reconstruction error of PCA. Available at arXiv:1609.03779.
- Ren, Z., Sun, T., Zhang, C.-H. and Zhou, H. H. (2015). Asymptotic normality and optimalities in estimation of large Gaussian graphical models. Ann. Statist. 43 991-1026. · Zbl 1328.62342
- van de Geer, S., Bühlmann, P., Ritov, Y. and Dezeure, R. (2014). On asymptotically optimal confidence regions and tests for high-dimensional models. Ann. Statist. 42 1166-1202. · Zbl 1305.62259
- van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics 3. Cambridge Univ. Press, Cambridge.
- van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes. Springer Series in Statistics. Springer, New York. · Zbl 0862.60002
- Vershynin, R. (2012). Introduction to the non-asymptotic analysis of random matrices. In Compressed Sensing 210-268. Cambridge Univ. Press, Cambridge.
- Vu, V. Q. and Lei, J. (2013). Minimax sparse principal subspace estimation in high dimensions. Ann. Statist. 41 2905-2947. · Zbl 1288.62103
- Wang, T., Berthet, Q. and Samworth, R. J. (2016). Statistical and computational trade-offs in estimation of sparse principal components. Ann. Statist. 44 1896-1930. · Zbl 1349.62254
- Wang, W. and Fan, J. (2017). Asymptotics of empirical eigenstructure for high dimensional spiked covariance. Ann. Statist. 45 1342-1374. · Zbl 1373.62299
- Zhang, C.-H. and Zhang, S. S. (2014). Confidence intervals for low dimensional parameters in high dimensional linear models. J. R. Stat. Soc. Ser. B. Stat. Methodol. 76 217-242. · Zbl 1411.62196
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.