
Sparse functional principal component analysis in high dimensions. (English) Zbl 07601223

Summary: Existing functional principal component analysis (FPCA) methods are restricted to data with a single or finite number of random functions (much smaller than the sample size \(n\)). In this work, we focus on high-dimensional functional processes where the number of random functions \(p\) is comparable to, or even much larger than \(n\). Such data are ubiquitous in various fields, such as neuroimaging analysis, and cannot be modeled properly by existing methods. We propose a new algorithm, called sparse FPCA, that models principal eigenfunctions effectively under sensible sparsity regimes. The sparsity structure motivates a thresholding rule that is easy to compute by exploiting the relationship between univariate orthonormal basis expansions and the multivariate Karhunen-Loève representation. We investigate the theoretical properties of the resulting estimators, and illustrate the performance using simulated and real-data examples.
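
Illustration (not from the paper): the summary's idea of thresholding univariate basis scores before a multivariate eigen-decomposition can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' algorithm: the cosine basis, the midpoint grid, the rule "keep the `n_keep` highest-variance scores", and the names `cosine_basis` and `sparse_fpca_sketch` are all hypothetical choices made for illustration only.

```python
import numpy as np


def cosine_basis(t, n_basis):
    """Rows are the first n_basis orthonormal cosine functions on [0, 1]."""
    basis = np.ones((n_basis, t.size))
    for l in range(1, n_basis):
        basis[l] = np.sqrt(2.0) * np.cos(np.pi * l * t)
    return basis


def sparse_fpca_sketch(curves, t, n_basis=5, n_keep=10, n_comp=2):
    """Hypothetical helper: threshold basis scores, then run PCA on the survivors.

    curves: array of shape (n, p, m) with n subjects, p random functions and
            m points on the midpoint grid t_k = (k + 0.5) / m of [0, 1].
    Requires n_comp <= n_keep.
    """
    n, p, m = curves.shape
    basis = cosine_basis(t, n_basis)                      # (n_basis, m)

    # Univariate expansion: score of each function on each basis element.
    scores = np.einsum("ijm,lm->ijl", curves, basis) / m  # (n, p, n_basis)
    scores = scores.reshape(n, p * n_basis)
    scores -= scores.mean(axis=0)

    # Thresholding step (assumed rule): keep the n_keep highest-variance scores.
    variances = scores.var(axis=0)
    keep = np.sort(np.argsort(variances)[::-1][:n_keep])

    # PCA on the retained coordinates only.
    cov = scores[:, keep].T @ scores[:, keep] / n
    eigvals, eigvecs = np.linalg.eigh(cov)                # ascending order
    order = np.argsort(eigvals)[::-1][:n_comp]

    # Embed the eigenvectors back, with zeros for discarded coordinates.
    coefs = np.zeros((n_comp, p * n_basis))
    coefs[:, keep] = eigvecs[:, order].T
    coefs = coefs.reshape(n_comp, p, n_basis)

    # Multivariate eigenfunctions evaluated on the grid: (n_comp, p, m).
    eigenfunctions = coefs @ basis
    active = np.unique(keep // n_basis)                   # functions retained
    return eigvals[order], eigenfunctions, active


# Toy data (simulated, assumed setup): only the first 3 of p = 30 functions
# carry signal; the rest are white noise.
rng = np.random.default_rng(0)
n, p, m = 50, 30, 50
t = (np.arange(m) + 0.5) / m
curves = 0.5 * rng.standard_normal((n, p, m))
xi = 2.0 * rng.standard_normal(n)
curves[:, :3, :] += xi[:, None, None] * np.sqrt(2.0) * np.cos(np.pi * t) / np.sqrt(3.0)

eigvals, psi, active = sparse_fpca_sketch(curves, t, n_basis=5, n_keep=3, n_comp=1)
```

Because the basis is orthonormal, the inner product of two multivariate functions equals the Euclidean inner product of their stacked score vectors, so an eigenvector of the score covariance maps directly to a multivariate eigenfunction. Discarding low-variance scores first keeps the eigenproblem small even when \(p\) is much larger than \(n\).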

MSC:

62-XX Statistics

Software:

fda (R)

References:

[1] Balakrishnan, A. (1960). Estimation and detection theory for multiple stochastic processes. Journal of Mathematical Analysis and Applications 1, 386-410. · Zbl 0211.51501
[2] Barachant, A., Bonnet, S., Congedo, M. and Jutten, C. (2011). Multiclass brain-computer interface classification by Riemannian geometry. IEEE Transactions on Biomedical Engineering 59, 920-928.
[3] Berrendero, J. R., Justel, A. and Svarc, M. (2011). Principal components for multivariate functional data. Computational Statistics & Data Analysis 55, 2619-2634. · Zbl 1464.62025
[4] Chiou, J.-M., Chen, Y.-T. and Yang, Y.-F. (2014). Multivariate functional principal component analysis: A normalization approach. Statistica Sinica 24, 1571-1596. · Zbl 1480.62115
[5] Dai, X., Müller, H.-G. and Yao, F. (2017). Optimal Bayes classifiers for functional data and density ratios. Biometrika 104, 545-560. · Zbl 07072227
[6] Donoho, D. L. and Johnstone, I. M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81, 425-455. · Zbl 0815.62019
[7] Fan, J., Feng, Y. and Tong, X. (2012). A road to classification in high dimensional space: The regularized optimal affine discriminant. Journal of the Royal Statistical Society, Series B (Statistical Methodology) 74, 745-771. · Zbl 1411.62167
[8] Hall, P. and Horowitz, J. L. (2007). Methodology and convergence rates for functional linear regression. The Annals of Statistics 35, 70-91. · Zbl 1114.62048
[9] Hall, P. and Hosseini-Nasab, M. (2006). On properties of functional principal components analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68, 109-126. · Zbl 1141.62048
[10] Happ, C. and Greven, S. (2018). Multivariate functional principal component analysis for data observed on different (dimensional) domains. Journal of the American Statistical Association 113, 649-659. · Zbl 1398.62154
[11] Hayden, E. P., Wiegand, R. E., Meyer, E. T., Bauer, L. O., O’Connor, S. J., Nurnberger Jr, J. I. et al. (2006). Patterns of regional brain activity in alcohol-dependent subjects. Alcoholism: Clinical and Experimental Research 30, 1986-1991.
[12] Horváth, L. and Kokoszka, P. (2012). Inference for Functional Data with Applications. Springer-Verlag, New York. · Zbl 1279.62017
[13] Ingber, L. (1997). Statistical mechanics of neocortical interactions: Canonical momenta indicators of electroencephalography. Physical Review E 55, 4578-4593.
[14] Jacques, J. and Preda, C. (2014). Model-based clustering for multivariate functional data. Computational Statistics & Data Analysis 71, 92-106. · Zbl 1471.62096
[15] James, G. M., Hastie, T. J. and Sugar, C. A. (2000). Principal component models for sparse functional data. Biometrika 87, 587-602. · Zbl 0962.62056
[16] Johnstone, I. M. and Lu, A. Y. (2009). On consistency and sparsity for principal components analysis in high dimensions. Journal of the American Statistical Association 104, 682-693. · Zbl 1388.62174
[17] Kelly, E. J. and Root, W. L. (1960). A representation of vector-valued random processes. Journal of Mathematics and Physics 39, 211-216. · Zbl 0094.12204
[18] Kong, D., Xue, K., Yao, F. and Zhang, H. H. (2016). Partially functional linear regression in high dimensions. Biometrika 103, 147-159. · Zbl 1452.62500
[19] Koudstaal, M. and Yao, F. (2018). From multiple Gaussian sequences to functional data and beyond: A Stein estimation approach. Journal of the Royal Statistical Society, Series B (Statistical Methodology) 80, 319-342. · Zbl 1383.62116
[20] Müller, H.-G. and Stadtmüller, U. (2005). Generalized functional linear models. The Annals of Statistics 33, 774-805. · Zbl 1068.62048
[21] Nguyen, C. H. and Artemiadis, P. (2018). EEG feature descriptors and discriminant analysis under Riemannian manifold perspective. Neurocomputing 275, 1871-1883.
[22] Qiao, X., Guo, S. and James, G. M. (2019). Functional graphical models. Journal of the American Statistical Association 114, 211-222. · Zbl 1478.62123
[23] Qiao, X., Qian, C., James, G. M. and Guo, S. (2020). Doubly functional graphical models in high dimensions. Biometrika 107, 415-431. · Zbl 1441.62155
[24] Ramsay, J. O. and Silverman, B. W. (2005). Functional Data Analysis. 2nd Edition. Springer, New York. · Zbl 1079.62006
[25] Rice, J. A. and Silverman, B. W. (1991). Estimating the mean and covariance structure nonparametrically when the data are curves. Journal of the Royal Statistical Society, Series B (Methodological) 53, 233-243. · Zbl 0800.62214
[26] Rice, J. A. and Wu, C. O. (2001). Nonparametric mixed effects models for unequally sampled noisy curves. Biometrics 57, 253-259. · Zbl 1209.62061
[27] Sabbagh, D., Ablin, P., Varoquaux, G., Gramfort, A. and Engemann, D. A. (2019). Manifold-regression to predict from MEG/EEG brain signals without source modeling. In Advances in Neural Information Processing Systems, 7323-7334. Vancouver, Canada.
[28] Shen, H. and Huang, J. Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation. Journal of Multivariate Analysis 99, 1015-1034. · Zbl 1141.62049
[29] Solea, E. and Li, B. (2020). Copula Gaussian graphical models for functional data. Journal of the American Statistical Association, 1-13.
[30] Sun, S. and Zhou, J. (2014). A review of adaptive feature extraction and classification methods for EEG-based brain-computer interfaces. In 2014 International Joint Conference on Neural Networks (IJCNN), 1746-1753. IEEE, Beijing.
[31] Vu, V. Q. and Lei, J. (2013). Minimax sparse principal subspace estimation in high dimensions. The Annals of Statistics 41, 2905-2947. · Zbl 1288.62103
[32] Wong, R. K., Li, Y. and Zhu, Z. (2019). Partially linear functional additive models for multivariate functional data. Journal of the American Statistical Association 114, 406-418. · Zbl 1478.62125
[33] Xue, K. and Yao, F. (2021). Hypothesis testing in large-scale functional linear regression. Statistica Sinica 31, 1101-1123. · Zbl 1470.62108
[34] Yao, F., Müller, H.-G. and Wang, J.-L. (2005a). Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association 100, 577-590. · Zbl 1117.62451
[35] Yao, F., Müller, H.-G. and Wang, J.-L. (2005b). Functional linear regression analysis for longitudinal data. The Annals of Statistics 33, 2873-2903. · Zbl 1084.62096
[36] Zhang, X. L., Begleiter, H., Porjesz, B., Wang, W. and Litke, A. (1995). Event related potentials during object recognition tasks. Brain Research Bulletin 38, 531-538.
[37] Zou, H., Hastie, T. and Tibshirani, R. (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics 15, 265-286.

Xiaoyu Hu, School of Mathematical Sciences, Center for Statistical Science, Peking University, P.R. China. E-mail: hxyhuxiaoyu@pku.edu.cn
Fang Yao, School of Mathematical Sciences, Center for Statistical Science, Peking University, P.R. China. E-mail: fyao@math.pku.edu.cn
(Received October 2020; accepted March 2021)