## Properties of principal component methods for functional and longitudinal data analysis.
*(English)*
Zbl 1113.62073

Summary: The use of principal component methods to analyze functional data is appropriate in a wide range of different settings. In studies of “functional data analysis” it has often been assumed that a sample of random functions is observed precisely, in the continuum and without noise. While this has been the traditional setting for functional data analysis, in the context of longitudinal data analysis a random function typically represents a patient, or subject, who is observed at only a small number of randomly distributed points, with nonnegligible measurement error. Nevertheless, essentially the same methods can be used in both these cases, as well as in the vast number of settings that lie between them.

How is performance affected by the sampling plan? We answer that question. We show that if there is a sample of \(n\) functions, or subjects, then estimation of eigenvalues is a semiparametric problem, with root-\(n\) consistent estimators, even if only a few observations are made of each function, and if each observation is encumbered by noise. However, estimation of eigenfunctions becomes a nonparametric problem when observations are sparse. The optimal convergence rates in this case are those which pertain to more familiar function-estimation settings. We also describe the effects of sampling at regularly spaced points, as opposed to random points.

In particular, it is shown that there are often advantages in sampling randomly. However, even in the case of noisy data there is a threshold sampling rate (depending on the number of functions treated) above which the rate of sampling (either randomly or regularly) has negligible impact on estimator performance, no matter whether eigenfunctions or eigenvectors are being estimated.
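The sparse, noisy setting described in the summary can be illustrated with a small simulation. What follows is only a minimal sketch under stated assumptions, not the authors' procedure: the two-component model, the function names (`simulate_sparse`, `smooth_covariance`, `sparse_fpca`), the bandwidth `h` and the grid size are all hypothetical choices made for illustration. The curves are assumed to have mean zero, and a local constant (Nadaraya-Watson) smoother with a Gaussian kernel stands in for the local polynomial methods named in the keywords. Products of observations from the same subject at distinct time points are pooled across subjects, which keeps the measurement-error variance out of the covariance estimate.

```python
import numpy as np


def simulate_sparse(n, m, noise_sd, rng):
    """Simulate n mean-zero curves, each seen at m random points in [0, 1].

    Hypothetical model (an assumption for illustration only):
    X(t) = xi1 * phi1(t) + xi2 * phi2(t), with eigenvalues 2.0 and 0.5 and
    orthonormal eigenfunctions sqrt(2) sin(2 pi t), sqrt(2) cos(2 pi t).
    """
    t = rng.uniform(0.0, 1.0, size=(n, m))
    xi1 = rng.normal(0.0, np.sqrt(2.0), size=(n, 1))
    xi2 = rng.normal(0.0, np.sqrt(0.5), size=(n, 1))
    x = (xi1 * np.sqrt(2.0) * np.sin(2 * np.pi * t)
         + xi2 * np.sqrt(2.0) * np.cos(2 * np.pi * t))
    y = x + rng.normal(0.0, noise_sd, size=(n, m))  # additive measurement error
    return t, y


def smooth_covariance(t, y, grid, h):
    """Local constant smoother of the raw products Y_ij * Y_ik with j != k."""
    m = t.shape[1]
    j, k = np.where(~np.eye(m, dtype=bool))  # off-diagonal index pairs only
    s = t[:, j].ravel()
    u = t[:, k].ravel()
    prod = (y[:, j] * y[:, k]).ravel()
    # Gaussian kernel weights between grid points and observation times.
    ws = np.exp(-0.5 * ((grid[:, None] - s[None, :]) / h) ** 2)
    wu = np.exp(-0.5 * ((grid[:, None] - u[None, :]) / h) ** 2)
    num = (ws * prod) @ wu.T
    den = ws @ wu.T
    return num / den


def sparse_fpca(t, y, h=0.05, n_grid=50):
    """Eigen-decompose the smoothed covariance on a midpoint grid of [0, 1]."""
    grid = (np.arange(n_grid) + 0.5) / n_grid
    dt = 1.0 / n_grid
    cov = smooth_covariance(t, y, grid, h)
    cov = 0.5 * (cov + cov.T)  # enforce symmetry before eigh
    vals, vecs = np.linalg.eigh(cov * dt)  # discretized covariance operator
    order = np.argsort(vals)[::-1]  # largest eigenvalue first
    # Rescale eigenvectors so that sum(phi**2) * dt == 1 (unit L2 norm).
    return vals[order], vecs[:, order] / np.sqrt(dt), grid


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t, y = simulate_sparse(n=400, m=5, noise_sd=0.5, rng=rng)
    evals, efuns, grid = sparse_fpca(t, y)
    print("leading eigenvalue estimates:", evals[:2])
```

Rerunning the sketch with different values of `n`, `m` and `h` gives an informal feel for the contrast drawn in the summary: the eigenvalue estimates depend mainly on the number of subjects, whereas the quality of the estimated eigenfunctions also depends on the smoothing involved.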

### MSC:

| Code | Description |
|---|---|
| 62H25 | Factor analysis and principal components; correspondence analysis |
| 62G08 | Nonparametric regression and quantile regression |
| 62G20 | Asymptotic properties of nonparametric inference |
| 62M09 | Non-Markovian processes: estimation |
| 62P10 | Applications of statistics to biology and medical sciences; meta analysis |

### Keywords:

biomedical studies; curse of dimensionality; eigenfunction; eigenvalue; eigenvector; Karhunen-Loève expansion; local polynomial methods; operator theory; optimal convergence rate; principal component analysis; rate of convergence; semiparametric; sparse data; spectral decomposition; smoothing

### Software:

fda (R)

*P. Hall* et al., Ann. Stat. 34, No. 3, 1493–1517 (2006; Zbl 1113.62073)
