# zbMATH — the first resource for mathematics

Factor models and variable selection in high-dimensional regression analysis. (English) Zbl 1231.62131
Summary: The paper considers linear regression problems where the number of predictor variables is possibly larger than the sample size. The basic motivation of the study is to combine the points of view of model selection and functional regression by using a factor approach: it is assumed that the predictor vector can be decomposed into a sum of two uncorrelated random components reflecting common factors and specific variabilities of the explanatory variables. It is shown that the traditional assumption of a sparse vector of parameters is restrictive in this context. Common factors may possess a significant influence on the response variable which cannot be captured by the specific effects of a small number of individual variables. We therefore propose to include principal components as additional explanatory variables in an augmented regression model. We give finite sample inequalities for estimates of these components. It is then shown that model selection procedures can be used to estimate the parameters of the augmented model, and we derive theoretical properties of the estimators. Finite sample performance is illustrated by a simulation study.
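
As a reading aid, the decomposition and the augmented regression described in the summary can be sketched as follows. The notation is an assumption introduced here for illustration and is not taken from the summary: $X_i \in \mathbb{R}^p$ is the predictor vector, $W_i$ the common-factor part, $Z_i$ the variable-specific part, $\hat\xi_{ir}$ the $r$-th estimated principal component score of $X_i$, and $k$ the number of retained components.

```latex
% Factor decomposition of the predictors (two uncorrelated components)
X_i = W_i + Z_i, \qquad \operatorname{Cov}(W_i, Z_i) = 0 .

% Augmented regression: k principal components enter as extra regressors,
% while the coefficient vector \beta of the individual variables may be sparse
Y_i = \sum_{r=1}^{k} \alpha_r \,\hat\xi_{ir}
      + \sum_{j=1}^{p} \beta_j X_{ij} + \varepsilon_i ,
\qquad i = 1, \dots, n, \quad p \text{ possibly larger than } n .
```

Under this sketch, the $\alpha_r$ terms carry the influence of the common factors, and a model selection procedure would be applied to the augmented parameter vector $(\alpha_1, \dots, \alpha_k, \beta_1, \dots, \beta_p)$.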

##### MSC:
- 62J05 Linear regression; mixed models
- 62H25 Factor analysis and principal components; correspondence analysis
- 62H12 Estimation in multivariate analysis
- 62F12 Asymptotic properties of parametric estimators
- 65C60 Computational problems in statistics (MSC2010)
##### Keywords:
linear regression; model selection; functional regression