An RKHS model for variable selection in functional linear regression. (English) Zbl 1416.62331

Summary: A mathematical model for variable selection in functional linear regression models with scalar response is proposed. By “variable selection” we mean a procedure to replace the whole trajectories of the functional explanatory variables with their values at a finite number of carefully selected instants (or “impact points”). The basic idea of our approach is to use the Reproducing Kernel Hilbert Space (RKHS) associated with the underlying process, instead of the more usual \(L^2 [0, 1]\) space, in the definition of the linear model. This turns out to be especially suitable for variable selection purposes, since the finite-dimensional linear model based on the selected “impact points” can be seen as a particular case of the RKHS-based linear functional model. In this framework, we address the consistent estimation of the optimal design of impact points and we check, via simulations and real data examples, the performance of the proposed method.
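
As a sketch of why the finite-dimensional model is a particular case of the RKHS-based one (the notation below is assumed for illustration and is not taken from the summary): let \(K\) be the covariance function of a centered process \(X = \{X(t) : t \in [0, 1]\}\), let \(\mathcal{H}(K)\) be the associated RKHS, and read \(\langle X, \alpha \rangle_K\) in the sense of Loève's isometry, since the trajectories of \(X\) need not belong to \(\mathcal{H}(K)\). An RKHS-based linear model can then be written as

\[ Y = \beta_0 + \langle X, \alpha \rangle_K + \varepsilon, \qquad \alpha \in \mathcal{H}(K). \]

If the slope function is a finite combination of kernel sections, \(\alpha = \sum_{j=1}^{p} \beta_j K(t_j, \cdot)\), then the reproducing property \(\langle f, K(t, \cdot) \rangle_K = f(t)\), extended through the isometry, formally yields

\[ Y = \beta_0 + \sum_{j=1}^{p} \beta_j X(t_j) + \varepsilon, \]

that is, an ordinary linear model in the values of \(X\) at the impact points \(t_1, \dots, t_p\).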

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G05 Nonparametric estimation
62J05 Linear regression; mixed models
46E22 Hilbert spaces with reproducing kernels (= (proper) functional Hilbert spaces, including de Branges-Rovnyak and other structured spaces)
