## Simultaneous variable selection and smoothing for high-dimensional function-on-scalar regression.
*(English)*
Zbl 1433.62111

The paper under review addresses the problem of simultaneously selecting important predictors and producing smooth estimates of their effects in a function-on-scalar linear regression model with a large number of scalar predictors; in this model the parameters and errors lie in a real separable Hilbert space, while the predictors are real-valued scalars. According to the authors: “While other methods are available for selection and estimation, none of them incorporate smoothing as well.” The authors address this problem with a new methodology, functional linear adaptive mixed estimation (FLAME). They also provide a fast algorithm for computing the estimators, based on functional coordinate descent, and an R package customized for functional data, which yields substantial gains in computational efficiency. Asymptotic properties of the estimators are derived, and simulations illustrate the advantages of FLAME over existing methods in terms of both statistical performance and computational efficiency. The paper concludes with an application to childhood asthma, in which a potentially important genetic mutation is found that was not selected by previous methods based on functional data, and with a discussion of opportunities for improvement.
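To make the idea of coordinate descent in a function-on-scalar model concrete, the following is a minimal sketch and not the authors' FLAME algorithm or their R package. It assumes the response curves are observed on a common grid, so each coefficient function becomes a row vector, and it uses a plain group-lasso penalty with the Euclidean norm on the grid, without the smoothing that FLAME adds. The function names (`group_soft_threshold`, `fos_group_lasso`, `demo`), the simulated data, and the value of `lam` are all hypothetical choices made for illustration.

```python
import numpy as np


def group_soft_threshold(z, lam):
    """Shrink the whole vector z toward zero; return exactly zero if its norm is <= lam."""
    norm = np.linalg.norm(z)
    if norm <= lam:
        return np.zeros_like(z)
    return (1.0 - lam / norm) * z


def fos_group_lasso(X, Y, lam, n_sweeps=200, tol=1e-8):
    """Block coordinate descent for
        (1 / (2N)) * ||Y - X B||_F^2 + lam * sum_j ||B[j]||_2,
    where row B[j] is the discretized coefficient function of predictor j.
    Assumes every column of X satisfies x_j @ x_j / N == 1.
    """
    N, p = X.shape
    B = np.zeros((p, Y.shape[1]))
    R = Y.astype(float).copy()  # running residual Y - X @ B
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            xj = X[:, j]
            R += np.outer(xj, B[j])        # remove predictor j from the fit
            z = xj @ R / N                 # unpenalized update for row j
            new_row = group_soft_threshold(z, lam)
            max_change = max(max_change, np.max(np.abs(new_row - B[j])))
            B[j] = new_row
            R -= np.outer(xj, new_row)     # put predictor j back
        if max_change < tol:
            break
    return B


def demo(seed=0):
    """Simulate 10 scalar predictors, of which only the first two have an effect."""
    rng = np.random.default_rng(seed)
    N, p, m = 200, 10, 25
    t = np.linspace(0.0, 1.0, m)
    X = rng.standard_normal((N, p))
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # now x_j @ x_j / N == 1
    B_true = np.zeros((p, m))
    B_true[0] = np.sin(2 * np.pi * t)
    B_true[1] = np.cos(2 * np.pi * t)
    Y = X @ B_true + 0.1 * rng.standard_normal((N, m))
    B_hat = fos_group_lasso(X, Y, lam=0.5)
    # Indices of predictors whose estimated coefficient function is nonzero.
    return [int(j) for j in np.flatnonzero(np.linalg.norm(B_hat, axis=1) > 0)]
```

Calling `demo()` returns the indices of the selected predictors. Because the penalty acts on the norm of an entire coefficient function, a predictor is either dropped completely or kept with a shrunken curve, which is the selection mechanism the review refers to; the estimated curves here are not smoothed.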

Reviewer: Joseph Melamed (Los Angeles)

### MSC:

| Code | Description |
|---|---|
| 62G08 | Nonparametric regression and quantile regression |
| 62G20 | Asymptotic properties of nonparametric inference |
| 62J02 | General nonlinear regression |
| 62P10 | Applications of statistics to biology and medical sciences; meta analysis |
| 46E22 | Hilbert spaces with reproducing kernels (= (proper) functional Hilbert spaces, including de Branges-Rovnyak and other structured spaces) |

### Keywords:

nonlinear regression; variable selection; functional data analysis; reproducing kernel Hilbert space; minimax convergence

*A. Parodi* and *M. Reimherr*, Electron. J. Stat. 12, No. 2, 4602–4639 (2018; Zbl 1433.62111)
