Variable selection in functional data classification: a maxima-hunting proposal. (English) Zbl 1356.62079

Summary: Variable selection is considered in the setting of supervised binary classification with functional data \(\{X(t), t \in [0,1]\}\). By “variable selection” we mean any dimension-reduction method that replaces the whole trajectory \(\{X(t), t \in [0,1]\}\) with a low-dimensional vector \((X(t_{1}),\ldots,X(t_{d}))\) while keeping a similar classification error. Our proposal for variable selection is based on the idea of selecting the local maxima \((t_{1},\ldots,t_{d})\) of the function \(\mathcal{V}_{X}^{2}(t) = \mathcal{V}^{2}(X(t),Y)\), where \(\mathcal{V}\) denotes the “distance covariance” association measure for random variables due to G. J. Székely et al. [Ann. Stat. 35, No. 6, 2769–2794 (2007; Zbl 1129.62059)]. This method provides a simple, natural way to deal with the relevance vs. redundancy trade-off that typically appears in variable selection. A consistency result for the estimation of the maxima of \(\mathcal{V}_{X}^{2}\) is established. We also exhibit different models for the underlying process \(X(t)\) under which the relevant information is concentrated on the maxima of \(\mathcal{V}_{X}^{2}\). An extensive empirical study is presented, including about 400 simulated models and data examples, aimed at comparing our variable selection method with other standard proposals for dimension reduction.
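
The selection idea in the summary can be illustrated with a minimal sketch, assuming the trajectories are observed on a common finite grid and stored as an `n x p` array. The code uses the standard empirical distance covariance (the mean of the entrywise product of double-centered pairwise distance matrices) and a plain neighbour comparison to find local maxima. The function names `dcov2` and `maxima_hunting`, the grid representation, and the neighbour rule are assumptions made for illustration. They are not taken from the paper, whose actual procedure (e.g. any smoothing or stopping rule) may differ.

```python
import numpy as np


def _double_centered_distances(v):
    """Pairwise Euclidean distance matrix of the sample v, double-centered."""
    v = np.asarray(v, dtype=float).reshape(len(v), -1)
    d = np.sqrt(((v[:, None, :] - v[None, :, :]) ** 2).sum(axis=-1))
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())


def dcov2(x, y):
    """Empirical squared distance covariance between samples x and y."""
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    return float((a * b).mean())


def maxima_hunting(X, y, d=None):
    """Hypothetical helper: rank grid points that are local maxima of
    the empirical curve t_j -> dcov2(X[:, j], y).

    X : (n, p) array, row i holds trajectory i evaluated on a common grid.
    y : (n,) array of binary class labels.
    d : optional cap on the number of selected points.
    Returns (indices sorted by decreasing dcov2, the full dcov2 curve).
    """
    X = np.asarray(X, dtype=float)
    b = _double_centered_distances(y)
    v = np.array([(_double_centered_distances(X[:, j]) * b).mean()
                  for j in range(X.shape[1])])
    # A grid point counts as a local maximum if it exceeds its left
    # neighbour and is not below its right neighbour (endpoints allowed).
    padded = np.concatenate(([-np.inf], v, [-np.inf]))
    is_max = (padded[1:-1] > padded[:-2]) & (padded[1:-1] >= padded[2:])
    idx = np.flatnonzero(is_max)
    idx = idx[np.argsort(v[idx])[::-1]]  # most relevant maxima first
    if d is not None:
        idx = idx[:d]
    return idx, v
```

Restricting attention to local maxima, rather than simply taking the `d` largest values of the curve, is what addresses redundancy in this sketch: grid points adjacent to a peak tend to carry nearly the same information as the peak itself, so at most one point per peak is retained.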

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G08 Nonparametric regression and quantile regression
62G20 Asymptotic properties of nonparametric inference

Citations:

Zbl 1129.62059