Automatic component selection in additive modeling of French national electricity load forecasting. (English) Zbl 1366.62055

Cao, Ricardo (ed.) et al., Nonparametric statistics. 2nd ISNPS, Cádiz, June 2014. Selected papers based on the presentations at the second conference of the International Society for Nonparametric Statistics, ISNPS, Cádiz, Spain, June 12–16, 2014. Cham: Springer (ISBN 978-3-319-41581-9/hbk; 978-3-319-41582-6/ebook). Springer Proceedings in Mathematics & Statistics 175, 191-209 (2016).
Summary: We consider estimation and model selection in sparse high-dimensional linear additive models when multiple covariates need to be modeled nonparametrically, and propose some multi-step estimators based on \(B\)-spline approximations of the additive components. In such models, the overall number of regressors \(d\) can be large, possibly much larger than the sample size \(n\). However, we assume that a number of regressors smaller than \(n\) captures most of the impact of all covariates on the response variable. Our estimation and model selection results are valid without assuming the conventional “separation condition”, namely, without assuming that the norm of each of the true nonzero components is bounded away from zero. Instead, we relax this assumption by allowing the norms of nonzero components to converge to zero at a certain rate. The approaches investigated in this paper consist of two steps. The first step implements the variable selection, typically by the Group Lasso, and the second step applies a penalized \(P\)-spline estimation to the selected additive components. Regarding the model selection task, we discuss the application of several criteria, such as the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and Generalized Cross Validation (GCV), and study the consistency of BIC, i.e., its ability to select the true model with probability converging to 1. We then study post-model estimation consistency of the selected components. We end the paper by applying the proposed procedure to some real data related to electricity load consumption forecasting: the EDF (Électricité de France) portfolio.
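The two-step idea in the summary (Group Lasso selection on \(B\)-spline blocks, then a penalized \(P\)-spline refit on the selected components) can be illustrated with a minimal sketch. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the simulated data, the function names (`bspline_basis`, `orthonormal_group`, `group_lasso`, `pspline_refit`), the fixed tuning values `lam` and `rho`, the equally spaced knots on [0, 1], the block coordinate descent solver, and the second-order difference penalty. Criterion-based tuning (AIC, BIC, GCV) is left out.

```python
import numpy as np

def bspline_basis(x, n_seg=5, degree=3):
    """B-spline basis on [0, 1] with equally spaced knots (Cox-de Boor recursion)."""
    h = 1.0 / n_seg
    t = h * np.arange(-degree, n_seg + degree + 1)   # extended knot vector
    xc = x[:, None]
    B = ((t[:-1] <= xc) & (xc < t[1:])).astype(float)  # degree-0 indicators
    for d in range(1, degree + 1):
        left = (xc - t[:-d - 1]) / (t[d:-1] - t[:-d - 1])
        right = (t[d + 1:] - xc) / (t[d + 1:] - t[1:-d])
        B = left * B[:, :-1] + right * B[:, 1:]
    return B                                          # shape (n, n_seg + degree)

def orthonormal_group(B):
    """Drop one column (the basis sums to 1), center, and orthonormalize the block."""
    Z = B[:, :-1] - B[:, :-1].mean(axis=0)
    Q, _ = np.linalg.qr(Z)
    return Q

def group_lasso(Q_list, y, lam, n_sweeps=100):
    """Block coordinate descent for
    0.5 * ||y - sum_j Q_j b_j||^2 + lam * sum_j sqrt(p_j) * ||b_j||,
    using the closed-form group soft-threshold valid for orthonormal blocks."""
    betas = [np.zeros(Q.shape[1]) for Q in Q_list]
    resid = y - y.mean()
    for _ in range(n_sweeps):
        for j, Q in enumerate(Q_list):
            partial = resid + Q @ betas[j]            # residual without group j
            s = Q.T @ partial
            norm = np.linalg.norm(s)
            thr = lam * np.sqrt(Q.shape[1])
            shrink = max(0.0, 1.0 - thr / norm) if norm > 0 else 0.0
            betas[j] = shrink * s
            resid = partial - Q @ betas[j]
    return betas

def pspline_refit(B_list, y, rho):
    """Refit selected components with a second-order difference penalty,
    solved as an augmented least-squares problem (lstsq copes with the
    rank deficiency caused by each block containing the constant)."""
    K = B_list[0].shape[1]
    D = np.diff(np.eye(K), n=2, axis=0)               # (K-2) x K difference matrix
    X = np.hstack(B_list)
    P = np.kron(np.eye(len(B_list)), D)               # block-diagonal penalty
    A = np.vstack([X, np.sqrt(rho) * P])
    b = np.concatenate([y, np.zeros(P.shape[0])])
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return X @ coef                                   # fitted values

# Simulated example (assumption): only covariates 0 and 1 affect the response.
rng = np.random.default_rng(0)
n, d = 400, 8
X = rng.uniform(size=(n, d))
y = (np.sin(2 * np.pi * X[:, 0]) + 2 * (2 * X[:, 1] - 1) ** 2
     + 0.2 * rng.standard_normal(n))

B_list = [bspline_basis(X[:, j]) for j in range(d)]
Q_list = [orthonormal_group(B) for B in B_list]

# Step 1: variable selection by Group Lasso.
betas = group_lasso(Q_list, y, lam=1.0)
selected = [j for j, b in enumerate(betas) if np.linalg.norm(b) > 0]

# Step 2: penalized P-spline estimation on the selected components only.
fitted = pspline_refit([B_list[j] for j in selected], y, rho=0.1)
rmse = np.sqrt(np.mean((y - fitted) ** 2))
print("selected components:", selected)
print("refit RMSE:", round(float(rmse), 3))
```

In this sketch the first step is used only to decide which components enter the model. The shrunken Group Lasso coefficients are discarded and the second step re-estimates the retained components with a separate smoothness penalty. In practice `lam` and `rho` would be chosen by a criterion such as those named in the summary rather than fixed by hand, and `pspline_refit` would need a guard for the case where no component is selected.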
62P20 Applications of statistics to economics
62G05 Nonparametric estimation
62J07 Ridge regression; shrinkage estimators (Lasso)
