More efficient approximation of smoothing splines via space-filling basis selection. (English) Zbl 07263823
Summary: We consider the problem of approximating smoothing spline estimators in a nonparametric regression model. When applied to a sample of size \(n\), the smoothing spline estimator can be expressed as a linear combination of \(n\) basis functions, requiring \(O(n^3)\) computational time when the number \(d\) of predictors is two or more. Such a sizeable computational cost hinders the broad applicability of smoothing splines. In practice, the full-sample smoothing spline estimator can be approximated by an estimator based on \(q\) randomly selected basis functions, resulting in a computational cost of \(O(nq^2)\). It is known that these two estimators converge at the same rate when \(q\) is of order \(O\{n^{2/(pr+1)}\}\), where \(p \in [1,2]\) depends on the true function and \(r > 1\) depends on the type of spline. Such a \(q\) is called the essential number of basis functions. In this article, we develop a more efficient basis selection method. By selecting basis functions corresponding to approximately equally spaced observations, the proposed method chooses a set of basis functions with great diversity. The asymptotic analysis shows that the proposed smoothing spline estimator can decrease \(q\) to around \(O\{n^{1/(pr+1)}\}\) when \(d \leq pr + 1\). Applications to synthetic and real-world datasets show that the proposed method leads to a smaller prediction error than other basis selection methods.
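As a rough illustration of the two rates quoted in the summary, the sketch below compares the order of \(q\) under random selection, \(n^{2/(pr+1)}\), with the order under the proposed selection, \(n^{1/(pr+1)}\), ignoring constants. It also shows one simple way to pick approximately equally spaced observations in a single predictor. The function names `essential_q` and `equally_spaced_subset` are hypothetical, and the nearest-to-target rule is an assumption made here for illustration; the summary does not specify the authors' selection procedure, in particular for \(d \geq 2\).

```python
import random


def essential_q(n, p, r, space_filling=False):
    """Order of the number of basis functions, constants ignored.

    Random selection:        n ** (2 / (p*r + 1))
    Space-filling selection: n ** (1 / (p*r + 1)), stated for d <= p*r + 1
    """
    numerator = 1.0 if space_filling else 2.0
    return n ** (numerator / (p * r + 1.0))


def equally_spaced_subset(xs, q):
    """Return indices of q distinct observations that are roughly evenly spread.

    One-dimensional illustration only (assumed rule, not the paper's):
    split [min(xs), max(xs)] into q equal-width cells and, for each cell
    midpoint, take the nearest observation that has not been chosen yet.
    """
    if not 0 < q <= len(xs):
        raise ValueError("q must satisfy 0 < q <= len(xs)")
    lo, hi = min(xs), max(xs)
    targets = [lo + (hi - lo) * (k + 0.5) / q for k in range(q)]
    used = set()
    chosen = []
    for t in targets:
        # Linear scan over unused observations: O(n) per target, O(nq) overall.
        best = min(
            (i for i in range(len(xs)) if i not in used),
            key=lambda i: abs(xs[i] - t),
        )
        used.add(best)
        chosen.append(best)
    return chosen
```

For example, with \(n = 10^5\), \(p = 2\) and \(r = 2\) (so \(pr + 1 = 5\)), `essential_q` gives an order of about 100 basis functions for random selection and about 10 for the space-filling rate. Since the fitting cost quoted in the summary is \(O(nq^2)\), a tenfold reduction in \(q\) corresponds to roughly a hundredfold reduction in that term.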
62G08 Nonparametric regression and quantile regression
62K15 Factorial statistical designs
65D07 Numerical computation using splines
62P12 Applications of statistics to environmental and related topics