## The dimensionality reduction principle for generalized additive models. (English) Zbl 0603.62050

Consider an exponential family of distributions of the form $\int_{-\infty}^{x}e^{b_1(\eta)y+b_2(\eta)}\,\nu(dy)$ with a real parameter $$\eta$$ and a $$\sigma$$-finite measure $$\nu$$ on $${\mathbb{R}}$$. Under suitable assumptions, the expectation of the distribution with parameter $$\eta$$ is $$b_3(\eta):=-b_2'(\eta)/b_1'(\eta)$$. Now assume that for a random vector $$(Y,X)$$ with values in $${\mathbb{R}}\times [0,1]^J$$ ($$J\in {\mathbb{N}}$$), the conditional distribution of $$Y$$ given $$X=x$$ belongs to this exponential family with $$\eta=f(x)$$, $$x\in [0,1]^J$$, and hence $$E(Y\mid X=x)=b_3(f(x))$$. [This is an exponential response model, see e.g. S. J. Haberman, ibid. 5, 815-841 (1977; Zbl 0368.62019); in the case of linear $$f$$ one obtains a generalized linear model as in J. A. Nelder and R. W. M. Wedderburn, J. R. Stat. Soc., Ser. A 135, 370-384 (1972).]
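As a quick check of the mean formula, the sketch below uses the Bernoulli family as an assumed example (it is not singled out in the paper): with $$\nu$$ the counting measure on $$\{0,1\}$$, $$b_1(\eta)=\eta$$ and $$b_2(\eta)=-\log(1+e^{\eta})$$, so $$-b_2'(\eta)/b_1'(\eta)$$ should coincide with the logistic mean $$e^{\eta}/(1+e^{\eta})$$. The function names are illustrative, and the derivatives are approximated by central differences.

```python
import math


# Bernoulli family in the form exp(b1(eta) * y + b2(eta)) w.r.t. counting measure on {0, 1}
def b1(eta):
    return eta


def b2(eta):
    return -math.log1p(math.exp(eta))


def derivative(func, eta, h=1e-6):
    # Central finite-difference approximation of func'(eta)
    return (func(eta + h) - func(eta - h)) / (2.0 * h)


def b3(eta):
    # Mean via the formula b3 = -b2' / b1'
    return -derivative(b2, eta) / derivative(b1, eta)


def direct_mean(eta):
    # E(Y) = sum over the support of y * exp(b1(eta) * y + b2(eta))
    return sum(y * math.exp(b1(eta) * y + b2(eta)) for y in (0, 1))


for eta in (-2.0, 0.0, 1.5):
    print(eta, b3(eta), direct_mean(eta))
```

Both columns agree up to finite-difference error, since the density integrates to one for every $$\eta$$ and differentiating that identity gives $$b_1'(\eta)E(Y)+b_2'(\eta)=0$$.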
It is shown that, under suitable assumptions on the conditional distribution, on $$f$$ and on the density $$g(\cdot)$$ of $$X$$, the expected log-likelihood $\Delta(a)=\int \{b_1(a(x))\,b_3(f(x))+b_2(a(x))\}\,g(x)\,dx$ has a maximizer over $$a\in {\mathcal A}=\{a(x_1,\dots,x_J)=a_0+\sum_{j=1}^{J}a_j(x_j) : E(a(X))=a_0,\ E(a_j(X_j))=0,\ 1\leq j\leq J\}$$ (Theorem 1).
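A minimal numerical sketch of the best additive approximation, under assumptions that are not taken from the paper: the Bernoulli family ($$b_1(a)=a$$, $$b_2(a)=-\log(1+e^{a})$$), $$J=2$$, $$X$$ uniform on $$[0,1]^2$$ (so $$g\equiv 1$$), each $$a_j$$ restricted to centered cubic polynomials, and the integral in $$\Delta$$ replaced by a midpoint grid. In this family $$\Delta$$ is concave in the coefficients, so Newton's method is used. The response function `f_example` and all other names are hypothetical.

```python
import numpy as np


def logistic(z):
    # b_3(eta) for the Bernoulli family
    return 1.0 / (1.0 + np.exp(-z))


def expected_loglik(a_vals, mu, w):
    # Discretized Delta(a) = sum_x w(x) [b_1(a) b_3(f) + b_2(a)], b_1(a) = a, b_2(a) = -log(1 + e^a)
    return np.sum(w * (a_vals * mu - np.logaddexp(0.0, a_vals)))


def additive_basis(x, degree=3):
    # Intercept a_0, then centered powers of each coordinate, so each a_j has mean 0 under the grid weights
    cols = [np.ones(len(x))]
    for j in range(x.shape[1]):
        for k in range(1, degree + 1):
            p = x[:, j] ** k
            cols.append(p - p.mean())
    return np.column_stack(cols)


def best_additive_fit(f, m=60, degree=3, iters=25):
    t = (np.arange(m) + 0.5) / m                   # midpoint grid on [0, 1]
    g1, g2 = np.meshgrid(t, t, indexing="ij")
    x = np.column_stack([g1.ravel(), g2.ravel()])  # J = 2
    w = np.full(len(x), 1.0 / len(x))              # uniform density g on [0, 1]^2
    f_vals = f(x)
    mu = logistic(f_vals)                          # b_3(f(x))
    B = additive_basis(x, degree)
    theta = np.zeros(B.shape[1])
    for _ in range(iters):                         # Newton ascent on the concave Delta
        p = logistic(B @ theta)
        grad = B.T @ (w * (mu - p))
        neg_hess = B.T @ (B * (w * p * (1.0 - p))[:, None])
        theta += np.linalg.solve(neg_hess, grad)
    return theta, B @ theta, f_vals, mu, w


def f_example(x):
    # Hypothetical non-additive response function
    return x[:, 0] + 3.0 * (x[:, 0] - 0.5) * (x[:, 1] - 0.5)


theta, a_hat, f_vals, mu, w = best_additive_fit(f_example)
print("Delta(best additive):", expected_loglik(a_hat, mu, w))
print("Delta(f):            ", expected_loglik(f_vals, mu, w))
```

In this family the integrand $$a\,b_3(f)+b_2(a)$$ is maximized pointwise at $$a=f$$, so the printed $$\Delta$$ of the additive fit cannot exceed $$\Delta(f)$$; the two coincide when $$f$$ is itself additive and representable in the chosen basis.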
The maximizer is called the best additive approximation to the response function $$f$$. Besides being easier to interpret than general (non-additive) approximations, it can be estimated from a sample $$(Y_i,X_i)_{i=1}^n$$ with an accuracy that does not deteriorate as the dimension $$J$$ increases; moreover, the rate of convergence is optimal in the $$L_2$$ sense (Theorem 2). [For a related result for regression functions, see the author, Additive regression and other nonparametric models, Ann. Stat. 13, 689-705 (1985).] To establish this, a spline estimator obtained by maximizing an empirical log-likelihood is used.
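The estimation step can be illustrated in the same hypothetical Bernoulli setting. The sketch fits an additive model with a piecewise-linear spline in each coordinate by maximizing the empirical log-likelihood $$\sum_i\{a(X_i)Y_i-\log(1+e^{a(X_i)})\}$$ with Newton's method. The knots, the sample size, the simulated $$f$$ with $$J=3$$ and all function names are assumptions for illustration; this is not claimed to be the paper's exact estimator. In particular, an asymptotic analysis would typically let the number of knots grow with $$n$$, whereas here they are fixed.

```python
import numpy as np

# Hypothetical fixed interior knots; an asymptotic analysis would typically let their number grow with n
KNOTS = np.array([0.25, 0.5, 0.75])


def spline_columns(t):
    # Piecewise-linear spline basis on [0, 1]: t and (t - k)_+ for each knot k
    return np.column_stack([t] + [np.maximum(t - k, 0.0) for k in KNOTS])


def additive_design(X):
    # Intercept a_0 plus one spline block per coordinate, centered so each a_j has sample mean 0
    S = np.hstack([spline_columns(X[:, j]) for j in range(X.shape[1])])
    return np.column_stack([np.ones(len(X)), S - S.mean(axis=0)])


def empirical_loglik(theta, B, y):
    # sum_i [a(X_i) Y_i - log(1 + exp(a(X_i)))] for the Bernoulli family
    eta = B @ theta
    return np.sum(eta * y - np.logaddexp(0.0, eta))


def fit_additive_spline(X, y, iters=50, tol=1e-10):
    B = additive_design(X)
    theta = np.zeros(B.shape[1])
    for _ in range(iters):                      # Newton's method on the concave log-likelihood
        p = 1.0 / (1.0 + np.exp(-(B @ theta)))
        score = B.T @ (y - p)
        info = B.T @ (B * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(info, score)
        theta += step
        if np.max(np.abs(step)) < tol:
            break
    return theta, B


# Simulated data with J = 3; the third coordinate has no effect (hypothetical example)
rng = np.random.default_rng(0)
n, J = 2000, 3
X = rng.uniform(size=(n, J))
f_true = np.sin(2 * np.pi * X[:, 0]) + 2.0 * (X[:, 1] - 0.5)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-f_true))).astype(float)

theta, B = fit_additive_spline(X, y)
width = len(KNOTS) + 1
components = [B[:, 1 + j * width:1 + (j + 1) * width] @ theta[1 + j * width:1 + (j + 1) * width]
              for j in range(J)]
print("intercept a_0:", theta[0])
print("sample std of each fitted a_j:", [float(np.std(c)) for c in components])
```

Centering each spline block mirrors the constraint $$E(a_j(X_j))=0$$ with the sample mean in place of the expectation; the fitted component for the irrelevant third coordinate would be expected to be comparatively flat.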