×

Faster model matrix crossproducts for large generalized linear models with discretized covariates. (English) Zbl 1436.62352

Summary: S. Wood et al. [“Generalized additive models for gigadata: modelling the UK black smoke network daily data”, J. Am. Stat. Assoc. 112, No. 519, 1199–1210 (2017)] developed methods for fitting penalized regression spline based generalized additive models, with of the order of \(10^4\) coefficients, to up to \(10^8\) data. The methods offered two to three orders of magnitude reduction in computational cost relative to the most efficient previous methods. Part of the gain resulted from the development of a set of methods for efficiently computing model matrix products when model covariates each take only a discrete set of values substantially smaller than the sample size (generalizing an idea first appearing in [S. Lang et al., Stat. Comput. 24, No. 2, 223–238 (2014; Zbl 1325.62179)]). Covariates can always be rounded to achieve such discretization, and it should be noted that the covariate discretization is marginal. That is we do not rely on discretizing covariates jointly, which would typically require the use of very coarse discretization. The most expensive computation in model estimation is the formation of the matrix cross product \(\mathbf{X}^{\mathsf{T}}{\mathbf{WX}}\) where \(\mathbf{X}\) is a model matrix and \({\mathbf{W}}\) a diagonal or tri-diagonal matrix. The purpose of this paper is to present a simple, novel and substantially more efficient approach to the computation of this cross product. The new method offers, for example, a 30 fold reduction in cross product computation time for the Black Smoke model dataset motivating [Wood et al., loc. cit.]. Given this reduction in computational cost, the subsequent Cholesky decomposition of \(\mathbf{X}^{\mathsf{T}}{\mathbf{WX}}\) and follow on computation of \((\mathbf{X}^{\mathsf{T}}{\mathbf{WX}})^{-1}\) become a more significant part of the computational burden, and we also discuss the choice of methods for improving their speed.

MSC:

62J12 Generalized linear models (logistic models)
62M40 Random fields; image analysis
62F15 Bayesian inference
65C05 Monte Carlo methods
62H12 Estimation in multivariate analysis
62-08 Computational methods for problems pertaining to statistics

Citations:

Zbl 1325.62179
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Davis, Ta, Direct Methods for Sparse Linear Systems (2006), Philadelphia: SIAM, Philadelphia · Zbl 1119.65021
[2] Green, Pj; Silverman, Bw, Nonparametric Regression and Generalized Linear Models (1994), London: Chapman & Hall, London · Zbl 0832.62032
[3] Hastie, T.; Tibshirani, R., Generalized Additive Models (1990), London: Chapman & Hall, London · Zbl 0747.62061
[4] Lang, S.; Umlauf, N.; Wechselberger, P.; Harttgen, K.; Kneib, T., Multilevel structured additive regression, Stat. Comput., 24, 2, 223-238 (2014) · Zbl 1325.62179 · doi:10.1007/s11222-012-9366-0
[5] Lucas, C.: LAPACK-style codes for level 2 and 3 pivoted Cholesky factorizations. In: LAPACK Working Paper (2004)
[6] OpenMP Architecture Review Board (2008) OpenMP application program interface version 3.0
[7] Press, W.; Teukolsky, S.; Vetterling, W.; Flannery, B., Numerical Recipes (2007), Cambridge: Cambridge University Press, Cambridge
[8] Ramsay, J.; Silverman, B., Functional Data Analysis (2005), Berlin: Springer, Berlin · Zbl 1079.62006
[9] Silverman, Bw, Some aspects of the spline smoothing approach to non-parametric regression curve fitting, J. R. Stat. Soc. Ser. B, 47, 1, 1-53 (1985) · Zbl 0606.62038
[10] Wood, Sn, Generalized Additive Models: An Introduction with R (2017), Boca Raton: CRC Press, Boca Raton · Zbl 1368.62004
[11] Wood, Sn; Li, Z.; Shaddick, G.; Augustin, Nh, Generalized additive models for gigadata: modelling the UK black smoke network daily data, J. Am. Stat. Assoc., 112, 519, 1199-1210 (2017) · doi:10.1080/01621459.2016.1195744
[12] Xianyi, Z., Qian, W., Chothia, Z.: OpenBLAS. http://xianyi.github.io/OpenBLAS (2014)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.