Modular regression – a Lego system for building structured additive distributional regression models with tensor product interactions.

*(English)* Zbl 1428.62158

Summary: Semiparametric regression models offer considerable flexibility in specifying additive regression predictors, which can include effects as diverse as nonlinear effects of continuous covariates, spatial effects, random effects, or varying coefficients. Recently, such flexible model predictors have been combined with the possibility to go beyond pure mean-based analyses by specifying regression predictors on potentially all parameters of the response distribution in a distributional regression framework. In this paper, we discuss a generic concept for defining interaction effects in such semiparametric distributional regression models based on tensor products of main effects. These interactions can be assigned anisotropic penalties, i.e., different amounts of smoothness can be associated with each of the interacting covariates. We investigate identifiability and the decomposition of interactions into main effects and pure interaction effects (similar to a smoothing spline analysis of variance) to facilitate a modular model building process. The decomposition is based on orthogonality in function spaces, which allows for considerable flexibility in setting up the effect decomposition. Inference is based on Markov chain Monte Carlo simulations with iteratively weighted least squares proposals under constraints that ensure identifiability and effect decomposition. One important aspect is therefore to maintain the sparse matrix structures of the tensor product in identifiable, decomposed model formulations as well. The performance of modular regression is verified in a simulation on decomposed interaction surfaces of two continuous covariates and in two applications: the construction of spatio-temporal interactions for the analysis of precipitation, and functional random effects for analysing house prices.
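As a rough illustration of the tensor product construction with anisotropic penalties mentioned in the summary, the following sketch builds an interaction design matrix from two marginal bases and combines the marginal penalties with separate smoothing variances. It is not taken from the paper: the piecewise-linear basis, the second-order difference penalties, the basis dimensions, and all function and variable names (`hat_basis`, `row_tensor`, `anisotropic_penalty`, `tau2_1`, `tau2_2`) are assumptions chosen for brevity, and dense matrices are used where an actual implementation would presumably exploit sparsity.

```python
import numpy as np

def hat_basis(x, num_basis):
    """Piecewise-linear (degree-1 B-spline) basis on equally spaced knots over [0, 1].
    Assumption: stands in for whatever marginal basis the model actually uses."""
    knots = np.linspace(0.0, 1.0, num_basis)
    h = knots[1] - knots[0]
    return np.clip(1.0 - np.abs(x[:, None] - knots[None, :]) / h, 0.0, None)

def difference_penalty(dim, order=2):
    """P-spline style penalty K = D'D built from order-th differences."""
    D = np.diff(np.eye(dim), n=order, axis=0)
    return D.T @ D

def row_tensor(B1, B2):
    """Row-wise Kronecker product: row i equals kron(B1[i], B2[i])."""
    n = B1.shape[0]
    return (B1[:, :, None] * B2[:, None, :]).reshape(n, -1)

def anisotropic_penalty(K1, K2, tau2_1, tau2_2):
    """Penalty with one smoothing variance per interacting covariate:
    kron(K1, I) / tau2_1 + kron(I, K2) / tau2_2."""
    d1, d2 = K1.shape[0], K2.shape[0]
    return np.kron(K1, np.eye(d2)) / tau2_1 + np.kron(np.eye(d1), K2) / tau2_2

# Hypothetical data: two continuous covariates on the unit square.
rng = np.random.default_rng(0)
x1, x2 = rng.uniform(size=200), rng.uniform(size=200)

B1, B2 = hat_basis(x1, 6), hat_basis(x2, 5)   # marginal design matrices
Z = row_tensor(B1, B2)                        # interaction design matrix
K = anisotropic_penalty(difference_penalty(6), difference_penalty(5),
                        tau2_1=0.5, tau2_2=2.0)

print(Z.shape, K.shape)  # (200, 30) (30, 30)
```

With `tau2_1` smaller than `tau2_2`, roughness along the first covariate is penalised more strongly than along the second, which is the sense in which the penalty is anisotropic. The constraints used for identifiability and for separating main effects from the pure interaction are not sketched here.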

##### MSC:

62G08 | Nonparametric regression and quantile regression |

62J12 | Generalized linear models (logistic models) |

62H11 | Directional data; spatial statistics |

65D07 | Numerical computation using splines |

62P05 | Applications of statistics to actuarial sciences and financial mathematics |

##### Keywords:

constrained sampling; distributional regression; functional random effects; Markov chain Monte Carlo simulations; penalised splines; smoothing spline analysis of variance; space-time models; tensor product interactions
