Nonlinear predictive latent process models for integrating spatio-temporal exposure data from multiple sources. (English) Zbl 1304.62141

Summary: Spatio-temporal prediction of levels of an environmental exposure is an important problem in environmental epidemiology. Our work is motivated by multiple studies on the spatio-temporal distribution of mobile-source (traffic-related) particles in the greater Boston area. When multiple sources of exposure information are available, a joint model that pools information across sources maximizes data coverage over both space and time, thereby reducing the prediction error.
We consider a Bayesian hierarchical framework in which a joint model consists of a set of submodels, one for each data source, and a model for the latent process that serves to relate the submodels to one another. If a submodel depends on the latent process nonlinearly, inference using standard MCMC techniques can be computationally prohibitive. The implications are particularly severe when the data for each submodel are aggregated at different temporal scales.
To make such problems tractable, we linearize the nonlinear components with respect to the latent process and induce sparsity in the covariance matrix of the latent process using compactly supported covariance functions. We propose an efficient MCMC scheme that takes advantage of these approximations. We use our model to address a temporal change of support problem whereby interest focuses on pooling daily and multiday black carbon readings in order to maximize the spatial coverage of the study region.
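The two approximations named in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not specify the covariance family or the form of the nonlinearity, so the spherical covariance, the log-of-mean-of-exponentials link for a multiday reading, and all function names and parameter values below are assumptions made for illustration only.

```python
import numpy as np

def spherical_cov(dist, sigma2=1.0, range_=1.0):
    """Spherical covariance (an assumed choice): exactly zero for dist >= range_."""
    h = np.minimum(np.asarray(dist, dtype=float) / range_, 1.0)
    return sigma2 * (1.0 - 1.5 * h + 0.5 * h**3)

def log_mean_exp(x):
    """g(x) = log(mean(exp(x))): an assumed multiday average on the log scale."""
    m = np.max(x)
    return m + np.log(np.mean(np.exp(x - m)))

def linearize_log_mean_exp(x0):
    """First-order Taylor expansion of g around x0, so that g(x) ~ a + b @ x."""
    w = np.exp(x0 - np.max(x0))
    b = w / w.sum()                  # gradient of g at x0 (softmax weights)
    a = log_mean_exp(x0) - b @ x0    # intercept so the expansion is exact at x0
    return a, b

# Compact support: covariance of a daily latent process over 30 days,
# with an assumed correlation range of 5 days.
t = np.arange(30.0)
dist = np.abs(t[:, None] - t[None, :])
K = spherical_cov(dist, sigma2=1.0, range_=5.0)
sparsity = np.mean(K == 0.0)         # fraction of exactly-zero entries

# Linearization: a hypothetical 7-day aggregate of the latent process.
x0 = np.linspace(-0.5, 0.5, 7)       # expansion point (hypothetical values)
a, b = linearize_log_mean_exp(x0)
x = x0 + 0.01 * np.sin(np.arange(7)) # a nearby latent state
approx_error = abs(log_mean_exp(x) - (a + b @ x))
```

Under these assumptions, most entries of `K` are exactly zero, which sparse linear algebra could exploit, and the linearized aggregate is affine in the latent values, so a Gaussian latent process would keep a Gaussian conditional structure near the expansion point.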


62P12 Applications of statistics to environmental and related topics
62P10 Applications of statistics to biology and medical sciences; meta analysis
62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
62M30 Inference from spatial processes

