zbMATH — the first resource for mathematics

Compression and conditional emulation of climate model output. (English) Zbl 1398.68157
Summary: Numerical climate model simulations run at high spatial and temporal resolutions generate massive quantities of data. As our computing capabilities continue to increase, storing all of the data is not sustainable, and thus it is important to develop methods for representing the full datasets by smaller compressed versions. We propose a statistical compression and decompression algorithm based on storing a set of summary statistics as well as a statistical model describing the conditional distribution of the full dataset given the summary statistics. We decompress the data by computing conditional expectations and conditional simulations from the model given the summary statistics. Conditional expectations represent our best estimate of the original data but are subject to oversmoothing in space and time. Conditional simulations introduce realistic small-scale noise so that the decompressed fields are neither too smooth nor too rough compared with the original data. Considerable attention is paid to two tasks: accurately modeling the original dataset (1 year of daily mean temperature data), particularly with regard to the spatial nonstationarity inherent in global fields; and determining which statistics to store, so that the variation in the original data is closely captured while still allowing fast decompression and conditional emulation on modest computers.
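The compress/decompress idea in the summary can be illustrated with a minimal sketch. It assumes a toy stationary Gaussian model on a 1-D grid (exponential covariance) and uses block means as the stored summary statistics. Both are illustrative stand-ins, not the paper's nonstationary global model or its chosen statistics. The helper names `exp_cov`, `block_mean_matrix`, `compress`, `decompress_mean` and `decompress_sim` are hypothetical. For Y ~ N(mu, Sigma) and summaries s = A y, the conditional expectation is mu + K (s - A mu) with K = Sigma A^T (A Sigma A^T)^{-1}. A conditional simulation can be drawn by simulating Z ~ N(mu, Sigma) unconditionally and then correcting it as Z + K (s - A Z).

```python
import numpy as np

def exp_cov(n, sill=1.0, range_=3.0):
    """Hypothetical stationary exponential covariance on a 1-D grid of n points."""
    idx = np.arange(n)
    lags = np.abs(idx[:, None] - idx[None, :])
    return sill * np.exp(-lags / range_)

def block_mean_matrix(n, block):
    """Matrix A such that A @ y gives the means of consecutive blocks of length `block`."""
    m = n // block
    A = np.zeros((m, n))
    for i in range(m):
        A[i, i * block:(i + 1) * block] = 1.0 / block
    return A

def compress(y, A):
    """Store only the summary statistics s = A y (assumed stand-in for the stored statistics)."""
    return A @ y

def kriging_weights(Sigma, A):
    """K = Sigma A^T (A Sigma A^T)^{-1}, computed with a linear solve instead of an inverse."""
    SAt = Sigma @ A.T
    return np.linalg.solve(A @ SAt, SAt.T).T

def decompress_mean(s, mu, Sigma, A):
    """Conditional expectation E[Y | A Y = s] for Y ~ N(mu, Sigma); smooth best estimate."""
    K = kriging_weights(Sigma, A)
    return mu + K @ (s - A @ mu)

def decompress_sim(s, mu, Sigma, A, rng):
    """Conditional simulation: draw Z ~ N(mu, Sigma), then correct it so that A @ result == s."""
    L = np.linalg.cholesky(Sigma)
    z = mu + L @ rng.standard_normal(len(mu))
    K = kriging_weights(Sigma, A)
    return z + K @ (s - A @ z)

# Usage on synthetic data drawn from the toy model itself.
rng = np.random.default_rng(0)
n, block = 12, 3
mu = np.zeros(n)
Sigma = exp_cov(n)
A = block_mean_matrix(n, block)

y = mu + np.linalg.cholesky(Sigma) @ rng.standard_normal(n)  # stand-in "original" data
s = compress(y, A)                                          # 4 numbers stored instead of 12
y_mean = decompress_mean(s, mu, Sigma, A)
y_sim = decompress_sim(s, mu, Sigma, A, rng)
print(s.shape, np.allclose(A @ y_mean, s), np.allclose(A @ y_sim, s))
```

Both reconstructions reproduce the stored block means exactly. The conditional expectation is typically smoother than the original series. The conditional simulation adds back variability at scales finer than the blocks, which mirrors the trade-off between the two decompression modes described in the summary.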

68P30 Coding and information theory (compaction, compression, models of communication, encoding schemes, etc.) (aspects in computer science)
62P35 Applications of statistics to physics
62P99 Applications of statistics
68U20 Simulation (MSC2010)
Software: FFTW; CSparse