
Dynamic model-based clustering for spatio-temporal data. (English) Zbl 1384.62201

Summary: In many research fields, scientific questions are investigated by analyzing data collected over space and time, usually at fixed spatial locations and time steps, resulting in geo-referenced time series. In this context, it is of interest to identify potential partitions of the space and study their evolution over time. A finite space-time mixture model is proposed to identify level-based clusters in spatio-temporal data and study their temporal evolution along the time frame. We account for space-time dependence by introducing spatio-temporally varying mixing weights, so that observations at nearby locations and consecutive time points are allocated with similar cluster membership probabilities. As a result, a clustering that varies over time and space is obtained. Conditional on cluster membership, a state-space model is deployed to describe the temporal evolution of the sites belonging to each group. Full posterior inference is provided under a Bayesian framework through Markov chain Monte Carlo algorithms. A strategy for selecting a suitable number of clusters, based on the posterior temporal patterns of the clusters, is also offered. We evaluate our approach through simulation experiments and illustrate it using air quality data collected across Europe from 2001 to 2012, showing the benefit of borrowing strength across space and time.
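The generative structure described in the summary can be illustrated with a small simulation. The following is only a toy sketch under assumptions that are not taken from the paper: sites lie on a line, the mixing weights come from a softmax of smooth deterministic scores, each cluster level follows a Gaussian random walk (a local-level state-space model), and all names (`simulate`, `softmax`, the variance values) are hypothetical. It does not reproduce the authors' model or their inference scheme.

```python
import math
import random


def softmax(scores):
    """Turn real-valued scores into probabilities that sum to one."""
    top = max(scores)
    exps = [math.exp(v - top) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]


def simulate(n_sites=20, n_times=12, n_clusters=3, seed=1):
    """Simulate toy data from a mixture with space-time varying weights."""
    rng = random.Random(seed)

    # One random-walk level per cluster (assumed local-level model).
    levels = []
    for k in range(n_clusters):
        path = [10.0 * k]
        for _ in range(1, n_times):
            path.append(path[-1] + rng.gauss(0.0, 0.5))
        levels.append(path)

    weights, labels, data = [], [], []
    for s in range(n_sites):
        w_row, z_row, y_row = [], [], []
        for t in range(n_times):
            # Scores change smoothly in s and t, so neighbouring sites and
            # consecutive times receive similar membership probabilities.
            scores = [
                3.0 * math.cos(math.pi * s / n_sites - k * math.pi / n_clusters)
                + 0.05 * t * k
                for k in range(n_clusters)
            ]
            probs = softmax(scores)
            z = rng.choices(range(n_clusters), weights=probs)[0]
            # Observation: cluster level at time t plus Gaussian noise.
            y = levels[z][t] + rng.gauss(0.0, 1.0)
            w_row.append(probs)
            z_row.append(z)
            y_row.append(y)
        weights.append(w_row)
        labels.append(z_row)
        data.append(y_row)
    return levels, weights, labels, data


levels, weights, labels, data = simulate()
```

Here `data[s][t]` is the simulated observation at site `s` and time `t`, and `labels[s][t]` is its cluster allocation, which can change over both space and time. A real analysis would estimate the weights and levels from data, for example by MCMC as the summary describes.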

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62M30 Inference from spatial processes
62F15 Bayesian inference
62P12 Applications of statistics to environmental and related topics

Software:

BayesLogit; spBayes

References:

[1] Banerjee, S., Carlin, B.P., Gelfand, A.E.: Hierarchical Modeling and Analysis for Spatial Data, 2nd edn. Chapman and Hall, Boca Raton (2014) · Zbl 1358.62009
[2] Bruno, F., Cocchi, D., Paci, L.: A practical approach for assessing the effect of grouping in hierarchical spatio-temporal models. AStA Adv. Stat. Anal. 97(2), 93-108 (2013) · Zbl 1443.62400 · doi:10.1007/s10182-012-0193-6
[3] Carlin, B.P., Polson, N.G., Stoffer, D.S.: A Monte Carlo approach to nonnormal and nonlinear state-space modeling. J. Am. Stat. Assoc. 87(418), 493-500 (1992) · doi:10.1080/01621459.1992.10475231
[4] Celeux, G., Forbes, F., Robert, C.P., Titterington, D.M.: Deviance information criteria for missing data models. Bayesian Anal. 1(4), 651-673 (2006) · Zbl 1331.62329 · doi:10.1214/06-BA122
[5] Cocchi, D., Greco, F., Trivisano, C.: Hierarchical space-time modelling of PM10 pollution. Atmos. Environ. 41(3), 532-542 (2007) · doi:10.1016/j.atmosenv.2006.08.032
[6] Cressie, N., Wikle, C.K.: Statistics for Spatio-Temporal Data. Wiley, Hoboken (2011) · Zbl 1273.62017
[7] Dellaportas, P., Papageorgiou, I.: Multivariate mixtures of normals with unknown number of components. Stat. Comput. 16(1), 57-68 (2006) · doi:10.1007/s11222-006-5338-6
[8] Duan, J.A., Guindani, M., Gelfand, A.E.: Generalized spatial Dirichlet process models. Biometrika 94, 809-825 (2007) · Zbl 1156.62064 · doi:10.1093/biomet/asm071
[9] EU: Directive 2008/50/EC of the European Parliament and of the Council of 21 May 2008 on ambient air quality and cleaner air for Europe. Off. J. Eur. Union L 152:1-44 (2008). http://eur-lex.europa.eu/eli/dir/2008/50/oj
[10] EU: Commission implementing decision 2011/850/EU of 12 December 2011 laying down rules for directives 2004/107/EC and 2008/50/EC of the European Parliament and of the Council as regards the reciprocal exchange of information and reporting on ambient air quality. Off. J. Eur. Union L 335:86-106 (2011). http://data.europa.eu/eli/dec_impl/2011/850/oj
[11] Fernández, C., Green, P.J.: Modelling spatially correlated data via mixtures: a Bayesian approach. J. R. Stat. Soc. Ser. B 64, 805-826 (2002) · Zbl 1067.62029 · doi:10.1111/1467-9868.00362
[12] Finazzi, F., Haggarty, R., Miller, C., Scott, M., Fassò, A.: A comparison of clustering approaches for the study of the temporal coherence of multiple time series. Stoch. Environ. Res. Risk Assess. 29, 463-475 (2015) · doi:10.1007/s00477-014-0931-2
[13] Frühwirth-Schnatter, S.: Finite Mixture and Markov Switching Models. Springer, New York (2006) · Zbl 1108.62002
[14] Frühwirth-Schnatter, S., Kaufmann, S.: Model-based clustering of multiple time series. J. Bus. Econ. Stat. 26, 78-89 (2008) · doi:10.1198/073500107000000106
[15] Gelfand, A.E., Ghosh, S.K.: Model choice: a minimum posterior predictive loss approach. Biometrika 85(1), 1-11 (1998) · Zbl 0904.62036 · doi:10.1093/biomet/85.1.1
[16] Gelfand, A.E., Kottas, A., MacEachern, S.N.: Bayesian nonparametric spatial modeling with Dirichlet process mixing. J. Am. Stat. Assoc. 100(471), 1021-1035 (2005) · Zbl 1117.62342 · doi:10.1198/016214504000002078
[17] Guerreiro, C.B., Foltescu, V., de Leeuw, F.: Air quality status and trends in Europe. Atmos. Environ. 98, 376-384 (2014) · doi:10.1016/j.atmosenv.2014.09.017
[18] Hennig, C.: Methods for merging Gaussian mixture components. Adv. Data Anal. Classif. 4(1), 3-34 (2010) · Zbl 1306.62141 · doi:10.1007/s11634-010-0058-3
[19] Hossain, M.M., Lawson, A.B., Cai, B., Choi, J., Liu, J., Kirby, R.S.: Space-time areal mixture model: relabeling algorithm and model selection issues. Environmetrics 25, 84-96 (2014) · Zbl 1525.62141 · doi:10.1002/env.2265
[20] Inoue, L.Y.T., Neira, M., Nelson, C., Gleave, M., Etzioni, R.: Cluster-based network model for time-course gene expression data. Biostatistics 8, 507-525 (2007) · Zbl 1118.62116 · doi:10.1093/biostatistics/kxl026
[21] Jasra, A., Holmes, C.C., Stephens, D.A.: Markov chain Monte Carlo methods and the label switching problem in Bayesian mixture modeling. Stat. Sci. 20(1), 50-67 (2005) · Zbl 1100.62032
[22] Knorr-Held, L.: Conditional prior proposals in dynamic models. Scand. J. Stat. 26(1), 129-144 (1999) · Zbl 0924.65152 · doi:10.1111/1467-9469.00141
[23] Lau, J.W., Green, P.J.: Bayesian model-based clustering procedures. J. Comput. Gr. Stat. 16(3), 526-558 (2007)
[24] Malsiner-Walli, G., Frühwirth-Schnatter, S., Grün, B.: Model-based clustering based on sparse finite Gaussian mixtures. Stat. Comput. 26(1), 303-324 (2016) · Zbl 1342.62109 · doi:10.1007/s11222-014-9500-2
[25] Melnykov, V.: Merging mixture components for clustering through pairwise overlap. J. Comput. Gr. Stat. 25(1), 66-90 (2016) · doi:10.1080/10618600.2014.978007
[26] Neelon, B., Gelfand, A.E., Miranda, M.L.: A multivariate spatial mixture model for areal data: examining regional differences in standardized test scores. J. R. Stat. Soc. Ser. C 63, 737-761 (2014) · doi:10.1111/rssc.12061
[27] Nguyen, X., Gelfand, A.E.: The Dirichlet labeling process for clustering functional data. Stat. Sin. 21, 1249-1289 (2011) · Zbl 1223.62104 · doi:10.5705/ss.2008.285
[28] Nieto-Barajas, L.E., Contreras-Cristán, A.: A Bayesian nonparametric approach for time series clustering. Bayesian Anal. 9(1), 147-170 (2014) · Zbl 1327.62473 · doi:10.1214/13-BA852
[29] Page, G.L., Quintana, F.A.: Spatial product partition models. Bayesian Anal. 11, 265-298 (2016) · Zbl 1359.62401 · doi:10.1214/15-BA971
[30] Polson, N.G., Scott, J.G., Windle, J.: Bayesian inference for logistic models using Pólya-Gamma latent variables. J. Am. Stat. Assoc. 108(504), 1339-1349 (2013) · Zbl 1283.62055 · doi:10.1080/01621459.2013.829001
[31] Ranciati, S., Viroli, C., Wit, E.: Mixture model with multiple allocations for clustering spatially correlated observations in the analysis of ChIP-Seq data. arXiv:1601.04879 (2016) · Zbl 1379.62082
[32] Reich, B.J., Fuentes, M.: A multivariate semiparametric Bayesian spatial modeling framework for hurricane surface wind fields. Ann. Appl. Stat. 1(1), 249-264 (2007) · Zbl 1129.62114 · doi:10.1214/07-AOAS108
[33] Richardson, S., Green, P.J.: On Bayesian analysis of mixtures with an unknown number of components (with discussion). J. R. Stat. Soc. Ser. B 59(4), 731-792 (1997) · Zbl 0891.62020 · doi:10.1111/1467-9868.00095
[34] Sperrin, M., Jaki, T., Wit, E.: Probabilistic relabelling strategies for the label switching problem in Bayesian mixture models. Stat. Comput. 20(3), 357-366 (2010) · doi:10.1007/s11222-009-9129-8
[35] Spiegelhalter, D.J., Best, N.G., Carlin, B.P., Van Der Linde, A.: Bayesian measures of model complexity and fit. J. R. Stat. Soc. Ser. B 64(4), 583-639 (2002) · Zbl 1067.62010 · doi:10.1111/1467-9868.00353
[36] Stephens, M.: Dealing with label switching in mixture models. J. R. Stat. Soc. Ser. B 62(4), 795-809 (2000) · Zbl 0957.62020 · doi:10.1111/1467-9868.00265
[37] Vincent, K., Stedman, J.: A review of air quality station type classifications for UK compliance monitoring. Tech. rep. The Department for Environment, Food and Rural Affairs, Welsh Government, Scottish Government and the Department of the Environment for Northern Ireland, Ricardo-AEA/R/3387 (2013). https://uk-air.defra.gov.uk/library/reports?report_id=765
[38] Viroli, C.: Model based clustering for three-way data structures. Bayesian Anal. 6(4), 573-602 (2011) · Zbl 1330.62262 · doi:10.1214/11-BA622
[39] West, M., Harrison, J.: Bayesian Forecasting and Dynamic Models, 2nd edn. Springer, New York (1997) · Zbl 0871.62026
[40] Zhang, H.: Inconsistent estimation and asymptotically equal interpolations in model-based geostatistics. J. Am. Stat. Assoc. 99, 250-261 (2004) · Zbl 1089.62538 · doi:10.1198/016214504000000241
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.