Accounting for spatial correlation in the scan statistic. (English) Zbl 1126.62107

Summary: The spatial scan statistic is widely used in epidemiology and medical studies to identify disease hotspots. The classical spatial scan statistic assumes that the numbers of disease cases at different locations follow independent Poisson distributions, whereas in practice the data may exhibit overdispersion and spatial correlation.
We examine the behavior of the spatial scan statistic when overdispersion and spatial correlation are present, and propose a modified spatial scan statistic that accounts for both. Theoretical results are provided to demonstrate that ignoring overdispersion and spatial correlation leads to an increased rate of false positives, and a simulation study confirms this. Simulation studies also show that our modified procedure can substantially reduce the rate of false alarms. Two data examples, one involving brain cancer cases in New Mexico and the other chickenpox incidence in France, illustrate the practical relevance of the modified procedure.
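The false-positive claim in the summary can be illustrated with a small simulation. The sketch below is not the authors' procedure or their simulation design: it assumes a one-dimensional row of regions with equal expected counts, scans contiguous windows with a Kulldorff-type Poisson log likelihood ratio, calibrates a critical value under independent Poisson counts, and then applies that critical value to overdispersed (gamma-Poisson, i.e. negative binomial) counts that contain no true cluster. Spatial correlation is not simulated here. All function names and parameter values (`max_scan_llr`, `false_alarm_rates`, `dispersion`, and so on) are hypothetical choices made for illustration.

```python
import numpy as np


def max_scan_llr(counts, expected, max_width):
    """Largest Poisson log likelihood ratio over contiguous windows (1D sketch)."""
    counts = np.asarray(counts, dtype=float)
    expected = np.asarray(expected, dtype=float)
    total = counts.sum()
    # Rescale expected counts so that they sum to the observed total.
    expected = expected * total / expected.sum()
    c_cum = np.concatenate(([0.0], np.cumsum(counts)))
    e_cum = np.concatenate(([0.0], np.cumsum(expected)))
    best = 0.0
    for width in range(1, max_width + 1):
        c_in = c_cum[width:] - c_cum[:-width]   # observed cases inside each window
        e_in = e_cum[width:] - e_cum[:-width]   # expected cases inside each window
        c_out = total - c_in
        e_out = total - e_in
        with np.errstate(divide="ignore", invalid="ignore"):
            inside = c_in * np.log(c_in / e_in)
            outside = np.where(c_out > 0, c_out * np.log(c_out / e_out), 0.0)
        # Only windows with an excess of cases count as candidate hotspots.
        llr = np.where(c_in > e_in, inside + outside, 0.0)
        best = max(best, float(llr.max()))
    return best


def false_alarm_rates(n_regions=50, mean=5.0, dispersion=2.0, max_width=10,
                      n_null=1000, n_test=1000, alpha=0.05, seed=0):
    """Return (Poisson rate, overdispersed rate) of exceeding the Poisson critical value."""
    rng = np.random.default_rng(seed)
    expected = np.full(n_regions, mean)

    def stat(counts):
        return max_scan_llr(counts, expected, max_width)

    # Simplification: the critical value comes from unconditional Poisson draws
    # rather than from a Monte Carlo test conditional on the observed total.
    null = np.array([stat(rng.poisson(mean, n_regions)) for _ in range(n_null)])
    crit = np.quantile(null, 1.0 - alpha)

    # Fresh independent Poisson data: the rejection rate should be near alpha.
    pois = np.array([stat(rng.poisson(mean, n_regions)) for _ in range(n_test)])

    # Gamma-Poisson mixture with the same mean and variance = dispersion * mean.
    shape = mean / (dispersion - 1.0)
    lam = rng.gamma(shape, mean / shape, size=(n_test, n_regions))
    nb = np.array([stat(row) for row in rng.poisson(lam)])

    return float(np.mean(pois > crit)), float(np.mean(nb > crit))


if __name__ == "__main__":
    pois_rate, nb_rate = false_alarm_rates()
    print(f"false alarm rate, Poisson data:       {pois_rate:.3f}")
    print(f"false alarm rate, overdispersed data: {nb_rate:.3f}")
```

Under these assumptions one would expect the first rate to sit near the nominal level `alpha` and the second to be clearly larger, which is the qualitative effect the summary describes. The sketch does not implement the modified scan statistic proposed in the paper.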


62P10 Applications of statistics to biology and medical sciences; meta analysis
65C60 Computational problems in statistics (MSC2010)
62M30 Inference from spatial processes
62J15 Paired and multiple comparisons; multiple testing


Software: SaTScan; geoRglm

