Zidek, James V.; Shaddick, Gavin; Taylor, Carolyn G.
Reducing estimation bias in adaptively changing monitoring networks with preferential site selection. (English) Zbl 1304.62143
Ann. Appl. Stat. 8, No. 3, 1640-1670 (2014).

Summary: This paper explores the topic of preferential sampling, specifically situations where monitoring sites in environmental networks are preferentially located by the designers. This means the data arising from such networks may not accurately characterize the spatio-temporal field they intend to monitor. Approaches that have been developed to mitigate the effects of preferential sampling in various contexts are reviewed and, building on these approaches, a general framework for dealing with the effects of preferential sampling in environmental monitoring is proposed. Strategies for implementation are proposed, leading to a method for improving the accuracy of official statistics used to report trends and inform regulatory policy. An essential feature of the method is its capacity to learn the preferential selection process over time and hence to reduce bias in these statistics. Simulation studies suggest dramatic reductions in bias are possible. A case study demonstrates use of the method in assessing the levels of air pollution due to black smoke in the UK over an extended period (1970–1996). In particular, dramatic reductions in the estimates of the number of sites out of compliance are observed.

Cited in 6 Documents
MSC: 62P12 Applications of statistics to environmental and related topics; 62D05 Sampling theory, sample surveys
Keywords: preferential sampling; Horvitz-Thompson estimation; response biased sampling; space-time fields
Software: EnviroStat
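The Horvitz-Thompson estimation named in the keywords can be illustrated with a small sketch. It is not the authors' implementation: the summary describes a method that learns the selection process over time, whereas this toy example assumes the inclusion probabilities are already known and that sites are selected independently. The function names `ht_mean` and `expected_estimates` and the numbers in the usage note are hypothetical, chosen only for illustration.

```python
from itertools import product


def ht_mean(values, probs, sample):
    """Horvitz-Thompson estimate of the population mean.

    values: field value at every site in the (finite) population
    probs:  inclusion probability of every site (assumed known here)
    sample: indices of the sites that were actually selected
    """
    n_pop = len(values)
    # Each sampled value is weighted by the inverse of its inclusion probability.
    return sum(values[i] / probs[i] for i in sample) / n_pop


def expected_estimates(values, probs):
    """Exact expectations of two estimators under independent site selection.

    Enumerates every possible subset of sites, so it is only practical for
    very small populations. Returns (E[HT mean], E[naive mean | sample not empty]).
    """
    exp_ht = 0.0
    exp_naive = 0.0
    p_nonempty = 0.0
    for mask in product([0, 1], repeat=len(values)):
        sample = [i for i, m in enumerate(mask) if m]
        # Probability of observing exactly this subset of sites.
        p = 1.0
        for i, m in enumerate(mask):
            p *= probs[i] if m else 1.0 - probs[i]
        exp_ht += p * ht_mean(values, probs, sample)
        if sample:
            exp_naive += p * sum(values[i] for i in sample) / len(sample)
            p_nonempty += p
    return exp_ht, exp_naive / p_nonempty
```

For example, with hypothetical site values `[10.0, 20.0, 30.0, 40.0]` and inclusion probabilities `[0.2, 0.4, 0.6, 0.8]` (more polluted sites are more likely to be monitored), the expected Horvitz-Thompson estimate equals the true mean up to floating-point error, while the expected unweighted sample mean lies above it.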