Accounting for survey design in Bayesian disaggregation of survey-based areal estimates of proportions: an application to the American Community Survey. (English) Zbl 1498.62316

Summary: Understanding the effects of social determinants of health on health outcomes requires data on characteristics of the neighborhoods in which subjects live. However, estimates of these characteristics are often aggregated over space and time in a fashion that diminishes their utility. Take, for example, estimates from the American Community Survey (ACS), a multiyear nationwide survey administered by the U.S. Census Bureau: estimates for small municipal areas are aggregated over 5-year periods, whereas 1-year estimates are only available for municipal areas with populations \(>65,000\). Researchers may wish to use ACS estimates in studies of population health to characterize neighborhood-level exposures. However, 5-year estimates may not properly characterize temporal changes or align temporally with other data in the study, while the coarse spatial resolution of the 1-year estimates diminishes their utility in characterizing neighborhood exposure. To circumvent this issue, in this paper we propose a modeling framework to disaggregate estimates of proportions derived from sampling surveys, which explicitly accounts for the survey design effect. We illustrate the utility of our model by applying it to the ACS data, generating estimates of poverty for the state of Michigan at fine spatiotemporal resolution.

MSC:

62P25 Applications of statistics to social sciences
62D05 Sampling theory, sample surveys
62F15 Bayesian inference
62M30 Inference from spatial processes

Software:

spBayes

References:

[1] ABRAMS, E. M. and SZEFLER, S. J. (2020). COVID-19 and the impact of social determinants of health. Lancet Respir. Med. 8 659-661. · doi:10.1016/S2213-2600(20)30234-4
[2] AGUILAR, L. (2015). Detroit’s Cass Corridor makes way for new era. The Detroit News, published April 2015.
[3] Albert, J. H. and Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. J. Amer. Statist. Assoc. 88 669-679. · Zbl 0774.62031
[4] Banerjee, S., Carlin, B. P. and Gelfand, A. E. (2004). Hierarchical Modeling and Analysis for Spatial Data. Chapman & Hall/CRC, Boca Raton, FL. · Zbl 1053.62105
[5] BENEDETTI, M. H., BERROCAL, V. J. and LITTLE, R. J. (2022). Supplement to “Accounting for survey design in Bayesian disaggregation of survey-based areal estimates of proportions: an application to the American Community Survey.” https://doi.org/10.1214/21-AOAS1585SUPPA, https://doi.org/10.1214/21-AOAS1585SUPPB
[6] BERROCAL, V. J., GELFAND, A. E. and HOLLAND, D. M. (2010). A bivariate space-time downscaler under space and time misalignment. Ann. Appl. Stat. 4 1942-1975. · Zbl 1220.62148 · doi:10.1214/10-AOAS351
[7] BRADLEY, J. R., HOLAN, S. H. and WIKLE, C. K. (2016). Multivariate spatio-temporal survey fusion with application to the American Community Survey and Local Area Unemployment Statistics. Stat 5 224-233. · Zbl 07848564 · doi:10.1002/sta4.120
[8] BRADLEY, J. R., WIKLE, C. K. and HOLAN, S. H. (2015). Spatio-temporal change of support with application to American Community Survey multi-year period estimates. Stat 4 255-270. · Zbl 07847948 · doi:10.1002/sta4.94
[9] BRADLEY, J. R., WIKLE, C. K. and HOLAN, S. H. (2016). Bayesian spatial change of support for count-valued survey data with application to the American Community Survey. J. Amer. Statist. Assoc. 111 472-487. · doi:10.1080/01621459.2015.1117471
[10] BRAVEMAN, P., EGERTER, S. and WILLIAMS, D. R. (2011). The social determinants of health: Coming of age. Annual Review of Public Health 32 381-398.
[11] CHEN, C., WAKEFIELD, J. and LUMLEY, T. (2014). The use of sampling weights in Bayesian hierarchical models for small area estimation. Spat. Spatio-Tempor. Epidemiol. 11 33-43.
[12] DAWID, A. P. (1982). The well-calibrated Bayesian. J. Amer. Statist. Assoc. 77 605-613. · Zbl 0495.62005 · doi:10.1080/01621459.1982.10477856
[13] DIGGLE, P. J., TAWN, J. A. and MOYEED, R. A. (1998). Model-based geostatistics. J. R. Stat. Soc. Ser. C. Appl. Stat. 47 299-350. · Zbl 0904.62119 · doi:10.1111/1467-9876.00113
[14] DURANTE, D. (2019). Conjugate Bayes for probit regression via unified skew-normal distributions. Biometrika 106 765-779. · Zbl 1435.62107 · doi:10.1093/biomet/asz034
[15] ENTEZARI, R., BROWN, P. E. and ROSENTHAL, J. S. (2019). Bayesian spatial analysis of hardwood tree counts via MCMC. Environmetrics 31 e2608.
[16] Fay, R. E. III and Herriot, R. A. (1979). Estimates of income for small places: An application of James-Stein procedures to census data. J. Amer. Statist. Assoc. 74 269-277.
[17] FINLEY, A., BANERJEE, S. and CARLIN, B. P. (2007). spBayes: An R package for univariate and multivariate hierarchical point-referenced spatial models. J. Stat. Softw. 19 1-24.
[18] GELFAND, A. E., BANERJEE, S. and GAMERMAN, D. (2005). Spatial process modelling for univariate and multivariate dynamic spatial data. Environmetrics 16 465-479. · doi:10.1002/env.715
[19] GELFAND, A. E., ZHU, L. and CARLIN, B. P. (2001). On the change of support problem for spatio-temporal data. Biostatistics 2 31-45. · Zbl 1022.62095
[20] GEWEKE, J. (1992). Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments. In Bayesian Statistics, 4 (Peñíscola, 1991) (J. O. Berger, A. P. Dawid and A. F. M. Smith, eds.) 169-193. Oxford Univ. Press, New York.
[21] GHITZA, Y. and GELMAN, A. (2013). Deep interaction with MRP: Election turnout and voting patterns among small electoral subgroups. Amer. J. Polit. Sci. 57 762-776.
[22] GILANI, O., BERROCAL, V. J. and BATTERMAN, S. A. (2019). Nonstationary spatiotemporal Bayesian data fusion for pollutants in the near-road environment. Environmetrics 30 e2581. · doi:10.1002/env.2581
[23] GOTWAY, C. A. and YOUNG, L. J. (2002). Combining incompatible spatial data. J. Amer. Statist. Assoc. 97 632-648. · Zbl 1073.62604 · doi:10.1198/016214502760047140
[24] HEATON, M. J., DATTA, A., FINLEY, A. O., FURRER, R., GUINNESS, J., GUHANIYOGI, R., GERBER, F., GRAMACY, R. B., HAMMERLING, D. et al. (2019). A case study competition among methods for analyzing large spatial data. J. Agric. Biol. Environ. Stat. 24 398-425. · Zbl 1426.62345
[25] IACHAN, R., PIERANNUNZI, C., HEALEY, K., GREENLUND, K. and TOWN, M. (2016). National weighting of data from the Behavioral Risk Factor Surveillance System (BRFSS). BMC Med. Res. Methodol. 16 1-12.
[26] Katzfuss, M. (2017). A multi-resolution approximation for massive spatial datasets. J. Amer. Statist. Assoc. 112 201-214. · doi:10.1080/01621459.2015.1123632
[27] KISH, L. (1965). Survey Sampling. Wiley, New York. · Zbl 0151.23403
[28] KISH, L. (1995). Methods for design effects. J. Off. Stat. 11 55-77.
[29] KORN, E. L. and GRAUBARD, B. I. (1998). Confidence intervals for proportions with small expected number of positive counts estimated from survey data. Surv. Methodol. 24 193-201.
[30] LI, Z., HSIAO, Y., GODWIN, J., MARTIN, B. D., WAKEFIELD, J. and CLARK, S. J. (2019). Changes in the spatial distribution of the under-five mortality rate: Small-area analysis of 122 DHS surveys in 262 subregions of 35 countries in Africa. PLoS ONE 14 1-17.
[31] LITTLE, R. J. (2006). Calibrated Bayes: A Bayes/frequentist roadmap. Amer. Statist. 60 213-223. · doi:10.1198/000313006X117837
[32] MARMOT, M., ALLEN, J., BELL, R., BLOOMER, E., GOLDBLATT, P., CONSORTIUM FOR THE EUROPEAN REVIEW OF SOCIAL DETERMINANTS OF HEALTH AND THE HEALTH DIVIDE (2012). WHO European review of social determinants of health and the health divide. Lancet 380 1011-1029.
[33] MERCER, L., WAKEFIELD, J., CHEN, C. and LUMLEY, T. (2014). A comparison of spatial smoothing methods for small area estimation with sampling weights. Spat. Stat. 8 69-85. · doi:10.1016/j.spasta.2013.12.001
[34] MITCHELL, M. W., GENTON, M. G. and GUMPERTZ, M. L. (2005). Testing for separability of space-time covariances. Environmetrics 16 819-831. · doi:10.1002/env.737
[35] MOEHLMAN, L. and ROBINS-SOMERVILLE, M. (2016). The new Detroit: How gentrification has changed Detroit’s economic landscape. Michigan Daily, published September 2016.
[36] PEREIRA, L. N. and COELHO, P. S. (2010). Small area estimation of mean price of habitation transaction using time-series and cross-sectional area-level models. J. Appl. Stat. 37 651-666. · Zbl 1511.62172 · doi:10.1080/02664760902810821
[37] Pfeffermann, D. (2013). New important developments in small area estimation. Statist. Sci. 28 40-68. · Zbl 1332.62038 · doi:10.1214/12-STS395
[38] Polson, N. G., Scott, J. G. and Windle, J. (2013). Bayesian inference for logistic models using Pólya-Gamma latent variables. J. Amer. Statist. Assoc. 108 1339-1349. · Zbl 1283.62055 · doi:10.1080/01621459.2013.829001
[39] PORTER, A. T., HOLAN, S. H., WIKLE, C. K. and CRESSIE, N. (2014). Spatial Fay-Herriot models for small area estimation with functional covariates. Spat. Stat. 10 27-42. · doi:10.1016/j.spasta.2014.07.001
[40] PRATESI, M. and SALVATI, N. (2008). Small area estimation: The EBLUP estimator based on spatially correlated random area effects. Stat. Methods Appl. 17 113-141. · Zbl 1367.62226 · doi:10.1007/s10260-007-0061-9
[41] Roberts, G. O., Gelman, A. and Gilks, W. R. (1997). Weak convergence and optimal scaling of random walk Metropolis algorithms. Ann. Appl. Probab. 7 110-120. · Zbl 0876.60015 · doi:10.1214/aoap/1034625254
[42] ROLLSTON, R. and GALEA, S. (2020). COVID-19 and the social determinants of health. Am. J. Health Promot. 34 687-689. · doi:10.1177/0890117120930536b
[43] SAVITSKY, T. D. (2016). Bayesian nonparametric multiresolution estimation for the American Community Survey. Ann. Appl. Stat. 10 2157-2181. · Zbl 1454.62052 · doi:10.1214/16-AOAS968
[44] Simpson, D., Rue, H., Riebler, A., Martins, T. G. and Sørbye, S. H. (2017). Penalising model component complexity: A principled, practical approach to constructing priors. Statist. Sci. 32 1-28. · Zbl 1442.62060 · doi:10.1214/16-STS576
[45] SIMPSON, M., HOLAN, S. H., WIKLE, C. K. and BRADLEY, J. R. (2019). Interpolating distributions for populations in nested geographies using public-use data with application to the American Community Survey. Preprint. Available at arXiv:1802.02626.
[46] SINGH, B., SHUKLA, G. and KUNDU, D. (2005). Spatio-temporal models in small-area estimation. Surv. Methodol. 31 183-195.
[47] SINGU, S., ACHARYA, A., CHALLAGUNDLA, K. and BYAREDDY, S. B. (2020). Impact of social determinants of health on the emerging COVID-19 pandemic in the United States. Frontiers in Public Health 8 406.
[48] SØRBYE, S. H. and RUE, H. (2011). Simultaneous credible bands for latent Gaussian models. Scand. J. Stat. 38 712-725. · Zbl 1246.62067 · doi:10.1111/j.1467-9469.2011.00741.x
[49] U.S. CENSUS BUREAU (2008). A Compass for Understanding and Using American Community Survey Data: What General Data Users Need to Know. U.S. Government Printing Office, Washington, DC.
[50] U.S. CENSUS BUREAU (2014). American Community Survey Design and Methodology. U.S. Government Printing Office, Washington, DC.