Bayesian estimation under informative sampling with unattenuated dependence.

*(English)* Zbl 1437.62068

Summary: An informative sampling design leads to unit inclusion probabilities that are correlated with the response variable of interest. However, multistage sampling designs may also induce higher-order dependencies, which are ignored in the literature when establishing consistency of estimators for survey data under a condition requiring asymptotic independence among the unit inclusion probabilities. This paper constructs new theoretical conditions that guarantee that the pseudo-posterior, which uses sampling weights based on first-order inclusion probabilities to exponentiate the likelihood, is consistent not only for survey designs which have asymptotic factorization, but also for survey designs that induce residual or unattenuated dependence among sampled units. The use of the survey-weighted pseudo-posterior, together with our relaxed requirements for the survey design, establishes a wide variety of analysis models that can be applied to a broad class of survey data sets. Using the complex sampling design of the National Survey on Drug Use and Health, we demonstrate our new theoretical result on multistage designs characterized by a cluster sampling step that expresses within-cluster dependence. We explore the impact of multistage designs and order-based sampling.
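The pseudo-posterior mentioned in the summary raises each unit's likelihood contribution to the power of its sampling weight before combining it with the prior. The following is a minimal sketch of that idea and not the authors' implementation. It assumes a toy model that does not appear in the paper (a normal mean with known variance and a conjugate normal prior), and it assumes the weights are taken as inverse first-order inclusion probabilities rescaled to sum to the sample size. The function names `normalized_weights` and `pseudo_posterior_normal_mean` are hypothetical.

```python
import numpy as np


def normalized_weights(incl_prob):
    """Inverse inclusion-probability weights, rescaled to sum to n.

    Assumption: the rescaling to the sample size n is one common
    convention; it is not stated in the summary above.
    """
    w = 1.0 / np.asarray(incl_prob, dtype=float)
    return w * len(w) / w.sum()


def pseudo_posterior_normal_mean(y, incl_prob, sigma2=1.0,
                                 prior_mean=0.0, prior_var=100.0):
    """Pseudo-posterior of mu for y_i ~ N(mu, sigma2), mu ~ N(prior_mean, prior_var).

    Each likelihood term N(y_i | mu, sigma2) is raised to the power w_i,
    so the weighted likelihood is proportional to
    exp(-sum_i w_i * (y_i - mu)^2 / (2 * sigma2)), which stays conjugate.
    Returns the pseudo-posterior mean and variance of mu.
    """
    y = np.asarray(y, dtype=float)
    w = normalized_weights(incl_prob)
    precision = 1.0 / prior_var + w.sum() / sigma2
    mean = (prior_mean / prior_var + (w * y).sum() / sigma2) / precision
    return mean, 1.0 / precision


# Illustrative (made-up) data: the second unit was half as likely to be
# sampled as the first, so it receives twice the weight.
mean, var = pseudo_posterior_normal_mean(
    y=[1.0, 3.0], incl_prob=[0.5, 0.25], prior_var=1e12
)
print(mean, var)
```

With equal inclusion probabilities every weight equals one, and the sketch reduces to the ordinary conjugate posterior. With unequal probabilities, under-sampled units pull the estimate more strongly, which is the correction for informative sampling that the summary describes.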

##### MSC:

62D05 | Sampling theory, sample surveys |

##### Keywords:

cluster sampling; stratification; survey sampling; sampling weights; Markov chain Monte Carlo

##### References:

[1] | Binder, D. A. (1983). “On the variances of asymptotically normal estimators from complex surveys.” International Statistical Review, 51: 279-92. · Zbl 0535.62014 |

[2] | Breslow, N. E. and Wellner, J. A. (2007). “Weighted Likelihood for Semiparametric Models and Two-phase Stratified Samples, with Application to Cox Regression.” Scandinavian Journal of Statistics, 34(1): 86-102. · Zbl 1142.62014 |

[3] | Brewer, K. (1975). “A simple procedure for \(\pi\) pswor.” Australian Journal of Statistics, 17: 166-172. · Zbl 0398.62006 |

[4] | Carpenter, B. (2015). “Stan: A Probabilistic Programming Language.” Journal of Statistical Software, 76(1). |

[5] | Center for Behavioral Health Statistics and Quality (2015a). “Section 1: Adult Mental Health Tables.” In 2014 National Survey on Drug Use and Health: Mental Health Detailed Tables. Rockville, MD: Substance Abuse and Mental Health Services Administration. |

[6] | Center for Behavioral Health Statistics and Quality (2015b). “Section 2: Tobacco Product and Alcohol Use Tables.” In 2014 National Survey on Drug Use and Health: Detailed Tables. Rockville, MD: Substance Abuse and Mental Health Services Administration. |

[7] | Chambers, R. and Skinner, C. (2003). Analysis of Survey Data. Wiley Series in Survey Methodology. Wiley. · Zbl 1024.00035 |

[8] | Gelman, A., Hwang, J., and Vehtari, A. (2014). “Understanding predictive information criteria for Bayesian models.” Statistics and Computing, 24(6): 997-1016. · Zbl 1332.62090 |

[9] | Ghosal, S., Ghosh, J. K., and Vaart, A. W. V. D. (2000). “Convergence rates of posterior distributions.” The Annals of Statistics, 28(2): 500-531. · Zbl 1105.62315 |

[10] | Ghosal, S. and van der Vaart, A. (2007). “Convergence rates of posterior distributions for noniid observations.” The Annals of Statistics, 35(1): 192-223. · Zbl 1114.62060 |

[11] | Godambe, V. P. and Thompson, M. E. (1986). “Parameters of super populations and survey population: their relationship and estimation.” International Statistical Review, 54: 37-59. |

[12] | Heeringa, S. G., West, B. T., and Berglund, P. A. (2010). Applied Survey Data Analysis. Chapman and Hall/CRC. |

[13] | Holt, D., Smith, T. M. F., and Winter, P. D. (1980). “Regression Analysis of Data from Complex Surveys.” Journal of the Royal Statistical Society. Series A (General), 143(4): 474-487. · Zbl 0452.62052 |

[14] | Isaki, C. T. and Fuller, W. A. (1982). “Survey Design Under the Regression Superpopulation Model.” Journal of the American Statistical Association, 77: 89-96. · Zbl 0511.62016 |

[15] | Kish, L. and Frankel, M. R. (1974). “Inference from complex samples (with discussion).” Journal of the Royal Statistical Society, Series B, 36: 1-37. · Zbl 0295.62011 |

[16] | Morton, K. B., Aldworth, J., Hirsch, E. L., Martin, P. C., and Shook-Sa, B. E. (2016). “Section 2, Sample Design Report.” In 2014 National Survey on Drug Use and Health: Methodological Resource Book. Rockville, MD: Center for Behavioral Health Statistics and Quality, Substance Abuse and Mental Health Services Administration. |

[17] | Pfeffermann, D., Krieger, A., and Rinott, Y. (1998). “Parametric distributions of complex survey data under informative probability sampling.” Statistica Sinica, 8: 1087-1114. · Zbl 0923.62019 |

[18] | Rao, J. N. K., Wu, C. F. J., and Yue, K. (1992). “Some Recent Work on Resampling Methods for Complex Surveys.” Survey Methodology, 18: 209-217. |

[19] | Savitsky, T. D. and Toth, D. (2016). “Bayesian estimation under informative sampling.” Electronic Journal of Statistics, 10(1): 1677-1708. · Zbl 1397.62117 |

[20] | Toth, D. and Eltinge, J. L. (2011). “Building consistent regression trees from complex sample data.” Journal of the American Statistical Association, 106(496): 1626-1636. · Zbl 1233.62017 |

[21] | Wang, H., Zhu, R., and Ma, P. (2018). “Optimal subsampling for large sample logistic regression.” Journal of the American Statistical Association, 113(522): 829-844. · Zbl 1398.62196 |

[22] | Wickham, H. (2009). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. URL http://ggplot2.org · Zbl 1170.62004 |

[23] | Williams, M. R. and Savitsky, T. D. (2018). “Bayesian pairwise estimation under dependent informative sampling.” Electronic Journal of Statistics, 12(1): 1631-1661. · Zbl 1396.62015 |

[24] | Williams, M. R. and Savitsky, T. D. (2019). “Supplementary Material for “Bayesian Estimation Under Informative Sampling with Unattenuated Dependence”.” Bayesian Analysis. |

[25] | Yi, G. Y., Rao, J. N. K., and Li, H. (2016). “A Weighted Composite Likelihood Approach for Analysis of Survey Data under Two-level Models.” Statistica Sinica, 26: 569-587. · Zbl 1359.62046 |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.