Assessing nonresponse bias in a business survey: proxy pattern-mixture analysis for skewed data. (English) Zbl 1397.62044

Summary: The Service Annual Survey (SAS) is a business survey conducted annually by the U.S. Census Bureau that collects aggregate and detailed data on revenues and expenses. As is typical of business surveys, the SAS population is highly positively skewed, with large companies accounting for a large share of the published totals. When alternative data are not available, missing data are handled with ratio imputation models that assume the data are missing at random. We propose a proxy pattern-mixture (PPM) model that provides a simple framework for assessing nonresponse bias under different nonresponse mechanisms. PPM models were first introduced in this context by R. R. Andridge and R. J. A. Little [“Proxy pattern-mixture analysis for survey nonresponse”, J. Official Stat. 27, No. 2, 153–180 (2011)], but their model assumed that the characteristic of interest and the predicted proxy have a bivariate normal distribution, conditional on the missingness indicator. Although often appropriate for large demographic surveys, the normality assumption is less justifiable for the highly skewed SAS data. We propose an alternative PPM model based on a bivariate gamma distribution, which is more appropriate for the SAS data. We compare the two PPM models by applying them to six years of data from three industries in the health care and transportation sectors of the SAS. Finally, we illustrate properties of the method through simulation.
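
The ratio imputation mentioned in the summary can be illustrated with a minimal sketch. This is a generic textbook form, not the Census Bureau's production procedure: the function name `ratio_impute`, the unweighted ratio, the choice of auxiliary variable, and the toy figures are all assumptions made for illustration. Each missing value is replaced by the auxiliary value times the ratio of respondent totals.

```python
def ratio_impute(y, x):
    """Fill missing y values (None) with r_hat * x.

    r_hat is the ratio of respondent totals, sum(y) / sum(x), computed
    only over units whose y was reported. Hypothetical helper for
    illustration; survey weights are ignored to keep the sketch short.
    """
    respondents = [(yi, xi) for yi, xi in zip(y, x) if yi is not None]
    if not respondents:
        raise ValueError("no respondents to estimate the ratio from")
    total_x = sum(xi for _, xi in respondents)
    if total_x == 0:
        raise ValueError("respondent auxiliary total is zero")
    r_hat = sum(yi for yi, _ in respondents) / total_x
    return [yi if yi is not None else r_hat * xi for yi, xi in zip(y, x)]


# Toy data (made up): current-year revenue with two nonrespondents,
# and an auxiliary variable known for every unit.
revenue = [120.0, None, 15.0, None]
auxiliary = [100.0, 900.0, 20.0, 5.0]
completed = ratio_impute(revenue, auxiliary)
```

Because the ratio is estimated from respondents alone, the imputed values are reasonable only if nonrespondents follow the same relationship between the outcome and the auxiliary variable, which is the missing-at-random assumption that the PPM analysis is designed to probe.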


62D05 Sampling theory, sample surveys
62P20 Applications of statistics to economics




[1] Andridge, R. R. and Little, R. J. A. (2011). Proxy pattern-mixture analysis for survey nonresponse. Journal of Official Statistics 27 153-180.
[2] Andridge, R. R. and Thompson, K. J. (2015a). Using the fraction of missing information to identify auxiliary variables for imputation procedures via proxy pattern-mixture models. Int. Stat. Rev. 83 472-492.
[3] Andridge, R. R. and Thompson, K. J. (2015b). Supplement to “Assessing nonresponse bias in a business survey: Proxy pattern-mixture analysis for skewed data.” .
[4] Bavdaž, M. (2010). The multidimensional integral business survey response model. Survey Methodology 36 81-93.
[5] Beaumont, J.-F., Haziza, D. and Bocci, C. (2011). On variance estimation under auxiliary value imputation in sample surveys. Statist. Sinica 21 515-537. · Zbl 1214.62008
[6] Devroye, L. (2002). Simulating Bessel random variables. Statist. Probab. Lett. 57 249-257. · Zbl 1005.65008
[7] Efron, B. (1994). Missing data, imputation, and the bootstrap. J. Amer. Statist. Assoc. 89 463-479. · Zbl 0806.62033
[8] Fay, R. E. III and Herriot, R. A. (1979). Estimates of income for small places: An application of James-Stein procedures to census data. J. Amer. Statist. Assoc. 74 269-277.
[9] Feller, W. (1966). An Introduction to Probability Theory and Its Applications. Vol. II. Wiley, New York. · Zbl 0138.10207
[10] Harel, O. (2007). Inferences on missing information under multiple imputation and two-stage multiple imputation. Stat. Methodol. 4 75-89. · Zbl 1248.62018
[11] Haziza, D., Thompson, K. J. and Yung, W. (2010). The effect of nonresponse adjustments on variance estimation. Survey Methodology 36 35-43.
[12] Iliopoulos, G., Karlis, D. and Ntzoufras, I. (2005). Bayesian estimation in Kibble’s bivariate gamma distribution. Canad. J. Statist. 33 571-589. · Zbl 1098.62030
[13] Izawa, T. (1965). Two or multi-dimensional gamma-type distribution and its application to rainfall data. Papers in Meteorology and Geophysics 15 167-200.
[14] Kibble, W. F. (1941). A two-variate gamma type distribution. Sankhyā 5 137-150. · Zbl 0063.03231
[15] Kreuter, F., Olson, K., Wagner, J., Yan, T., Ezzati-Rice, T. M., Casas-Cordero, C., Lemay, M., Peytchev, A., Groves, R. M. and Raghunathan, T. E. (2010). Using proxy measures and other correlates of survey outcomes to adjust for non-response: Examples from multiple surveys. J. Roy. Statist. Soc. Ser. A 173 389-407. · Zbl 05680874
[16] Krewski, D. and Rao, J. N. K. (1981). Inference from stratified samples: Properties of the linearization, jackknife and balanced repeated replication methods. Ann. Statist. 9 1010-1019. · Zbl 0474.62013
[17] Little, R. J. A. (1994). A class of pattern-mixture models for normal incomplete data. Biometrika 81 471-483. · Zbl 0816.62023
[18] Little, R. J. A. and Rubin, D. B. (2002). Statistical Analysis with Missing Data, 2nd ed. Wiley, Hoboken, NJ. · Zbl 1011.62004
[19] Lohr, S. L. (2010). Sampling: Design and Analysis, 2nd ed. Brooks/Cole, Boston, MA. · Zbl 1273.62010
[20] Makarov, R. N. and Glew, D. (2010). Exact simulation of Bessel diffusions. Monte Carlo Methods Appl. 16 283-306. · Zbl 1206.65026
[21] Ong, S. H. (1992). The computer generation of bivariate binomial variables with given marginals and correlations. Comm. Statist. Simulation Comput. 21 285-299. · Zbl 0825.65004
[22] Peytcheva, E. and Groves, R. M. (2009). Using variation in response rates of demographic subgroups as evidence of nonresponse bias in survey estimates. Journal of Official Statistics 25 193-201.
[23] R Core Team (2012). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. Available at .
[24] Rao, J. N. K. (2003). Small Area Estimation. Wiley, Hoboken, NJ. · Zbl 1026.62003
[25] Rao, J. N. K. and Scott, A. J. (1992). A simple method for the analysis of clustered binary data. Biometrics 48 577-585.
[26] Roberts, G., Rao, J. N. K. and Kumar, S. (1987). Logistic regression analysis of sample survey data. Biometrika 74 1-12. · Zbl 0625.62007
[27] Royall, R. M. (1992). The model based (prediction) approach to finite population sampling theory. In Current Issues in Statistical Inference: Essays in Honor of D. Basu. Institute of Mathematical Statistics Lecture Notes-Monograph Series 17 225-240. IMS, Hayward, CA.
[28] Snijkers, G., Haraldsen, G., Jones, J. and Willimack, D. K. (2013). Designing and Conducting Business Surveys. Wiley, New York.
[29] Thompson, K. J. (2005). An empirical investigation into the effects of replicate reweighting on variance estimates for the annual capital expenditures survey. In Proceedings of the Federal Committee on Statistical Methods Research Conference. U.S. Office of Management and Budget, Washington, DC.
[30] Thompson, K. J. and Oliver, B. E. (2012). Response rates in business surveys: Going beyond the usual performance measure. Journal of Official Statistics 28 221-237.
[31] Thompson, K. J. and Washington, K. T. (2013). Challenges in the treatment of unit nonresponse for selected business surveys: A case study. Survey Methods: Insights from the Field. Retrieved from .
[32] Wagner, J. (2010). The fraction of missing information as a tool for monitoring the quality of survey data. Public Opinion Quarterly 74 223-243.
[33] Wagner, J. (2012). A comparison of alternative indicators for the risk of nonresponse bias. Public Opinion Quarterly 76 555-575.
[34] Willimack, D. K. and Nichols, E. (2010). A hybrid response process model for business surveys. Journal of Official Statistics 26 3-24.
[35] Yuan, L. and Kalbfleisch, J. D. (2000). On the Bessel distribution and related problems. Ann. Inst. Statist. Math. 52 438-447. · Zbl 0960.62017