
Inference for superpopulation parameters using sample surveys. (English) Zbl 1013.62005

Summary: Sample survey inference is historically concerned with finite-population parameters, that is, functions (like means and totals) of the observations for the individuals in the population. In scientific applications, however, interest usually focuses on the “superpopulation” parameters associated with a stochastic mechanism hypothesized to generate the observations in the population rather than the finite-population parameters.
Two relevant findings discussed in this paper are that (1) with stratified sampling, it is not sufficient to drop finite-population correction factors from standard design-based variance formulas to obtain appropriate variance formulas for superpopulation inference, and (2) with cluster sampling, standard design-based variance formulas can dramatically underestimate superpopulation variability, even with a small sampling fraction of the final units.
A literature review of inference for superpopulation parameters is given, with emphasis on why these findings have not been previously appreciated. Examples are provided for estimating superpopulation means, linear regression coefficients and logistic regression coefficients using U.S. data from the 1987 National Health Interview Survey, the third National Health and Nutrition Examination Survey and the 1986 National Hospital Discharge Survey.
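
Finding (1) can be illustrated numerically. The sketch below is not taken from the paper; it assumes a simple working model in which population units are independent draws from the superpopulation, stratum membership is itself random, and the stratum means and variances are known rather than estimated. Under that assumption the variance of the stratified mean as an estimator of the superpopulation mean is approximately the design-based formula without finite-population corrections plus a between-stratum term of order 1/N. The function name and the example numbers are hypothetical.

```python
from typing import Sequence, Tuple


def stratified_variances(
    N_h: Sequence[int],
    n_h: Sequence[int],
    mu_h: Sequence[float],
    sigma2_h: Sequence[float],
) -> Tuple[float, float, float]:
    """Variances of the stratified mean under three formulas.

    Returns (design_based, fpc_dropped, superpopulation), where the last
    value relies on the assumed i.i.d. superpopulation model with random
    stratum membership (an illustrative assumption, not the paper's text).
    """
    N = sum(N_h)
    W = [Nh / N for Nh in N_h]  # stratum weights N_h / N
    mu = sum(w * m for w, m in zip(W, mu_h))  # overall mean

    # Standard design-based variance with finite-population corrections.
    design_based = sum(
        w ** 2 * (1 - n / Nh) * s2 / n
        for w, Nh, n, s2 in zip(W, N_h, n_h, sigma2_h)
    )
    # Same formula with the corrections (1 - n_h / N_h) dropped.
    fpc_dropped = sum(
        w ** 2 * s2 / n for w, n, s2 in zip(W, n_h, sigma2_h)
    )
    # Extra variability from random stratum sizes: between-stratum spread / N.
    between = sum(w * (m - mu) ** 2 for w, m in zip(W, mu_h)) / N

    return design_based, fpc_dropped, fpc_dropped + between


# Hypothetical example: two equal strata with very different means.
design, no_fpc, superpop = stratified_variances(
    N_h=[500, 500], n_h=[250, 250], mu_h=[0.0, 10.0], sigma2_h=[1.0, 1.0]
)
print(round(design, 6), round(no_fpc, 6), round(superpop, 6))
```

With these made-up inputs the three values are about 0.001, 0.002 and 0.027: dropping the finite-population corrections only doubles the design-based variance, while the between-stratum term dominates the superpopulation variance.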

MSC:

62D05 Sampling theory, sample surveys

Software:

SUDAAN
