# zbMATH — the first resource for mathematics

Robust model-based sampling designs. (English) Zbl 1322.62064
Summary: We investigate methods for the design of sample surveys, and address the traditional resistance of survey samplers to the use of model-based methods by incorporating model robustness at the design stage. The designs are intended to be sufficiently flexible and robust that resulting estimates, based on the designer’s best guess at an appropriate model, remain reasonably accurate in a neighbourhood of this central model. Thus, consider a finite population of $$N$$ units in which a survey variable $$Y$$ is related to a $$q$$-dimensional auxiliary variable $$\mathbf x$$. We assume that the values of $$\mathbf x$$ are known for all $$N$$ population units, and that we will select a sample of $$n\leq N$$ population units and then observe the $$n$$ corresponding values of $$Y$$. The objective is to predict the population total $$T=\sum^N_{i=1}Y_i$$. The design problem which we consider is to specify a selection rule, using only the values of the auxiliary variable, to select the $$n$$ units for the sample so that the predictor has optimal robustness properties. We suppose that $$T$$ will be predicted by methods based on a linear relationship between $$Y$$ – possibly transformed – and given functions of $$\mathbf x$$. We maximise the mean squared error of the prediction of $$T$$ over realistic neighbourhoods of the fitted linear relationship, and of the assumed variance and correlation structures. This maximised mean squared error is then minimised over the class of possible samples, yielding an optimally robust (‘minimax’) design. To carry out the minimisation step we introduce a genetic algorithm and discuss its tuning for maximal efficiency.
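The two computational steps in the summary, predicting $$T$$ from a sample and searching over samples with a genetic algorithm, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the auxiliary variable is scalar ($$q=1$$), the working model is a straight line fitted by ordinary least squares, `toy_loss` is a hypothetical stand-in for the maximised mean squared error (it only measures how far the sample's first two moments of $$x$$ are from the population's), and the crossover and mutation rules in `genetic_design` are generic choices rather than the authors' tuned algorithm.

```python
import random


def predict_total(x, y_sample, sample):
    """Predict T = sum of Y over all N units from the sampled Y values.

    Assumed working model (illustrative): Y = b0 + b1 * x + error, scalar x.
    The sample must contain at least two distinct x values.
    """
    xs = [x[i] for i in sample]
    n = len(sample)
    x_bar = sum(xs) / n
    y_bar = sum(y_sample) / n
    sxx = sum((v - x_bar) ** 2 for v in xs)
    sxy = sum((v - x_bar) * (w - y_bar) for v, w in zip(xs, y_sample))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    in_sample = set(sample)
    # Observed Y values are used as they are; unobserved ones are predicted.
    unobserved = sum(b0 + b1 * x[i] for i in range(len(x)) if i not in in_sample)
    return sum(y_sample) + unobserved


def toy_loss(x, sample):
    """Hypothetical stand-in for the maximised MSE (NOT the paper's loss):
    squared imbalance between sample and population moments of x."""
    N, n = len(x), len(sample)
    m1 = sum(x) / N
    m2 = sum(v * v for v in x) / N
    s1 = sum(x[i] for i in sample) / n
    s2 = sum(x[i] ** 2 for i in sample) / n
    return (s1 - m1) ** 2 + (s2 - m2) ** 2


def genetic_design(x, n, loss, pop_size=30, generations=200, p_mutate=0.3, seed=0):
    """Search for an n-subset of units with small loss using a basic
    genetic algorithm (generic operators, chosen for illustration)."""
    rng = random.Random(seed)
    units = range(len(x))
    population = [tuple(sorted(rng.sample(units, n))) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda s: loss(x, s))
        parents = population[: pop_size // 2]  # the fitter half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            pool = sorted(set(a) | set(b))
            child = set(rng.sample(pool, n))  # crossover: draw n units from the union
            outside = [i for i in units if i not in child]
            if outside and rng.random() < p_mutate:
                # mutation: swap one sampled unit for an unsampled one
                child.remove(rng.choice(sorted(child)))
                child.add(rng.choice(outside))
            children.append(tuple(sorted(child)))
        population = parents + children
    return min(population, key=lambda s: loss(x, s))


# Hypothetical data: N = 40 units with known auxiliary values, n = 8 sampled.
x = [i / 10 for i in range(1, 41)]
design = genetic_design(x, n=8, loss=toy_loss)
y = [2 + 3 * x[i] for i in design]  # noiseless Y, observed only on the sample
print(design, predict_total(x, y, design))
```

Because the fitter half of each generation is carried over unchanged, the best loss found never increases from one generation to the next. To move closer to the minimax criterion described in the summary, `toy_loss` would be replaced by a function that returns the maximum of the mean squared error over the chosen neighbourhood of models.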

##### MSC:
62C20 Minimax procedures in statistical decision theory; 62D05 Sampling theory, sample surveys; 62K05 Optimal statistical designs
##### Keywords:
genetic algorithm; minimax; prediction; simulated annealing; smearing