## Endogenous post-stratification in surveys: classifying with a sample-fitted model.
*(English)*
Zbl 1132.62006

Summary: Post-stratification is frequently used to improve the precision of survey estimators when categorical auxiliary information is available from sources outside the survey. In natural resource surveys, such information is often obtained from remote sensing data, classified into categories and displayed as pixel-based maps. These maps may be constructed based on classification models fitted to the sample data. Post-stratification of the sample data based on categories derived from the sample data (“endogenous post-stratification”) violates the standard post-stratification assumptions that observations are classified without error into post-strata and that post-stratum population counts are known. Properties of the endogenous post-stratification estimator are derived for the case of a sample-fitted generalized linear model, from which the post-strata are constructed by dividing the range of the model predictions into predetermined intervals. Design consistency of the endogenous post-stratification estimator is established under mild conditions. Under a superpopulation model, consistency and asymptotic normality of the endogenous post-stratification estimator are established, showing that it has the same asymptotic variance as the traditional post-stratified estimator with fixed strata. Simulation experiments demonstrate that the practical effect of first fitting a model to the survey data before post-stratifying is small, even for relatively small sample sizes.
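The construction described in the summary can be illustrated with a minimal sketch. It is not the authors' code, and it rests on several assumptions made here for illustration only: simple random sampling without replacement, a single auxiliary variable known for every population unit, and an identity-link Gaussian generalized linear model (ordinary least squares) standing in for the sample-fitted model. The function name `endogenous_post_stratified_mean`, its parameters, and the simulated data are hypothetical.

```python
import numpy as np


def endogenous_post_stratified_mean(x_pop, sample_idx, y_sample, boundaries):
    """Hypothetical helper: estimate a population mean by post-stratifying
    on the predictions of a model fitted to the sample itself.

    Assumes simple random sampling and an identity-link Gaussian GLM.
    """
    x_pop = np.asarray(x_pop, dtype=float)
    y_sample = np.asarray(y_sample, dtype=float)

    # Design matrix (intercept + auxiliary variable) for the whole population.
    X_pop = np.column_stack([np.ones(len(x_pop)), x_pop])
    X_s = X_pop[sample_idx]

    # Step 1: fit the model using the sampled units only.
    beta, *_ = np.linalg.lstsq(X_s, y_sample, rcond=None)

    # Step 2: predict for every population unit, then cut the prediction
    # range into intervals with predetermined boundaries.
    strata_pop = np.digitize(X_pop @ beta, boundaries)
    strata_s = strata_pop[sample_idx]

    # Step 3: post-stratum counts are themselves derived from the fitted model.
    n_strata = len(boundaries) + 1
    N_h = np.bincount(strata_pop, minlength=n_strata)

    # Step 4: weight each sample stratum mean by its population share.
    estimate = 0.0
    for h in range(n_strata):
        if N_h[h] == 0:
            continue  # no population units fall in this interval
        in_h = strata_s == h
        if not in_h.any():
            raise ValueError(f"post-stratum {h} contains no sampled units")
        estimate += N_h[h] / len(x_pop) * y_sample[in_h].mean()
    return estimate


# Simulated population (assumed for illustration, not from the paper).
rng = np.random.default_rng(0)
N, n = 2000, 200
x_pop = rng.uniform(0.0, 1.0, size=N)
y_pop = 2.0 + 3.0 * x_pop + rng.normal(0.0, 0.5, size=N)

sample_idx = rng.choice(N, size=n, replace=False)
y_sample = y_pop[sample_idx]

epse = endogenous_post_stratified_mean(x_pop, sample_idx, y_sample, [3.0, 4.0])
print("population mean:", y_pop.mean())
print("endogenous post-stratified estimate:", epse)
```

The post-stratum boundaries are fixed in advance, while both the assignment of units to post-strata and the counts `N_h` depend on the sample through the fitted coefficients. That dependence is what distinguishes this estimator from a traditional post-stratified estimator with fixed strata.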

### MSC:

| Code | Description |
| --- | --- |
| 62D05 | Sampling theory, sample surveys |
| 62F12 | Asymptotic properties of parametric estimators |
| 62J12 | Generalized linear models (logistic models) |

### Keywords:

calibration; classification; design consistency; generalized linear model; Horvitz-Thompson estimator; ratio estimator; stratification; survey regression estimator

*F. J. Breidt* and *J. D. Opsomer*, Ann. Stat. 36, No. 1, 403–427 (2008; Zbl 1132.62006)
