## Analyzing establishment nonresponse using an interpretable regression tree model with linked administrative data
*(English)*
Zbl 1243.62150

Summary: To gain insight into how the characteristics of an establishment are associated with nonresponse, a recursive partitioning algorithm is applied to data from the May 2006 Occupational Employment Statistics (OES) survey to build a regression tree. The tree models an establishment's propensity to respond to the survey as a function of its characteristics. It partitions establishments into mutually exclusive cells, each defined by those characteristics and each having a homogeneous response propensity. This makes it easy to identify interpretable associations between the characteristics and an establishment's propensity to respond, something not easily done with a logistic regression propensity model. We test the model fitted to the May data against data from the November 2006 OES survey. Because this test uses a disjoint set of establishments with a very large sample size (n = 179,360), it offers evidence that the regression tree accurately describes the association between establishment characteristics and response propensity for the OES survey. The accuracy of this modeling approach is compared with that of logistic regression through simulations. The tree representation is then used, together with frame-level administrative wage data linked to the sample data, to investigate possible nonresponse bias. We show that, without proper adjustment, the nonresponse does pose a risk of bias and is possibly nonignorable.
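The summary describes fitting a regression tree to a 0/1 response indicator so that its leaves form mutually exclusive cells with homogeneous response propensities, and then using frame-level administrative data to look for nonresponse bias. The block below is a minimal sketch of that general idea under stated assumptions, not the authors' implementation: it grows an unpruned CART-style tree by greedy squared-error splits, ignores sampling weights, and runs on synthetic data. The variable names (employment size, a noise flag, an administrative wage), the stopping rules, and the inverse-cell-response-rate weighting used for the adjusted mean are illustrative assumptions.

```python
"""Minimal sketch, not the authors' code: grow a CART-style regression tree on a
0/1 response indicator, treat its leaves as response-propensity cells, and use a
frame variable to check for nonresponse bias. All data below are synthetic."""
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Sequence, Tuple


@dataclass(eq=False)  # eq=False keeps default identity-based hashing/equality
class Node:
    value: float                       # mean of y in the node = cell response rate
    n: int                             # number of sampled units in the node
    feature: Optional[int] = None      # split variable index; None marks a leaf
    threshold: Optional[float] = None  # units with x[feature] <= threshold go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def sse(ys: Sequence[float]) -> float:
    """Within-node sum of squared errors around the node mean."""
    m = sum(ys) / len(ys)
    return sum((v - m) ** 2 for v in ys)


def grow(X, y, depth=0, max_depth=3, min_leaf=20, min_gain=1e-9) -> Node:
    """Greedy binary recursive partitioning on squared error (no pruning)."""
    node = Node(value=sum(y) / len(y), n=len(y))
    if depth >= max_depth:
        return node
    parent = sse(y)
    best: Optional[Tuple[float, int, float]] = None  # (gain, feature, threshold)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X})[:-1]:  # observed values as cut points
            left = [v for row, v in zip(X, y) if row[j] <= t]
            right = [v for row, v in zip(X, y) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            gain = parent - sse(left) - sse(right)
            if gain > min_gain and (best is None or gain > best[0]):
                best = (gain, j, t)
    if best is None:  # no worthwhile split: this node becomes a cell
        return node
    _, j, t = best
    node.feature, node.threshold = j, t
    go_left = [row[j] <= t for row in X]
    lx = [r for r, g in zip(X, go_left) if g]
    ly = [v for v, g in zip(y, go_left) if g]
    rx = [r for r, g in zip(X, go_left) if not g]
    ry = [v for v, g in zip(y, go_left) if not g]
    node.left = grow(lx, ly, depth + 1, max_depth, min_leaf, min_gain)
    node.right = grow(rx, ry, depth + 1, max_depth, min_leaf, min_gain)
    return node


def leaf_of(node: Node, x) -> Node:
    """Drop one unit down the tree to its (mutually exclusive) cell."""
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node


def leaves(node: Node) -> List[Node]:
    """All cells, left to right."""
    if node.feature is None:
        return [node]
    return leaves(node.left) + leaves(node.right)


def frame_means(tree: Node, X, responded, frame_var):
    """Full-sample, respondent-only and cell-adjusted means of a frame variable."""
    full = mean(frame_var)  # known for respondents and nonrespondents alike
    naive = mean(v for v, r in zip(frame_var, responded) if r)
    num = den = 0.0
    for x, r, v in zip(X, responded, frame_var):
        if r:  # a respondent's cell rate is > 0 since the tree was fit to this sample
            w = 1.0 / leaf_of(tree, x).value  # inverse estimated response propensity
            num += w * v
            den += w
    return full, naive, num / den


# Synthetic sample: employment sizes 1..50, each observed 10 times. Response
# depends only on size; the hypothetical administrative wage rises with size.
X, y, wage = [], [], []
for i in range(500):
    size, rep = i % 50 + 1, i // 50
    k = 8 if size <= 10 else (5 if size <= 30 else 2)  # responders out of 10
    X.append((size, size % 2))  # (employment size, hypothetical noise flag)
    y.append(1.0 if rep < k else 0.0)
    wage.append(30000.0 + 500.0 * size)

tree = grow(X, y)
for cell in leaves(tree):
    print(f"cell n={cell.n:3d}  response rate={cell.value:.2f}")
full, naive, adjusted = frame_means(tree, X, [v == 1.0 for v in y], wage)
print(f"full={full:.1f}  respondents={naive:.1f}  adjusted={adjusted:.1f}")
```

In this constructed example response depends only on size, so the tree recovers three cells (sizes 1 to 10, 11 to 30 and 31 to 50, with response rates 0.80, 0.50 and 0.20), the respondent-only mean of the frame wage is biased low (about 39340.9 versus 42750.0), and weighting respondents by the inverse of their cell response rate recovers the full-sample mean. With real data, a gap between the respondent mean and the full-sample mean of a frame variable shows that response is associated with that variable; closing the gap after adjustment does not by itself rule out nonignorable nonresponse in the survey variables.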

### MSC:

| MSC code | Subject |
|---|---|
| 62P20 | Applications of statistics to economics |
| 62D05 | Sampling theory, sample surveys |
| 62J12 | Generalized linear models (logistic models) |
| 62A09 | Graphical methods in statistics |

### Keywords:

recursive partitioning; nonignorable nonresponse; propensity model; establishment survey; classification and regression trees (CART)

### Software:

ElemStatLearn

### References:

[1] Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984). Classification and Regression Trees. Wadsworth Advanced Books and Software, Belmont, CA. · Zbl 0541.62042

[2] Eltinge, J. and Yansaneh, I. (1997). Diagnostics for formation of nonresponse adjustment cells, with an application to income nonresponse in the U.S. Consumer Expenditure Survey. Survey Methodology 23 33-40.

[3] Göksel, H., Judkins, D. and Mosher, W. (1992). Nonresponse adjustment for a telephone follow-up to a national in-person survey. Journal of Official Statistics 8 417-431.

[4] Gordon, L. and Olshen, R. A. (1978). Asymptotically efficient solutions to the classification problem. Ann. Statist. 6 515-533. · Zbl 0437.62056 · doi:10.1214/aos/1176344197

[5] Gordon, L. and Olshen, R. A. (1980). Consistent nonparametric regression from recursive partitioning schemes. J. Multivariate Anal. 10 611-627. · Zbl 0453.62035 · doi:10.1016/0047-259X(80)90074-3

[6] Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York. · Zbl 0973.62007

[7] Kim, J. K. and Kim, J. J. (2007). Nonresponse weighting adjustment using estimated response probability. Canad. J. Statist. 35 501-514. · Zbl 1143.62008 · doi:10.1002/cjs.5550350403

[8] Kott, P. S. and Chang, T. (2010). Using calibration weighting to adjust for nonignorable unit nonresponse. J. Amer. Statist. Assoc. 105 1265-1275. · Zbl 1390.62011 · doi:10.1198/jasa.2010.tm09016

[9] LeBlanc, M. and Tibshirani, R. (1998). Monotone shrinkage of trees. J. Comput. Graph. Statist. 7 417-433.

[10] Little, R. J. A. (1982). Models for nonresponse in sample surveys. J. Amer. Statist. Assoc. 77 237-250. · Zbl 0494.62009 · doi:10.2307/2287227

[11] Little, R. (1986). Survey nonresponse adjustments for estimates of means. International Statistical Review 54 139-157. · Zbl 0596.62009 · doi:10.2307/1403140

[12] Little, R. and Vartivarian, S. (2005). Does weighting for nonresponse increase the variance of survey means? Survey Methodology 31 161-168.

[13] Opsomer, J. D. and Miller, C. P. (2005). Selecting the amount of smoothing in nonparametric regression estimation for complex surveys. J. Nonparametr. Stat. 17 593-611. · Zbl 1065.62071 · doi:10.1080/10485250500054642

[14] Petroni, R., Sigman, R., Willimack, D., Cohen, S. and Tucker, C. (2004). Response rates and nonresponse in establishment surveys-BLS and Census Bureau. In Federal Economic Statistics Advisory Committee Meeting (December). Available at .

[15] Phipps, P. and Jones, C. (2007). Factors affecting response to the occupational employment statistics survey. In Proceedings of the 2007 Federal Committee on Statistical Methodology Research Conference. Available at .

[16] Rosenbaum, P. R. and Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika 70 41-55. · Zbl 0522.62091 · doi:10.1093/biomet/70.1.41

[17] Schouten, B. and de Nooij, G. (2005). Nonresponse adjustment using classification trees. Discussion Paper 05001, Statistics Netherlands. Available at .

[18] Shao, J. (1993). Linear model selection by cross-validation. J. Amer. Statist. Assoc. 88 486-494. · Zbl 0773.62051 · doi:10.2307/2290328

[19] Tomaskovic-Devey, D., Leiter, J. and Thompson, S. (1994). Organizational survey nonresponse. Administrative Science Quarterly 39 439-457.

[20] Toth, D. and Eltinge, J. (2011). Building consistent regression trees from complex sample data. J. Amer. Statist. Assoc. 106 1626-1636. · Zbl 1233.62017 · doi:10.1198/jasa.2011.tm10383

[21] Toth, D. and Eltinge, J. (2008). Simple function representation of regression trees. Bureau of Labor Statistics Technical Report.

[22] Vartivarian, S. and Little, R. (2002). On the formation of weighting adjustment cells for unit nonresponse. In Proceedings of the Survey Research Methods Section 3553-3558. Amer. Statist. Assoc., Alexandria, VA.
