## Semiparametric methods for response-selective and missing data problems in regression. (English) Zbl 0915.62030

Summary: Suppose that data are generated according to the model $$f({\mathbf y} \mid {\mathbf x};\theta)\, g({\mathbf x})$$, where $${\mathbf y}$$ is a response and $${\mathbf x}$$ is a vector of covariates. We derive and compare semiparametric likelihood and pseudolikelihood methods for estimating $$\theta$$ in situations where some of the generated units are not fully observed and where it is impossible or undesirable to model the covariate distribution. The probability that a unit is fully observed may depend on $${\mathbf y}$$, and some covariates may be observed only for a subsample of individuals. Our key assumptions are that the probability that a unit has missing data depends only on which of a finite number of strata $$({\mathbf y},{\mathbf x})$$ belongs to, and that stratum membership is observed for every unit.
Applications include case-control studies in epidemiology, field reliability studies and broad classes of missing data and measurement error problems. Our results make fully efficient estimation of $$\theta$$ feasible, and they generalize and provide insight into a variety of methods that have been proposed for specific problems.
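The key assumption, that stratum membership is recorded for every unit, is what lets the selection probabilities be estimated directly. The sketch below illustrates this with a simple inverse-probability-weighted pseudolikelihood, not the fully efficient semiparametric estimator derived in the paper. It assumes a hypothetical case-control design: a logistic model for $$f(y \mid x;\theta)$$ with one scalar covariate, strata defined by $$y$$, all cases fully observed and only a fraction of controls fully observed. The function names `simulate` and `ipw_logistic`, and all parameter values, are illustrative choices.

```python
import math
import random


def logistic(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def simulate(n, alpha, beta, control_fraction, seed=0):
    """Hypothetical response-selective design: every unit's stratum (here y)
    is counted, but x is kept only for all cases (y = 1) and a random
    fraction of controls (y = 0)."""
    rng = random.Random(seed)
    stratum_sizes = {0: 0, 1: 0}
    observed = []  # (y, x) pairs for fully observed units
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1 if rng.random() < logistic(alpha + beta * x) else 0
        stratum_sizes[y] += 1
        keep_prob = 1.0 if y == 1 else control_fraction
        if rng.random() < keep_prob:
            observed.append((y, x))
    return stratum_sizes, observed


def ipw_logistic(stratum_sizes, observed, weighted=True, iters=50):
    """Maximize sum_i w_i * log f(y_i | x_i; alpha, beta) by Newton-Raphson.
    w_i = N_s / n_s for a unit in stratum s, i.e. the inverse of the
    estimated selection probability; weighted=False gives the naive
    complete-case fit (all weights equal to 1)."""
    counts = {0: 0, 1: 0}
    for y, _ in observed:
        counts[y] += 1
    w = {s: (stratum_sizes[s] / counts[s] if weighted else 1.0) for s in counts}
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for y, x in observed:
            p = logistic(a + b * x)
            r = w[y] * (y - p)            # weighted score contribution
            v = w[y] * p * (1.0 - p)      # weighted information contribution
            g0 += r
            g1 += r * x
            h00 += v
            h01 += v * x
            h11 += v * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: theta <- theta + I^{-1} g, with I the weighted information.
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b


if __name__ == "__main__":
    sizes, obs = simulate(20000, alpha=-2.0, beta=1.0, control_fraction=0.1, seed=1)
    print("weighted:", ipw_logistic(sizes, obs))
    print("naive:   ", ipw_logistic(sizes, obs, weighted=False))
```

With a large simulated sample, the weighted fit should recover both parameters approximately. The naive complete-case fit should still estimate the slope well, but its intercept is shifted upward by roughly $$\log(1/0.1)$$ because controls were undersampled. Estimators of the kind studied in the paper aim to use the partially observed units more efficiently than this simple weighting does.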

### MSC:

62G07 Density estimation

62P10 Applications of statistics to biology and medical sciences; meta analysis