# zbMATH — the first resource for mathematics

Sensitivity analysis for selection bias and unmeasured confounding in missing data and causal inference models. (English) Zbl 0998.62091
Halloran, M. Elizabeth (ed.) et al., Statistical models in epidemiology, the environment, and clinical trials. IMA summer program on Statistics in the health sciences, Univ. of Minnesota, Minneapolis, MN, USA, 1997. New York, NY: Springer. IMA Vol. Math. Appl. 116, 1-94 (2000).

From the introduction: In both observational and randomized studies, subjects commonly drop out of the study (i.e., become censored) before end of follow-up. If, conditional on the history of the observed data up to $t$, the hazard of dropping out of the study (i.e., censoring) at time $t$ does not depend on the possibly unobserved data subsequent to $t$, we say drop-out is ignorable or explainable. On the other hand, if the hazard of drop-out depends on the possibly unobserved future, we say drop-out is non-ignorable or, equivalently, that there is selection bias on unobservables. Neither the existence of selection bias on unobservables nor its magnitude is identifiable from the joint distribution of the observables. In view of this fact, we argue that the data analyst should conduct a “sensitivity analysis” to quantify how one’s inference concerning an outcome of interest varies as a function of the magnitude of non-identifiable selection bias.
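The ignorability condition described above can be written compactly. The notation below is an assumption introduced here for illustration and is not taken from the chapter: $Q$ denotes the drop-out time, $\bar{O}(t)$ the observed history up to $t$, and $\underline{O}(t)$ the possibly unobserved data after $t$. Drop-out is ignorable if, for all $t$,

$$\lambda_Q\bigl(t \mid \bar{O}(t), \underline{O}(t)\bigr) = \lambda_Q\bigl(t \mid \bar{O}(t)\bigr),$$

and non-ignorable if the hazard on the left-hand side depends on $\underline{O}(t)$.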

In Sections 2 and 3, we present a new class of nonparametric (just) identified (NPI) models that are useful for this purpose. These models are nonparametric (i.e., saturated) in the sense that each model in the class places no restrictions on the joint distribution of the observed data. Hence, each model in the class fits the observed data perfectly and cannot be rejected by any statistical test. Each model is (just) identified in the sense that the model identifies the distribution of the underlying full data (i.e., the distribution of the data that would have been observed in the absence of drop-out). Each NPI model in the class is indexed by a selection bias function that quantifies the magnitude of selection bias due to unobservables. Since each model is nonparametric, this selection bias function is not identified from the distribution of the observed data. However, we show that one can perform a sensitivity analysis that examines how inferences concerning functionals of the full data change as the nonidentified selection bias function is varied over a plausible range. A nice feature of our approach is that, as discussed in Section 4, for each choice of the non-identified selection bias function, the full data functionals of interest can be estimated at ${n}^{1/2}$-rates using the modern theory of estimation in semiparametric models with missing data.
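The idea of varying a non-identified selection bias function can be illustrated with a deliberately simplified sketch. It assumes a single outcome measured once, no covariates, and an exponential-tilt selection bias function $\exp(\alpha y)$ relating the outcome distribution of drop-outs to that of completers. These choices, together with the function names `tilted_mean`, `full_data_mean` and `sensitivity_curve`, are assumptions made here for illustration and are not the estimators developed in the chapter.

```python
import math


def tilted_mean(observed, alpha):
    """Mean outcome among drop-outs, assuming f(y | dropped) is proportional
    to f(y | observed) * exp(alpha * y). alpha = 0 means ignorable drop-out."""
    # Subtract the largest exponent so the weights cannot overflow.
    shift = max(alpha * y for y in observed)
    weights = [math.exp(alpha * y - shift) for y in observed]
    return sum(w * y for w, y in zip(weights, observed)) / sum(weights)


def full_data_mean(observed, n_missing, alpha):
    """Mixture of the completers' mean and the tilted drop-out mean."""
    n_obs = len(observed)
    p_obs = n_obs / (n_obs + n_missing)
    mean_obs = sum(observed) / n_obs
    return p_obs * mean_obs + (1.0 - p_obs) * tilted_mean(observed, alpha)


def sensitivity_curve(observed, n_missing, alphas):
    """Full-data mean for each value of the sensitivity parameter alpha."""
    return [(a, full_data_mean(observed, n_missing, a)) for a in alphas]


if __name__ == "__main__":
    # Hypothetical data: three completers and three drop-outs.
    for alpha, mean in sensitivity_curve([1.0, 2.0, 3.0], 3, [-1.0, 0.0, 1.0]):
        print(f"alpha={alpha:+.1f}  estimated full-data mean={mean:.3f}")
```

Every value of `alpha` reproduces the observed data exactly, so the data cannot discriminate between them. The analyst instead reports how the estimated mean moves as `alpha` ranges over values judged plausible.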

In Sec. 5, we study in further detail a particular NPI model, the selection odds NPI model. This model is the unique NPI model that has both a “pattern mixture” and a selection model interpretation. Under this model, we derive an explicit formula, the selection bias $g$-computation algorithm formula, for the distribution of the full (i.e., complete) data.

There is a close connection between selection bias due to unobserved factors in follow-up studies with drop-out and selection bias due to unmeasured confounding factors in causal inference models. In Secs. 6 and 7, we use this connection to generalize our NPI selection bias models to NPI causal inference models. Unfortunately, we show in Sec. 7.2 that there is a major difficulty with trying to construct semiparametric estimators of the parameters of an NPI selection odds causal inference model. One solution to this difficulty is to give up the attempt to construct simple semiparametric estimators and, instead, use fully parametric likelihood-based inference. This approach is briefly discussed in the last remark of Sec. 8.6.1. A second and better approach is to develop alternative NPI causal inference models that simultaneously allow for unmeasured confounding and admit simple semiparametric estimators. This latter approach is considered in Sec. 8.

In Secs. 9 and 10, we return to the subject of missing data models. The NPI missing data models discussed in Sections 2 and 3 assume a monotone missing data pattern. In Section 10, we construct NPI models, the selection bias permutation missingness models, for non-monotone missing data with positive probability of complete observations.

In Section 11, we consider a Bayesian, as opposed to a sensitivity analysis, approach to summarizing our uncertainty.

##### MSC:
- 62N99 Survival analysis and censored data
- 62G99 Nonparametric inference
- 62G05 Nonparametric estimation
- 62P10 Applications of statistics to biology and medical sciences