zbMATH — the first resource for mathematics

Graphical models for inference under outcome-dependent sampling. (English) Zbl 1329.62042
Summary: We consider situations where data have been collected such that the sampling depends on the outcome of interest and possibly further covariates, as for instance in case-control studies. Graphical models represent assumptions about the conditional independencies among the variables. By including a node for the sampling indicator, assumptions about sampling processes can be made explicit. We demonstrate how to read off such graphs whether consistent estimation of the association between exposure and outcome is possible. Moreover, we give sufficient graphical conditions for testing and estimating the causal effect of exposure on outcome. The practical use is illustrated with a number of examples.
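Illustration (not taken from the paper): the summary's point that some associations remain estimable under outcome-dependent sampling can be checked on the standard 2x2 case-control example. The sketch below uses exact arithmetic on a hypothetical population table; the numbers, the function names, and the two selection schemes are assumptions made for the example only. It shows that the odds ratio is unchanged when the sampling probability depends on the outcome alone, while the risk difference is not, and that the odds ratio is distorted once sampling depends jointly on exposure and outcome.

```python
from fractions import Fraction


def odds_ratio(table):
    """Odds ratio of a 2x2 table indexed as table[x][y]."""
    return (table[1][1] * table[0][0]) / (table[1][0] * table[0][1])


def risk_difference(table):
    """P(Y=1 | X=1) - P(Y=1 | X=0) for a 2x2 table indexed as table[x][y]."""
    def risk(x):
        return table[x][1] / (table[x][0] + table[x][1])
    return risk(1) - risk(0)


def select(table, p_select):
    """Unnormalised joint distribution of (X, Y) among sampled units,
    where p_select[x][y] = P(S=1 | X=x, Y=y)."""
    return [[table[x][y] * p_select[x][y] for y in (0, 1)] for x in (0, 1)]


# Hypothetical population joint distribution P(X=x, Y=y) (assumed numbers).
population = [
    [Fraction(60, 100), Fraction(5, 100)],   # unexposed: controls, cases
    [Fraction(25, 100), Fraction(10, 100)],  # exposed:   controls, cases
]

# Scheme A (assumed): sampling depends on the outcome only,
# i.e. all cases and 10% of controls are sampled.
outcome_only = [
    [Fraction(1, 10), Fraction(1)],
    [Fraction(1, 10), Fraction(1)],
]

# Scheme B (assumed): sampling depends on outcome and exposure jointly,
# i.e. unexposed cases are sampled only half as often as exposed cases.
outcome_and_exposure = [
    [Fraction(1, 10), Fraction(1, 2)],
    [Fraction(1, 10), Fraction(1)],
]

or_population = odds_ratio(population)
or_scheme_a = odds_ratio(select(population, outcome_only))
or_scheme_b = odds_ratio(select(population, outcome_and_exposure))
rd_population = risk_difference(population)
rd_scheme_a = risk_difference(select(population, outcome_only))

print("odds ratio:      population", or_population,
      "| scheme A", or_scheme_a, "| scheme B", or_scheme_b)
print("risk difference: population", rd_population, "| scheme A", rd_scheme_a)
```

In graphical terms, scheme A corresponds to a sampling node whose only parent is the outcome, and scheme B to a sampling node with both exposure and outcome as parents.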

62-09 Graphical methods in statistics (MSC2010)
62G05 Nonparametric estimation
62G10 Nonparametric hypothesis testing
62D05 Sampling theory, sample surveys
62H05 Characterization and structure theory for multivariate probability distributions; copulas
62H20 Measures of association (correlation, canonical correlation, etc.)
62P10 Applications of statistics to biology and medical sciences; meta analysis
[1] Altham, P. M. E. (1970). The measurement of association of rows and columns for an $r \times s$ contingency table. J. Roy. Statist. Soc. Ser. B 32 63-73. · Zbl 0204.52903
[2] Angrist, J. D., Imbens, G. W. and Rubin, D. B. (1996). Identification of causal effects using instrumental variables. J. Amer. Statist. Assoc. 91 444-455. · Zbl 0897.62130
[3] Asmussen, S. and Edwards, D. (1983). Collapsibility and response variables in contingency tables. Biometrika 70 566-578. · Zbl 0549.62041
[4] Berkson, J. (1946). Limitations of the application of fourfold table analysis to hospital data. Biometrics Bull. 2 47-53.
[5] Breslow, N. E. (1996). Statistics in epidemiology: The case-control study. J. Amer. Statist. Assoc. 91 14-28. · Zbl 0870.62082
[6] Bishop, Y. M., Fienberg, S. and Holland, P. (1975). Discrete Multivariate Analysis. MIT Press, Cambridge, MA. · Zbl 0332.62039
[7] Cooper, G. F. (1995). Causal discovery from data in the presence of selection bias. Preliminary Papers of the 5th International Workshop on Artificial Intelligence and Statistics.
[8] Copas, J. B. and Li, H. G. (1997). Inference for non-random samples (with discussion). J. Roy. Statist. Soc. Ser. B 59 55-95. · Zbl 0896.62003
[9] Cox, D. R. and Wermuth, N. (1996). Multivariate Dependencies: Models, Analysis and Interpretation. Chapman and Hall, London. · Zbl 0880.62124
[10] Clayton, D. G. (2002). Models, parameters, and confounding in epidemiology. Invited Lecture, International Biometric Conference, Freiburg. Available at .
[11] Darroch, J. N., Lauritzen, S. L. and Speed, T. P. (1980). Markov fields and log linear models for contingency tables. Ann. Statist. 8 522-539. · Zbl 0444.62064
[12] Davis, J. A. (1984). Extending Rosenberg’s technique for standardizing percentage tables. Social Forces 62 679-708.
[13] Davis, L. J. (1986). Whittemore’s notion of collapsibility in multidimensional contingency tables. Comm. Statist. Theory Methods 15 2541-2554. · Zbl 0643.62042
[14] Dawid, A. P. (1979). Conditional independence in statistical theory (with discussion). J. Roy. Statist. Soc. Ser. B 41 1-31. · Zbl 0408.62004
[15] Dawid, A. P. (2002). Influence diagrams for causal modelling and inference. Int. Statist. Rev. 70 161-189. · Zbl 1215.62002
[16] Dawid, A. P. (2010). Beware of the DAG! J. Mach. Learn. 6 59-86.
[17] Dawid, A. P. and Didelez, V. (2010). Identifying the consequences of dynamic treatment strategies. A decision-theoretic overview. Statist. Surveys. · Zbl 1274.62072
[18] Didelez, V., Dawid, A. P. and Geneletti, S. (2006). Direct and indirect effects of sequential treatments. In Proceedings 22nd Conference on Uncertainty in Artificial Intelligence (R. Dechter and T. S. Richardson, eds.) 138-146. AUAI Press, Arlington, TX.
[19] Didelez, V. and Edwards, D. (2004). Collapsibility of graphical CG-regression models. Scand. J. Statist. 31 535-551. · Zbl 1062.62133
[20] Didelez, V. and Sheehan, N. (2007a). Mendelian randomisation as an instrumental variable approach to causal inference. Statist. Meth. Med. Res. 16 309-330. · Zbl 1122.62343
[21] Didelez, V. and Sheehan, N. (2007b). Mendelian randomisation: Why epidemiology needs a formal language for causality. In Causality and Probability in the Sciences (F. Russo and J. Williamson, eds.) 263-292. College Publications, London.
[22] Ducharme, G. R. and Lepage, Y. (1986). Testing collapsibility in contingency tables. J. Roy. Statist. Soc. Ser. B 48 197-205. · Zbl 0608.62063
[23] Edwards, A. W. F. (1963). The measure of association in a $2 \times 2$ table. J. Roy. Statist. Soc. Ser. A 126 109-114.
[24] Frydenberg, M. (1990). The chain graph Markov property. Scand. J. Statist. 17 333-353. · Zbl 0713.60013
[25] Geneletti, S. (2007). Identifying direct and indirect effects in a non-counterfactual framework. J. Roy. Statist. Soc. Ser. B 69 199-215. · Zbl 1120.62006
[26] Geneletti, S., Richardson, S. and Best, N. (2009). Adjusting for selection bias in retrospective case-control studies. Biostatistics 10 17-31.
[27] Geng, Z. (1992). Collapsibility of relative risk in contingency tables with a response variable. J. Roy. Statist. Soc. Ser. B 54 585-593. · Zbl 0774.62061
[28] Greenland, S. (2003). Quantifying biases in causal models: Classical confounding vs. collider-stratification bias. Epidemiology 14 300-306.
[29] Greenland, S., Pearl, J. and Robins, J. M. (1999a). Causal diagrams for epidemiologic research. Epidemiology 10 37-48.
[30] Greenland, S., Pearl, J. and Robins, J. M. (1999b). Confounding and collapsibility in causal inference. Statist. Sci. 14 29-46. · Zbl 1059.62506
[31] Guo, J., Geng, Z. and Fung, W.-K. (2001). Consecutive collapsibility of odds ratios over an ordinal background variable. J. Multivariate Anal. 79 89-98. · Zbl 1027.62042
[32] Heckman, J. J. (1979). Sample selection bias as a specification error. Econometrica 47 153-161. · Zbl 0392.62093
[33] Hernán, M. A., Hernández-Díaz, S. and Robins, J. M. (2004). A structural approach to selection bias. Epidemiology 15 615-625.
[34] Kim, S.-H. and Kim, S.-H. (2006). A note on collapsibility in DAG models of contingency tables. Scand. J. Statist. 33 575-590. · Zbl 1113.62070
[35] Kreiner, S. (1987). Analysis of multidimensional contingency tables by exact methods. Scand. J. Statist. 14 97-112. · Zbl 0636.62048
[36] Lauritzen, S. L. (1982). Lectures on Contingency Tables. Aalborg Univ. Press.
[37] Lauritzen, S. L. (1996). Graphical Models. Clarendon Press, Oxford. · Zbl 0907.62001
[38] Lauritzen, S. L. (2000). Causal inference from graphical models. In Complex Stochastic Systems (O. E. Barndorff-Nielsen, D. R. Cox and C. Klüppelberg, eds.) 63-107. Chapman and Hall/CRC Press, London. · Zbl 1010.62004
[39] Lauritzen, S. L., Dawid, A. P., Larsen, B. N. and Leimer, H. G. (1990). Independence properties of directed Markov fields. Networks 20 491-505. · Zbl 0743.05065
[40] Lauritzen, S. L. and Richardson, T. S. (2002). Chain graph models and their causal interpretations (with discussion). J. Roy. Statist. Soc. Ser. B 64 321-361. · Zbl 1090.62103
[41] Lauritzen, S. L. and Richardson, T. S. (2008). Discussion of McCullagh: Sampling bias and logistic models. J. Roy. Statist. Soc. Ser. B 70 671. · Zbl 05563363
[42] Mansson, R., Joffe, M. M., Sun, W. and Hennessy, S. (2007). On the estimation and use of propensity scores in case-control and case-cohort studies. Am. J. Epidemiol. 166 332-339.
[43] McCullagh, P. (2008). Sampling bias and logistic models. J. Roy. Statist. Soc. Ser. B 70 643-677. · Zbl 05563363
[44] Newman, S. C. (2006). Causal analysis of case-control data. Epidemiologic Perspectives and Innovations 3 2.
[45] Pearl, J. (1993). Graphical models, causality and interventions. Statist. Sci. 8 266-269.
[46] Pearl, J. (1995). Causal diagrams for empirical research. Biometrika 82 669-710. · Zbl 0860.62045
[47] Pearl, J. (2000). Causality: Models, Reasoning and Inference. Cambridge Univ. Press. · Zbl 0959.68116
[48] Pearl, J. (2001). Direct and indirect effects. In Proceedings 17th Conference on Uncertainty in Artificial Intelligence (J. Breese and D. Koller, eds.) 411-420. Morgan Kaufmann, San Francisco, CA.
[49] Pedersen, A. T., Lidegaard, O., Kreiner, S. and Ottesen, B. (1997). Hormone replacement therapy and risk of non-fatal stroke. The Lancet 350 1277-1283.
[50] Prentice, R. L. and Pyke, R. (1979). Logistic disease incidence models and case-control studies. Biometrika 66 403-411. · Zbl 0428.62078
[51] Robins, J. (1986). A new approach to causal inference in mortality studies with sustained exposure periods: application to control for the healthy worker survivor effect. Math. Model. 7 1393-1512. · Zbl 0614.62136
[52] Robins, J. M. (2001). Data, design, and background knowledge in etiologic inference. Epidemiology 12 313-320. · Zbl 0647.62093
[53] Robins, J. M. (2003). Semantics of causal DAG models and the identification of direct and indirect effects. In Highly Structured Stochastic Systems (P. Green, N. Hjort and S. Richardson, eds.) 70-81. Oxford Univ. Press.
[54] Robins, J. M., Hernán, M. A. and Brumback, B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology 11 550-560. · Zbl 0647.62093
[55] Robins, J. M., Rotnitzky, A. and Zhao, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. J. Amer. Statist. Assoc. 89 846-866. · Zbl 0815.62043
[56] Robinson, L. D. and Jewell, N. P. (1991). Some surprising results about covariate adjustment in logistic regression models. Int. Statist. Rev. 2 227-240. · Zbl 0742.62067
[57] Rosenbaum, P. R. and Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika 70 41-55. · Zbl 0522.62091
[58] Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. J. Educ. Psychol. 66 688-701.
[59] Rubin, D. B. (1978). Bayesian inference for causal effects: The role of randomization. Ann. Statist. 6 34-58. · Zbl 0383.62021
[60] Shapiro, S. H. (1982). Collapsing contingency tables: A geometric approach. Amer. Statist. 36 43-46.
[61] Slama, R., Ducot, B., Carstensen, L., Lorente, C., de La Rochebrochard, E., Leridon, H., Keiding, N. and Bouyer, J. (2006). Feasibility of the current duration approach to study human fecundity. Epidemiology 17 440-449.
[62] Spirtes, P., Glymour, C. and Scheines, R. (1993). Causation, Prediction, and Search, 1st ed. MIT Press, Cambridge, MA. · Zbl 0806.62001
[63] van der Laan, M. J. (2008). Estimation based on case-control designs with known prevalence probability. Int. J. Biostat. 4 1-57.
[64] Verma, T. and Pearl, J. (1988). Causal networks: Semantics and expressiveness. In Proceedings of the 4th Conference on Uncertainty and Artificial Intelligence (R. D. Shachter, T. S. Levitt, L. N. Kanal and J. F. Lemmer, eds.) 69-76. Elsevier, New York.
[65] Weinberg, C. R., Baird, D. D. and Rowland, A. S. (1993). Pitfalls inherent in retrospective time-to-event studies: The example of time to pregnancy. Statist. Med. 12 867-879.
[66] Wermuth, N. (1987). Parametric collapsibility and the lack of moderating effects in contingency tables with a dichotomous response variable. J. Roy. Statist. Soc. Ser. B 49 353-364. · Zbl 0636.62049
[67] Wermuth, N. and Lauritzen, S. (1990). On substantive research hypotheses, conditional independence graphs and graphical chain models (with discussion). J. Roy. Statist. Soc. Ser. B 52 21-72. · Zbl 0749.62004
[68] Whittaker, J. (1990). Graphical Models in Applied Multivariate Statistics. Wiley, Chichester. · Zbl 0732.62056
[69] Whittemore, A. S. (1978). Collapsibility of multidimensional contingency tables. J. Roy. Statist. Soc. Ser. B 40 328-340. · Zbl 0413.62041
[70] Xie, X. and Geng, Z. (2009). Collapsibility of directed acyclic graphs. Scand. J. Statist. 36 185-208. · Zbl 1196.62075
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.