zbMATH — the first resource for mathematics

Smooth, identifiable supermodels of discrete DAG models with latent variables. (English) Zbl 1431.62230
Summary: We provide a parameterization of the discrete nested Markov model, which is a supermodel that approximates DAG models (Bayesian network models) with latent variables. Such models are widely used in causal inference and machine learning. We explicitly evaluate their dimension, show that they are curved exponential families of distributions, and fit them to data. The parameterization avoids the irregularities and unidentifiability of latent variable models. The parameters used are all fully identifiable and causally-interpretable quantities.
MSC:
62H12 Estimation in multivariate analysis
62M45 Neural nets and related approaches to inference from stochastic processes
60J05 Discrete-time Markov processes on general state spaces
References:
[1] Bishop, C.M. (2007). Pattern Recognition and Machine Learning. Information Science and Statistics. New York: Springer.
[2] Darwiche, A. (2009). Modeling and Reasoning with Bayesian Networks. Cambridge: Cambridge Univ. Press. · Zbl 1231.68003
[3] Dawid, A.P. (2002). Influence diagrams for causal modelling and inference. Int. Stat. Rev. 70 161–189. · Zbl 1215.62002
[4] Drton, M. (2009). Discrete chain graph models. Bernoulli 15 736–753. · Zbl 1452.62348
[5] Drton, M. (2009). Likelihood ratio tests and singularities. Ann. Statist. 37 979–1012. · Zbl 1196.62020
[6] Drton, M. and Richardson, T.S. (2008). Binary models for marginal independence. J. R. Stat. Soc. Ser. B. Stat. Methodol. 70 287–309. · Zbl 1148.62043
[7] Evans, R.J. (2018). Margins of discrete Bayesian networks. Ann. Statist. 46 2623–2656. · Zbl 1408.62044
[8] Evans, R.J. and Richardson, T.S. (2010). Maximum likelihood fitting of acyclic directed mixed graphs to binary data. In Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence 177–184.
[9] Evans, R.J. and Richardson, T.S. (2013). Marginal log-linear parameters for graphical Markov models. J. R. Stat. Soc. Ser. B. Stat. Methodol. 75 743–768.
[10] Evans, R.J. and Richardson, T.S. (2014). Markovian acyclic directed mixed graphs for discrete data. Ann. Statist. 42 1452–1482. · Zbl 1302.62148
[11] Hauser, R.M., Sewell, W.H. and Herd, P. Wisconsin Longitudinal Study (WLS), 1957–2012. Available at http://www.ssc.wisc.edu/wlsresearch/documentation/. Version 13.03, Univ. Wisconsin–Madison, WLS.
[12] Huang, J.C. and Frey, B.J. (2008). Cumulative distribution networks and the derivative-sum-product algorithm. In Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence 290–297.
[13] Mond, D., Smith, J. and van Straten, D. (2003). Stochastic factorizations, sandwiched simplices and the topology of the space of explanations. Proc. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 459 2821–2845. · Zbl 1051.60076
[14] Pearl, J. and Verma, T.S. (1992). A statistical semantics for causation. Stat. Comput. 2 91–95.
[15] Pearl, J. (2009). Causality: Models, Reasoning, and Inference, 2nd ed. Cambridge: Cambridge Univ. Press. · Zbl 1188.68291
[16] Richardson, T. (2003). Markov properties for acyclic directed mixed graphs. Scand. J. Stat. 30 145–157. · Zbl 1035.60005
[17] Richardson, T.S., Evans, R.J., Robins, J.M. and Shpitser, I. (2017). Nested Markov properties for acyclic directed mixed graphs. Preprint. Available at arXiv:1701.06686.
[18] Robins, J. (1986). A new approach to causal inference in mortality studies with a sustained exposure period – Application to control of the healthy worker survivor effect. Math. Model. 7 1393–1512. · Zbl 0614.62136
[19] Shpitser, I., Evans, R.J., Richardson, T.S. and Robins, J.M. (2013). Sparse nested Markov models with log-linear parameters. In Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence 576–585.
[20] Shpitser, I., Evans, R.J., Richardson, T.S. and Robins, J.M. (2014). Introduction to nested Markov models. Behaviormetrika 41 3–39.
[21] Silva, R., Blundell, C. and Teh, Y.W. (2011). Mixed cumulative distribution networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS) 15 670–678.
[22] Silva, R. and Ghahramani, Z. (2009). The hidden life of latent variables: Bayesian learning with mixed graph models. J. Mach. Learn. Res. 10 1187–1238. · Zbl 1235.68191
[23] Richardson, T.S. and Spirtes, P.L. (2002). Ancestral graph Markov models. Ann. Statist. 30 962–1030. · Zbl 1033.60008
[24] Tian, J. and Pearl, J. (2002). A general identification condition for causal effects. In Proceedings of the 18th National Conference on Artificial Intelligence. AAAI.
[25] Tian, J. and Pearl, J. (2002). On the testable implications of causal models with hidden variables. In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI-02) 519–527. Morgan Kaufmann Publishers Inc.
[26] Verma, T.S. and Pearl, J. (1991). Equivalence and synthesis of causal models. In Proceedings of the 7th Conference on Uncertainty in Artificial Intelligence (UAI-91) 255–268.
[27] Wermuth, N. (2011). Probability distributions with summary graph structure. Bernoulli 17 845–879. · Zbl 1245.62062
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.