Parameter priors for directed acyclic graphical models and the characterization of several probability distributions. (English) Zbl 1016.62064

Summary: We develop simple methods for constructing parameter priors for model choice among directed acyclic graphical (DAG) models. In particular, we introduce several assumptions that permit the construction of parameter priors for a large number of DAG models from a small set of assessments. We then present a method for directly computing the marginal likelihood of every DAG model given a random sample with no missing observations. We apply this methodology to Gaussian DAG models, which consist of a recursive set of linear regression models. We show that the only parameter prior for complete Gaussian DAG models that satisfies our assumptions is the normal-Wishart distribution.
Our analysis is based on the following new characterization of the Wishart distribution: let \(W\) be an \(n\times n\), \(n\geq 3\), positive definite symmetric matrix of random variables and let \(f(W)\) be a pdf of \(W\). Then \(f(W)\) is a Wishart density if and only if \(W_{11}-W_{12} W_{22}^{-1}W_{12}'\) is independent of \(\{W_{12},W_{22}\}\) for every block partitioning \(W_{11}, W_{12}, W_{12}', W_{22}\) of \(W\). Similar characterizations of the normal and normal-Wishart distributions are provided as well.
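
The "only if" direction of this characterization can be checked numerically. The following is a minimal sketch, not the authors' method: it assumes NumPy, a hypothetical \(3\times 3\) scale matrix with off-diagonal entries 0.5, 10 degrees of freedom, and the block partition with a \(1\times 1\) block \(W_{11}\). It only measures sample correlations, which are a necessary but not sufficient sign of independence.

```python
import numpy as np

def sample_wishart(rng, scale, dof, size):
    """Draw `size` Wishart(scale, dof) matrices as sums of Gaussian outer products."""
    n = scale.shape[0]
    chol = np.linalg.cholesky(scale)
    # Each row of x[s] is an independent N(0, scale) vector.
    x = rng.standard_normal((size, dof, n)) @ chol.T
    # W[s] = sum over rows d of x[s, d] x[s, d]'
    return np.einsum("sdi,sdj->sij", x, x)

def schur_complement(w, k):
    """Return W11 - W12 W22^{-1} W12' for the leading k x k block of each matrix."""
    w11 = w[:, :k, :k]
    w12 = w[:, :k, k:]
    w22 = w[:, k:, k:]
    solved = np.linalg.solve(w22, np.swapaxes(w12, 1, 2))  # W22^{-1} W12'
    return w11 - w12 @ solved

rng = np.random.default_rng(0)
# Assumed (illustrative) scale matrix; any positive definite choice would do.
scale = np.array([[1.0, 0.5, 0.5],
                  [0.5, 1.0, 0.5],
                  [0.5, 0.5, 1.0]])
w = sample_wishart(rng, scale, dof=10, size=20000)

s = schur_complement(w, k=1)[:, 0, 0]

# W11 itself is correlated with an entry of W22 when the scale is not diagonal ...
raw_corr = np.corrcoef(w[:, 0, 0], w[:, 1, 1])[0, 1]
# ... whereas the Schur complement should be uncorrelated with W22 and W12 entries.
schur_corr_w22 = np.corrcoef(s, w[:, 1, 1])[0, 1]
schur_corr_w12 = np.corrcoef(s, w[:, 0, 1])[0, 1]

print(f"corr(W11, W22[0,0])        = {raw_corr:.3f}")
print(f"corr(Schur, W22[0,0])      = {schur_corr_w22:.3f}")
print(f"corr(Schur, W12[0,0])      = {schur_corr_w12:.3f}")
```

With this setup the first correlation should be clearly positive while the two involving the Schur complement should be close to zero, up to Monte Carlo error. The converse direction (independence for every block partitioning forces the Wishart form) is the paper's theoretical result and is not something a simulation of this kind can establish.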


62H05 Characterization and structure theory for multivariate probability distributions; copulas
05C90 Applications of graph theory
60E05 Probability distributions: general theory
62C10 Bayesian problems; characterization of Bayes procedures
39B99 Functional equations and inequalities
05C20 Directed graphs (digraphs), tournaments



