zbMATH — the first resource for mathematics

Consistent model selection of discrete Bayesian networks from incomplete data. (English) Zbl 1336.62087
Summary: Maximum-likelihood-based model selection of discrete Bayesian networks is considered. Structure learning is performed with a scoring function \(S\) which, for a given network \(G\) and \(n\)-sample \(D_{n}\), is defined as the maximum marginal log-likelihood \(l\) minus a penalization term \(\lambda_{n}h\) proportional to the network complexity \(h(G)\), \[ S(G|D_{n})=l(G|D_{n})-\lambda_{n}h(G). \] An available-case analysis is developed in which the standard log-likelihood is replaced by the sum of sample-average node log-likelihoods. The approach utilizes partially missing data records and allows comparison of models fitted to different samples. In missing-completely-at-random settings, the estimation is shown to be consistent if and only if the sequence \(\lambda_{n}\) converges to zero at a rate slower than \(n^{-1/2}\). In particular, BIC model selection \((\lambda_{n}=0.5\log(n)/n)\) applied to the node-average log-likelihood is shown to be inconsistent in general. This contrasts with the complete-data case, in which BIC is known to be consistent. The conclusions are confirmed by numerical experiments.
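To make the score \(S(G|D_{n})\) concrete, the following Python sketch shows one possible reading of the available-case construction described in the summary. It is an illustration under stated assumptions, not the paper's or the catnet package's implementation: the function names, the data layout (a list of dicts with `None` marking a missing value), and the choice of \(h(G)\) as the number of free parameters are all hypothetical. Each node's log-likelihood is maximized and averaged over only those records in which the node and all of its parents are observed, so different nodes may use different subsamples.

```python
import math
from collections import Counter


def node_average_loglik(data, node, parents):
    """Sample-average maximized log-likelihood of `node` given `parents`.

    Assumption: only records where the node and all its parents are
    observed (not None) are used, i.e. an available-case analysis.
    """
    cols = [node] + list(parents)
    rows = [r for r in data if all(r[c] is not None for c in cols)]
    if not rows:
        return 0.0
    # Counts of (parent configuration, node value) and of parent configuration.
    joint = Counter((tuple(r[p] for p in parents), r[node]) for r in rows)
    marg = Counter(tuple(r[p] for p in parents) for r in rows)
    # Plug in the maximum-likelihood estimates cnt / marg[pa].
    loglik = sum(cnt * math.log(cnt / marg[pa]) for (pa, _), cnt in joint.items())
    return loglik / len(rows)


def complexity(graph, arity):
    """h(G), assumed here to be the number of free parameters of G."""
    return sum(
        (arity[v] - 1) * math.prod(arity[p] for p in pa)
        for v, pa in graph.items()
    )


def score(graph, data, arity, lam):
    """S(G|D_n) = l(G|D_n) - lam * h(G), with l the sum of node averages."""
    fit = sum(node_average_loglik(data, v, pa) for v, pa in graph.items())
    return fit - lam * complexity(graph, arity)


def bic_penalty(n):
    """BIC penalty on the per-sample scale, as given in the summary."""
    return 0.5 * math.log(n) / n


# Hypothetical toy sample with partially missing records.
data = [
    {"A": 0, "B": 0},
    {"A": 0, "B": 0},
    {"A": 1, "B": 1},
    {"A": 1, "B": None},
    {"A": None, "B": 1},
]
arity = {"A": 2, "B": 2}
empty_graph = {"A": (), "B": ()}      # no edges
edge_graph = {"A": (), "B": ("A",)}   # A -> B

n = len(data)
for name, g in [("empty", empty_graph), ("A->B", edge_graph)]:
    print(name, score(g, data, arity, bic_penalty(n)))
```

Since \(\sqrt{n}\cdot 0.5\log(n)/n = 0.5\log(n)/\sqrt{n}\to 0\), the BIC penalty decays faster than \(n^{-1/2}\), which is why it fails the consistency condition stated in the summary; a penalty such as \(\lambda_{n}=n^{-1/4}\) would be one sequence that satisfies the stated rate.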
62F12 Asymptotic properties of parametric estimators
62H12 Estimation in multivariate analysis
62F15 Bayesian inference
62-09 Graphical methods in statistics (MSC2010)