## Learning Bayesian network structures using weakest mutual-information-first strategy. (English) Zbl 1468.68165

Summary: In Bayesian network structure learning, the quality of the directed graph learned by constraint-based approaches can be greatly affected by the order in which variable pairs are chosen and condition sets are selected for conditional-independence testing. Inspired by the strong connection between the degree of mutual information shared by two variables and their conditional independence, we introduce the M-ordering concept: a matrix is precomputed from the observational data, with variables ordered by increasing mutual information with the target variable of interest. Given the M-ordering matrix, we propose the Weakest Mutual-Information-First (WMIF) strategy, which is integrated into the PC algorithm in two ways: an MI-based edge-removal strategy and an MI-based condition-set generation strategy. The edge-removal strategy always selects the variable pair with the weakest mutual information for conditional-independence testing; the condition-set generation strategy constructs conditioning sets in which variables with weaker mutual information with the target variable are always considered first. We prove that the weakest-MI-based edge-removal strategy is sound and that our PC-MI algorithm, a PC variant empowered by the WMIF strategy, is order-independent. Moreover, in PC algorithms the number of conditional-independence tests grows exponentially with the number of random variables; we show that the WMIF strategy can effectively reduce this complexity, bounding it by $$O(|\mathbf{V}|(2^{|\operatorname{adj}(X)|} - \frac{|\mathbf{V}|^2}{2}))$$. We have conducted experiments with both low-dimensional and high-dimensional data sets, and the results indicate that PC-MI outperforms state-of-the-art approaches.
More importantly, the order-independence of PC-MI can be extremely useful when it is hard to prescribe a meaningful variable ordering, as required by some other PC algorithms.
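The M-ordering and WMIF ideas described above can be sketched as follows: precompute pairwise mutual information (MI) from discrete observational data, visit variable pairs weakest-MI-first when testing edges, and, for a given target variable, order candidate conditioning variables by ascending MI with it. This is a minimal illustrative sketch of the concept, not the authors' implementation; the function names and the toy data are assumptions.

```python
# Illustrative sketch (not the paper's code) of the M-ordering / WMIF idea:
# precompute a pairwise MI matrix, then order edge tests and conditioning
# sets weakest-mutual-information-first.
import numpy as np
from itertools import combinations

def mutual_information(x, y):
    """Empirical MI (in nats) between two discrete 1-D arrays."""
    mi = 0.0
    for vx in np.unique(x):
        for vy in np.unique(y):
            pxy = np.mean((x == vx) & (y == vy))
            if pxy > 0:
                px = np.mean(x == vx)
                py = np.mean(y == vy)
                mi += pxy * np.log(pxy / (px * py))
    return mi

def m_ordering(data):
    """Pairwise MI matrix and all variable pairs sorted weakest-MI-first."""
    _, d = data.shape
    M = np.zeros((d, d))
    for i, j in combinations(range(d), 2):
        M[i, j] = M[j, i] = mutual_information(data[:, i], data[:, j])
    pairs = sorted(combinations(range(d), 2), key=lambda p: M[p])
    return M, pairs

def condition_order(M, target, candidates):
    """Order candidate conditioning variables by ascending MI with target."""
    return sorted(candidates, key=lambda v: M[target, v])

# Toy data: b is a noisy copy of a (strong dependence); c is independent noise.
rng = np.random.default_rng(0)
a = rng.integers(0, 2, 500)
b = (a ^ (rng.random(500) < 0.1)).astype(int)
c = rng.integers(0, 2, 500)
data = np.column_stack([a, b, c])

M, pairs = m_ordering(data)
# Pairs involving c have near-zero MI, so they are tested first;
# the strongly dependent pair (a, b) is tested last.
print(pairs)
```

Under the WMIF strategy, the weakest-MI pair is the first candidate for edge removal, which is what makes the resulting traversal independent of the arbitrary input ordering of the variables.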

### MSC:

- 68T05 Learning and adaptive systems in artificial intelligence
- 60J20 Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.)
- 62H22 Probabilistic graphical models