Reconstruction of a directed acyclic graph with intervention.(English)Zbl 1455.62031

Summary: Identification of causal relations among variables is central to many scientific investigations, as in regulatory network analysis of gene interactions and brain network analysis of effective connectivity of causal relations between regions of interest. Statistically, causal relations are often modeled by a directed acyclic graph (DAG), and hence that reconstruction of a DAG’s structure leads to the discovery of causal relations. Yet, reconstruction of a DAG’s structure from observational data is impossible because a DAG Gaussian model is usually not identifiable with unequal error variances. In this article, we reconstruct a DAG’s structure with the help of interventional data. Particularly, we construct a constrained likelihood to regularize intervention in addition to adjacency matrices to identify a DAG’s structure, subject to an error variance constraint to further reinforce the model identifiability. Theoretically, we show that the proposed constrained likelihood leads to identifiable models, thus correct reconstruction of a DAG’s structure through parameter estimation even with unequal error variances. Computationally, we design efficient algorithms for the proposed method. In simulations, we show that the proposed method enables to produce a higher accuracy of reconstruction with the help of interventional observations.

MSC:

 62A09 Graphical methods in statistics 62H22 Probabilistic graphical models 05C90 Applications of graph theory

Software:

 [1] Bhatia, R. (2013)., Matrix analysis 169. Springer Science & Business Media. [2] Boyd, S., Parikh, N., Chu, E., Peleato, B. and Eckstein, J. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers., Foundations and Trends in Machine Learning 3 1-122. · Zbl 1229.90122 [3] Chickering, D. M. (2003). Optimal structure identification with greedy search., The Journal of Machine Learning Research 3 507-554. · Zbl 1084.68519 [4] Eaton, D. and Murphy, K. P. (2007). Exact Bayesian structure learning from uncertain interventions. In, International Conference on Artificial Intelligence and Statistics 107-114. [5] Edwards, D. (2000)., Introduction to graphical modelling. Springer Verlag. · Zbl 0952.62003 [6] Ellis, B. and Wong, W. H. (2008). Learning causal Bayesian network structures from experimental data., Journal of the American Statistical Association 103 778-789. · Zbl 1471.62056 [7] Friedman, N. (2004). Inferring cellular networks using probabilistic graphical models., Science Signalling 303 799. [8] Fu, F. and Zhou, Q. (2013). Learning sparse causal Gaussian networks with experimental intervention: regularization and coordinate descent., Journal of the American Statistical Association 108 288-300. · Zbl 06158343 [9] Hauser, A. and Bühlmann, P. (2012). Characterization and greedy learning of interventional Markov equivalence classes of directed acyclic graphs., The Journal of Machine Learning Research 13 2409-2464. · Zbl 1433.68346 [10] Huang, S., Li, J., Ye, J., Fleisher, A., Chen, K., Wu, T. and Reiman, E. (2013). A Sparse Structure Learning Algorithm for Gaussian Bayesian Network Identification from High-Dimensional Data., Pattern Analysis and Machine Intelligence, IEEE Transactions on 35 1328-1342. [11] Kalisch, M. and Bühlmann, P. (2007). Estimating high-dimensional directed acyclic graphs with the PC-algorithm., The Journal of Machine Learning Research 8 613-636. · Zbl 1222.68229 [12] Kanehisa, M. and Goto, S. (2000). KEGG: kyoto encyclopedia of genes and genomes., Nucleic acids research 28 27-30. [13] Kolmogorov, A. N. and Tikhomirov, V. M. (1959). $$\varepsilon$$-entropy and $$\varepsilon$$-capacity of sets in function spaces., Uspekhi Matematicheskikh Nauk 14 3-86. · Zbl 0090.33503 [14] Nandy, P., Hauser, A., Maathuis, M. H. et al. (2018). High-dimensional consistency in score-based and hybrid structure learning., The Annals of Statistics 46 3151-3183. · Zbl 1411.62144 [15] Ossiander, M. (1987). A central limit theorem under metric entropy with $$L_2$$ bracketing., The Annals of Probability 15 897-919. · Zbl 0665.60036 [16] Pearl, J. (2003). Statistics and causal inference: A review., Test 12 281-345. · Zbl 1044.62003 [17] Peng, S., Shen, X. and Pan, W. (2019). intdag: Reconstruction of a Directed Acyclic Graph with Interventions R package version, 1.0.1. [18] Peters, J. and Bühlmann, P. (2013). Identifiability of Gaussian structural equation models with equal error variances., Biometrika, first published online, doi: 10.1093/biomet/ast043. · Zbl 1285.62005 [19] Sachs, K., Perez, O., Pe’er, D., Lauffenburger, D. A. and Nolan, G. P. (2005). Causal protein-signaling networks derived from multiparameter single-cell data., Science 308 523. [20] Shen, X., Pan, W. and Zhu, Y. (2012). Likelihood-based selection and sharp parameter estimation., Journal of the American Statistical Association 107 223-232. · Zbl 1261.62020 [21] Shen, X., Pan, W., Zhu, Y. and Zhou, H. (2013). On constrained and regularized high-dimensional regression., Annals of the Institute of Statistical Mathematics 1-26. · Zbl 1329.62307 [22] Spirtes, P., Glymour, C. N. and Scheines, R. (2000)., Causation, prediction, and search 81. The MIT Press. · Zbl 0806.62001 [23] Tripathi, S., Christie, K. R., Balakrishnan, R., Huntley, R., Hill, D. P., Thommesen, L., Blake, J. A., Kuiper, M. and Lægreid, A. (2013). Gene Ontology annotation of sequence-specific DNA binding transcription factors: setting the stage for a large-scale curation effort., Database 2013. [24] Tsamardinos, I., Brown, L. E. and Aliferis, C. F. (2006). The max-min hill-climbing Bayesian network structure learning algorithm., Machine Learning 65 31-78. [25] Webster, J. A., Gibbs, J. R., Clarke, J., Ray, M., Zhang, W., Holmans, P., Rohrer, K., Zhao, A., Marlowe, L., Kaleem, M. et al. (2009). Genetic control of human brain transcript expression in Alzheimer disease., The American Journal of Human Genetics 84 445-458. [26] Wong, W. H. and Shen, X. (1995). Probability inequalities for likelihood ratios and convergence rates of sieve MLEs., The Annals of Statistics 23 339-362. · Zbl 0829.62002 [27] Yuan, Y., Shen, X., Pan, W. and Wang, Z. (2018). Constrained likelihood for reconstructing a directed acyclic Gaussian graph., Biometrika 106 109-125. · Zbl 07051930 [28] Zhou, S. · Zbl 0929.62052