# zbMATH — the first resource for mathematics

Inferring large graphs using $$\ell_1$$-penalized likelihood. (English) Zbl 1384.62177
Stat. Comput. 28, No. 4, 905-921 (2018); correction ibid. 28, No. 6, 1231 (2018).
Summary: We address the issue of recovering the structure of large sparse directed acyclic graphs from noisy observations of the system. We propose a novel procedure based on a specific formulation of the $$\ell_1$$-norm regularized maximum likelihood, which decomposes the graph estimation into two optimization sub-problems: topological structure and node order learning. We provide convergence inequalities for the graph estimator, as well as an algorithm to solve the induced optimization problem, in the form of a convex program embedded in a genetic algorithm. We apply our method to various data sets (including data from the DREAM4 challenge) and show that it compares favorably to state-of-the-art methods. This algorithm is available on CRAN as the R package GADAG.

##### MSC:
 62H12 Estimation in multivariate analysis 62J05 Linear regression; mixed models 62J07 Ridge regression; shrinkage estimators (Lasso) 68T05 Learning and adaptive systems in artificial intelligence 90C90 Applications of mathematical programming
##### Keywords:
directed acyclic graphs; Lasso; convex program; optimization
##### Software:
 [1] Alander, J.: On optimal population size of genetic algorithms. In: Proceedings of the IEEE Computer Systems and Software Engineering (1992) · Zbl 1233.62050 [2] Allouche, A., Cierco-Ayrolles, C., de Givry, S., Guillermin, G., Mangin, B., Schiex, T., Vandel, J., Vignes, M.: A panel of learning methods for the reconstruction of gene regulatory networks in a systems genetics. In: de la fuente, A. (ed.) Gene Network Inference, Verification of Methods for Systems Genetics Data, pp. 9-32 . Springer, Heidelberg (2013) · Zbl 1233.62050 [3] Aragam, B., Amini, A., Zhou, Q.: Learning directed acyclic graphs with penalized neighbourhood regression (2015) Preprint on arxiv at https://arxiv.org/pdf/1511.08963.pdf · Zbl 1142.62408 [4] Bach, F.: Bolasso: model consistent lasso estimation through the bootstrap. In: Proceedings of the Twenty-fifth International Conference on Machine Learning (2008) · Zbl 1142.62408 [5] Banerjee, O; Ghaoui, L; d’Aspremont, A, Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data, J. Mach. Learn. Res., 9, 485-516, (2008) · Zbl 1225.68149 [6] Barabási, A; Oltvai, Z, Network biology: understanding the cell’s functional organization, Nat. Rev. Genet., 5, 101-113, (2004) [7] Bickel, P; Li, B, Regularization in statistics, Test, 15, 271-344, (2006) · Zbl 1110.62051 [8] Bickel, PJ; Ritov, Y; Tsybakov, AB, Simultaneous analysis of lasso and Dantzig selector, Ann. Stat., 37, 1705-1732, (2009) · Zbl 1173.62022 [9] Breiman, L, Random forests, Mach. Learn., 45, 5-32, (2001) · Zbl 1007.68152 [10] Bühlmann, P, Causal statistical inference in high dimensions, Math. Methods Oper. Res., 77, 357-370, (2013) · Zbl 1339.62001 [11] Bühlmann, P., van de Geer, S.: Statistics for high-dimensional data. Methods, theory and applications, Springer Series in Statistics. Springer, Heidelberg (2011) · Zbl 1334.62120 [12] Cerf, R, Asymptotic convergence of genetic algorithms, Adv. Appl. Probab., 30, 521-550, (1998) · Zbl 0911.60018 [13] Champion, M., Picheny, V., Vignes, M.: GADAG: A Genetic Algorithm for learning Directed Acyclic Graphs. R package version 0.99.0 (2017) [14] Chen, J; Chen, Z, Extended Bayesian information criteria for model selection with large model spaces, Biometrika, 95, 759-771, (2008) · Zbl 1437.62415 [15] Chickering, D. M.: Learning Bayesian networks is NP-complete. In: Learning from data (Fort Lauderdale, FL, 1995), vol. 112 of Lecture Notes in Statistics, pp. 121-130. Springer, New York (1996) · Zbl 1173.62022 [16] Chickering, DM, Optimal structure identification with greedy search, J. Mach. Learn. Res., 3, 507-554, (2002) · Zbl 1084.68519 [17] Cook, S, A taxonomy of problems with fast parallel algorithms, Inform. Control, 64, 2-22, (1985) · Zbl 0575.68045 [18] Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C. Introduction to algorithms, 2nd edn. MIT Press, Cambridge, MA; McGraw-Hill Book Co., Boston, MA (2001) · Zbl 1047.68161 [19] Csiszár, I., Tusnády, G.: Information geometry and alternating minimization procedures. Statistics and Decisions, pp. 205-237. Recent results in estimation theory and related topics (1984) · Zbl 1318.68151 [20] Davis, L. (ed.): Handbook of Genetic Algorithms. Van Nostrand Reinhold, Hoboken (1991) · Zbl 1367.94106 [21] Smet, R; Marchal, K, Advantages and limitations of current network inference methods, Nat Rev Microbiol, 8, 717-729, (2010) [22] Dondelinger, F; Lèbre, S; Husmeier, D, Non-homogeneous dynamic Bayesian networks with Bayesian regularization for inferring gene regulatory networks with gradually time-varying structure, Mach. Learn., 90, 191-230, (2013) · Zbl 1260.92027 [23] Dréo, J., Pétrowski, A., Siarry, P., Taillard, E.A.: Metaheuristics for hard optimization. Methods and case studies, Translated from the 2003 French original by Amitava Chatterjee. Springer, Berlin (2006) · Zbl 1007.68152 [24] Duchi, J., Gould, S., Koller, D.: Projected subgradient methods for learning sparse gaussians. In: Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence (2008) [25] Efron, B; Hastie, T; Johnstone, I; Tibshirani, R, Least angle regression, Ann. Stat., 32, 407-499, (2004) · Zbl 1091.62054 [26] Karoui, N, Spectrum estimation for large dimensional covariance matrices using random matrix theory, Ann. Stat., 36, 2757-2790, (2008) · Zbl 1168.62052 [27] Ellis, B; Wong, W, Learning causal Bayesian network structures from experimental data, J. Am. Stat. Assoc., 103, 778-789, (2008) · Zbl 05564531 [28] Evtushenko, Y; Malkova, V; Stanevichyus, A, Parallel global optimization of functions of several variables, Comput. Math. Math. Phys., 49, 246-260, (2009) · Zbl 1199.65197 [29] Friedman, N; Koller, D, Being Bayesian about network structure: a Bayesian approach to structure discovery in Bayesian networks, Mach. Learn., 50, 95-125, (2003) · Zbl 1033.68104 [30] Friedman, J; Hastie, T; Tibshirani, R, Sparse inverse covariance estimation with the graphical lasso, Biostatistics, 9, 432-441, (2007) · Zbl 1143.62076 [31] Fu, F; Zhou, Q, Learning sparse causal Gaussian networks with experimental intervention: regularization and coordinate descent, J. Am. Stat. Assoc., 108, 288-300, (2013) · Zbl 06158343 [32] Giraud, C.: Introduction to high-dimensional statistics, vol. 139 of Monographs on Statistics and Applied Probability. CRC Press, Boca Raton (2015) · Zbl 1341.62011 [33] Granville, V; Krivanek, M; Rasson, J-P, Simulated annealing: a proof of convergence, IEEE. Trans. Pattern Anal., 16, 652-656, (1994) [34] Grefenstette, J., Gopal, R., Rosmaita, B., Van Gucht, D. (1985). Genetic algorithms for the traveling salesman problem. In: Proceedings of the first International Conference on Genetic Algorithms and their Applications · Zbl 0674.90096 [35] Grzegorczyk, M; Husmeier, D, Improving the structure MCMC sampler for Bayesian networks by introducing a new edge reversal move, Mach. Learn., 71, 265-305, (2008) [36] Guyon, I., Aliferis, C., Cooper, G.: Causation and Prediction Challenge, vol. 2. Challenges in Machine Learning. Microtome Publishing, Brookline (2010) [37] Hauser, A; Bühlmann, P, Characterization and greedy learning of interventional Markov equivalence classes of directed acyclic graphs, J. Mach. Learn. Res., 13, 2409-2464, (2012) · Zbl 1433.68346 [38] Hauser, A; Bühlmann, P, Jointly interventional and observational data: estimation of interventional Markov equivalence classes of directed acyclic graphs, J. R. Stat. Soc. Ser. B, 77, 291-318, (2015) · Zbl 1414.62021 [39] Hogben, L. (ed.): Handbook of linear algebra. Discrete Mathematics and its Applications (Boca Raton). Associate editors: Richard Brualdi, Anne Greenbaum and Roy Mathias. Chapman & Hall/CRC, Boca Raton (2007) · Zbl 0850.62538 [40] Holland, J.H.: Adaptation in Natural and Artificial Systems. An introductory analysis with applications to biology, control, and artificial intelligence. University of Michigan Press, Ann Arbor (1975) · Zbl 0317.68006 [41] Horst, R., Pardalos, P.M. (eds.): Handbook of global optimization. Nonconvex optimization and its applications. Kluwer Academic Publishers, Dordrecht (1995) [42] Hoyle, R. (ed.): Structural Equation Modeling. Sage Publ, Thousand Oaks (1995) [43] Husmeier, D; Werhli, A, Bayesian integration of biological prior knowledge into the reconstruction of gene regulatory networks with Bayesian networks, Comput. Syst. Bioinform., 6, 85-95, (2007) · Zbl 1166.62373 [44] Huynh-Thu, VA; Irrthum, A; Wehenkel, L; Geurts, P, Inferring regulatory networks from expression data using tree-based methods, PLoS ONE, 5, e12776, (2010) [45] Jones, DR; Perttunen, CD; Stuckman, BE, Lipschitzian optimization without the Lipschitz constant, J. Optim. Theory Appl., 79, 157-181, (1993) · Zbl 0796.49032 [46] Kahn, AB, Topological sorting of large networks, Commun. ACM, 5, 558-562, (1962) · Zbl 0106.32602 [47] Kahre, K; Oh, S; Rajaratnam, B, A convex pseudolikelihood framework for high dimensional partial correlation estimation with convergence guarantees, JRSS B, 77, 803-825, (2015) · Zbl 1414.62183 [48] Kalisch, M; Bühlmann, P, Estimating high-dimensional directed acyclic graphs with the PC-algorithm, J. Mach. Learn. Res., 8, 613-636, (2007) · Zbl 1222.68229 [49] Koivisto, M; Sood, K, Exact Bayesian structure discovery in Bayesian networks, J. Mach. Learn. Res., 5, 549-573, (2004) · Zbl 1222.68234 [50] Ledoit, O; Wolf, M, Spectrum estimation: a unified framework for covariance matrix estimation and pca in large dimensions, J. Multivariate Anal., 139, 360-384, (2015) · Zbl 1328.62340 [51] Liaw, A; Wiener, M, Classification and regression by randomforest, R News, 2, 18-22, (2002) [52] Liu, H; Wang, L; Zhao, T, Sparse covariance matrix estimation with eigenvalue constraints, J Comput Graph Stat, 23, 439-459, (2014) [53] Lounici, K., Pontil, M., Tsybakov, A., van de Geer, S.: Taking advantage of sparsity in multi-task learning. In: Proceedings of the 22nd Conference on Learning Theory (2009) [54] Maathuis, M; Colombo, D; Kalisch, M; Bühlmann, P, Predicting causal effects in large-scale systems from observational data, Nat. Methods, 7, 247-248, (2010) [55] Marbach, D; Schaffter, T; Mattiussi, C; Floreano, D, Generating realistic in silico gene networks for performance assessment of reverse engineering methods, J. Comput. Biol., 16, 229-239, (2009) [56] Margolin, A; Nemenman, I; Basso, K; Wiggins, C; Stolovitzky, G; Dalla Favera, R; Califano, A, Aracne: an algorithm for the reconstruction of gene regulatory networks in a Mammalian cellular context, BMC Bioinform., 7, s7, (2006) [57] Meinshausen, N; Bühlmann, P, High-dimensional graphs and variable selection with the lasso, Ann. Stat., 34, 1436-1462, (2006) · Zbl 1113.62082 [58] Mestre, X, Improved estimation of eigenvalues and eigenvectors of covariance matrices using their sample estimates, IEEE Trans. Inform. Theory, 54, 5113-5129, (2008) · Zbl 1318.62191 [59] Michalewicz, Z.: Genetic Algorithms $$+$$ data structures $$=$$ Evolution Programs, 2nd edn. Springer, Berlin (1994) · Zbl 0818.68017 [60] Mordelet, F; Vert, J, Sirene: supervised inference of regulatory networks, Bioinformatics, 24, i76-i82, (2008) [61] Newman, M, The structure and function of complex networks, SIAM Rev., 45, 157-256, (2003) · Zbl 1029.68010 [62] Pearl, J.: Causality, Models, Reasoning, and Inference, 2nd edn. Cambridge University Press, Cambridge (2009) · Zbl 1188.68291 [63] Perrin, B., Ralaivola, L., Mazurie, A., Bottani, S., Mallet, J., d’Alché Buc, F.: Gene networks inference using dynamic bayesian networks. Bioinformatics 19Supp2, ii138-ii148 (2003) · Zbl 1222.68229 [64] Peters, J., Mooij, J. M., Janzing, D., Schölkopf, B.: Identifiability of causal graphs using functional models. In: 27th Conference on Uncertainty in Artificial Intelligence (UAI 2011) (2011) [65] Peters, J; Mooij, J; Janzing, D; Schölkopf, B, Causal discovery with continuous additive noise models, J. Mach. Learn. Res., 15, 2009-2053, (2014) · Zbl 1318.68151 [66] Piszcz, A., Soule, T.: Genetic programming: Optimal population sizes for varying complexity problem. In: Proceedings of the Genetic and Evolutionary Computation Conference (2006) · Zbl 1437.62415 [67] Core Team, R.: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2017) [68] Rau, A; Jaffrézic, F; Nuel, G, Joint estimation of causal effects from observational and intervention gene expression data, BMC. Syst. Biol., 7, 111, (2013) [69] Richardson, T, A characterization of Markov equivalence for directed cyclic graphs, Int. J. Approx. Reason., 17, 107-162, (1997) · Zbl 0939.68095 [70] Ridge, E.: Design of Experiments for the Tuning of Optimisation Algorithm. Ph.D. thesis, The University of York, Department of Computer Science (2007) · Zbl 1144.68305 [71] Robinson, R.W.: Counting labeled acyclic digraphs, pp. 239-273 (1973) · Zbl 0259.05116 [72] Sachs, K., Itani, S., Fitzegerald, J., Wille, L., Schoeberl, B., Dahleh, M., Nolan, G.: Learning cyclic signaling pathway structures while minimizing data requirements. In: Proceedings of the Pacific Symposium on Biocomputing (2009) · Zbl 05564531 [73] Schaffer, J.D., Caruana, R., Eshelman, L.J., Das, R.: A study of control parameters affecting online performance of genetic algorithms for function optimization. In: Proceedings of the Third International Conference on Genetic Algorithms (1989) · Zbl 1033.68104 [74] Schwarz, GE, Estimating the dimension of a model, Ann. Stat., 6, 461-464, (1978) · Zbl 0379.62005 [75] Sergeyev, Y, An information global optimization algorithm with local tuning, SIAM J. Optim., 5, 858-870, (1995) · Zbl 0847.90128 [76] Sergeyev, Y; Kvasov, D, Global search based on efficient diagonal partitions and a set of Lipschitz constants, SIAM J. Optim., 16, 910-937, (2006) · Zbl 1097.65068 [77] Shojaie, A; Jauhiainen, A; Kallitsis, M; Michailidis, G, Inferring regulatory networks by combining perturbation screens and steady state gene expression profiles, PLoS ONE, 9, e82393, (2014) [78] Shojaie, A; Michailidis, G, Penalized likelihood methods for estimation of sparse high-dimensional directed acyclic graphs, Biometrika, 97, 519-538, (2010) · Zbl 1195.62090 [79] Silander, T., Myllymäki, T.: A simple approach for finding the globally optimal bayesian network structure. In: Proceedings of the Twenty-second Conference on Uncertainty in Artificial Intelligence (2006) [80] Souma, W., Fujiwara, Y., Aoyama, H.: Heterogeneous economic networks. In The complex networks of economic interactions, vol. 567 of Lecture Notes in Economics and Mathematics Systems, pp. 79-92. Springer, Berlin (2006) · Zbl 1183.91139 [81] Spirtes, P.: Directed cyclic graphical representations of feedback models. In: Proceedings of the 11th Annual Conference on Uncertainty in Artificial Intelligence (1995) [82] Spirtes, P., Glymour, C., Scheines, R.: Causation, prediction, and search. Adaptive Computation and Machine Learning, 2nd edn. With additional material by David Heckerman, Christopher Meek, Gregory F. Cooper and Thomas Richardson, A Bradford Book. Cambridge: MIT Press (2000) · Zbl 0806.62001 [83] Tibshirani, R, Regression shrinkage and selection via the lasso, J. R. Stat. Soc. Ser. B, 58, 267-288, (1996) · Zbl 0850.62538 [84] Tsamardinos, I; Brown, L; Aliferis, C, The MAX-MIN Hill-climbing Bayesian network structure learning algorithm, Mach. Learn., 65, 31-78, (2006) [85] Geer, S; Bühlmann, P, $$ℓ _0$$-penalized maximum likelihood for sparse directed acyclic graphs, Ann. Stat., 41, 536-567, (2013) · Zbl 1267.62037 [86] Verma, T., Pearl, J.: Equivalence and synthesis of causal models. In: Proceedings of the 6th Annual Conference on Uncertainty in Artificial Intelligence (1991) [87] Verma, T., Araújo, N.A.M., Herrmann, H.J.: Revealing the structure of the world airline network. Sci. Rep. 4, 5638 (2014) · Zbl 1029.68010 [88] Verzelen, N, Minimax risks for sparse regressions: ultra-high dimensional phenomenons, Electron. J. Stat., 6, 38-90, (2012) · Zbl 1334.62120 [89] Wainwright, M, Information-theoretic limits on sparsity recovery in the high-dimensional and noisy setting, IEEE Trans. Inform. Theory, 55, 5728-5741, (2009) · Zbl 1367.94106 [90] Witten, D; Freidman, J; Simon, N, New insights and faster computations for the graphical lasso, J. Comput. Graph. Stat., 20, 892-900, (2011) [91] Wright, S, Correlation and causation, J. Agric. Res., 20, 558-585, (1921) [92] Yuan, M; Lin, Y, Model selection and estimation in the Gaussian graphical model, Biometrika, 94, 19-35, (2007) · Zbl 1142.62408 [93] Zhou, Q, Multi-domain sampling with applications to structural inference of Bayesian networks, J. Am. Stat. Assoc., 106, 1317-1330, (2011) · Zbl 1233.62050