×

zbMATH — the first resource for mathematics

Inferring large graphs using \(\ell_1\)-penalized likelihood. (English) Zbl 1384.62177
Stat. Comput. 28, No. 4, 905-921 (2018); correction ibid. 28, No. 6, 1231 (2018).
Summary: We address the issue of recovering the structure of large sparse directed acyclic graphs from noisy observations of the system. We propose a novel procedure based on a specific formulation of the \(\ell_1\)-norm regularized maximum likelihood, which decomposes the graph estimation into two optimization sub-problems: topological structure and node order learning. We provide convergence inequalities for the graph estimator, as well as an algorithm to solve the induced optimization problem, in the form of a convex program embedded in a genetic algorithm. We apply our method to various data sets (including data from the DREAM4 challenge) and show that it compares favorably to state-of-the-art methods. This algorithm is available on CRAN as the R package GADAG.

MSC:
62H12 Estimation in multivariate analysis
62J05 Linear regression; mixed models
62J07 Ridge regression; shrinkage estimators (Lasso)
68T05 Learning and adaptive systems in artificial intelligence
90C90 Applications of mathematical programming
PDF BibTeX XML Cite
Full Text: DOI
References:
[1] Alander, J.: On optimal population size of genetic algorithms. In: Proceedings of the IEEE Computer Systems and Software Engineering (1992) · Zbl 1233.62050
[2] Allouche, A., Cierco-Ayrolles, C., de Givry, S., Guillermin, G., Mangin, B., Schiex, T., Vandel, J., Vignes, M.: A panel of learning methods for the reconstruction of gene regulatory networks in a systems genetics. In: de la fuente, A. (ed.) Gene Network Inference, Verification of Methods for Systems Genetics Data, pp. 9-32 . Springer, Heidelberg (2013) · Zbl 1233.62050
[3] Aragam, B., Amini, A., Zhou, Q.: Learning directed acyclic graphs with penalized neighbourhood regression (2015) Preprint on arxiv at https://arxiv.org/pdf/1511.08963.pdf · Zbl 1142.62408
[4] Bach, F.: Bolasso: model consistent lasso estimation through the bootstrap. In: Proceedings of the Twenty-fifth International Conference on Machine Learning (2008) · Zbl 1142.62408
[5] Banerjee, O; Ghaoui, L; d’Aspremont, A, Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data, J. Mach. Learn. Res., 9, 485-516, (2008) · Zbl 1225.68149
[6] Barabási, A; Oltvai, Z, Network biology: understanding the cell’s functional organization, Nat. Rev. Genet., 5, 101-113, (2004)
[7] Bickel, P; Li, B, Regularization in statistics, Test, 15, 271-344, (2006) · Zbl 1110.62051
[8] Bickel, PJ; Ritov, Y; Tsybakov, AB, Simultaneous analysis of lasso and Dantzig selector, Ann. Stat., 37, 1705-1732, (2009) · Zbl 1173.62022
[9] Breiman, L, Random forests, Mach. Learn., 45, 5-32, (2001) · Zbl 1007.68152
[10] Bühlmann, P, Causal statistical inference in high dimensions, Math. Methods Oper. Res., 77, 357-370, (2013) · Zbl 1339.62001
[11] Bühlmann, P., van de Geer, S.: Statistics for high-dimensional data. Methods, theory and applications, Springer Series in Statistics. Springer, Heidelberg (2011) · Zbl 1334.62120
[12] Cerf, R, Asymptotic convergence of genetic algorithms, Adv. Appl. Probab., 30, 521-550, (1998) · Zbl 0911.60018
[13] Champion, M., Picheny, V., Vignes, M.: GADAG: A Genetic Algorithm for learning Directed Acyclic Graphs. R package version 0.99.0 (2017)
[14] Chen, J; Chen, Z, Extended Bayesian information criteria for model selection with large model spaces, Biometrika, 95, 759-771, (2008) · Zbl 1437.62415
[15] Chickering, D. M.: Learning Bayesian networks is NP-complete. In: Learning from data (Fort Lauderdale, FL, 1995), vol. 112 of Lecture Notes in Statistics, pp. 121-130. Springer, New York (1996) · Zbl 1173.62022
[16] Chickering, DM, Optimal structure identification with greedy search, J. Mach. Learn. Res., 3, 507-554, (2002) · Zbl 1084.68519
[17] Cook, S, A taxonomy of problems with fast parallel algorithms, Inform. Control, 64, 2-22, (1985) · Zbl 0575.68045
[18] Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C. Introduction to algorithms, 2nd edn. MIT Press, Cambridge, MA; McGraw-Hill Book Co., Boston, MA (2001) · Zbl 1047.68161
[19] Csiszár, I., Tusnády, G.: Information geometry and alternating minimization procedures. Statistics and Decisions, pp. 205-237. Recent results in estimation theory and related topics (1984) · Zbl 1318.68151
[20] Davis, L. (ed.): Handbook of Genetic Algorithms. Van Nostrand Reinhold, Hoboken (1991) · Zbl 1367.94106
[21] Smet, R; Marchal, K, Advantages and limitations of current network inference methods, Nat Rev Microbiol, 8, 717-729, (2010)
[22] Dondelinger, F; Lèbre, S; Husmeier, D, Non-homogeneous dynamic Bayesian networks with Bayesian regularization for inferring gene regulatory networks with gradually time-varying structure, Mach. Learn., 90, 191-230, (2013) · Zbl 1260.92027
[23] Dréo, J., Pétrowski, A., Siarry, P., Taillard, E.A.: Metaheuristics for hard optimization. Methods and case studies, Translated from the 2003 French original by Amitava Chatterjee. Springer, Berlin (2006) · Zbl 1007.68152
[24] Duchi, J., Gould, S., Koller, D.: Projected subgradient methods for learning sparse gaussians. In: Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence (2008)
[25] Efron, B; Hastie, T; Johnstone, I; Tibshirani, R, Least angle regression, Ann. Stat., 32, 407-499, (2004) · Zbl 1091.62054
[26] Karoui, N, Spectrum estimation for large dimensional covariance matrices using random matrix theory, Ann. Stat., 36, 2757-2790, (2008) · Zbl 1168.62052
[27] Ellis, B; Wong, W, Learning causal Bayesian network structures from experimental data, J. Am. Stat. Assoc., 103, 778-789, (2008) · Zbl 05564531
[28] Evtushenko, Y; Malkova, V; Stanevichyus, A, Parallel global optimization of functions of several variables, Comput. Math. Math. Phys., 49, 246-260, (2009) · Zbl 1199.65197
[29] Friedman, N; Koller, D, Being Bayesian about network structure: a Bayesian approach to structure discovery in Bayesian networks, Mach. Learn., 50, 95-125, (2003) · Zbl 1033.68104
[30] Friedman, J; Hastie, T; Tibshirani, R, Sparse inverse covariance estimation with the graphical lasso, Biostatistics, 9, 432-441, (2007) · Zbl 1143.62076
[31] Fu, F; Zhou, Q, Learning sparse causal Gaussian networks with experimental intervention: regularization and coordinate descent, J. Am. Stat. Assoc., 108, 288-300, (2013) · Zbl 06158343
[32] Giraud, C.: Introduction to high-dimensional statistics, vol. 139 of Monographs on Statistics and Applied Probability. CRC Press, Boca Raton (2015) · Zbl 1341.62011
[33] Granville, V; Krivanek, M; Rasson, J-P, Simulated annealing: a proof of convergence, IEEE. Trans. Pattern Anal., 16, 652-656, (1994)
[34] Grefenstette, J., Gopal, R., Rosmaita, B., Van Gucht, D. (1985). Genetic algorithms for the traveling salesman problem. In: Proceedings of the first International Conference on Genetic Algorithms and their Applications · Zbl 0674.90096
[35] Grzegorczyk, M; Husmeier, D, Improving the structure MCMC sampler for Bayesian networks by introducing a new edge reversal move, Mach. Learn., 71, 265-305, (2008)
[36] Guyon, I., Aliferis, C., Cooper, G.: Causation and Prediction Challenge, vol. 2. Challenges in Machine Learning. Microtome Publishing, Brookline (2010)
[37] Hauser, A; Bühlmann, P, Characterization and greedy learning of interventional Markov equivalence classes of directed acyclic graphs, J. Mach. Learn. Res., 13, 2409-2464, (2012) · Zbl 1433.68346
[38] Hauser, A; Bühlmann, P, Jointly interventional and observational data: estimation of interventional Markov equivalence classes of directed acyclic graphs, J. R. Stat. Soc. Ser. B, 77, 291-318, (2015) · Zbl 1414.62021
[39] Hogben, L. (ed.): Handbook of linear algebra. Discrete Mathematics and its Applications (Boca Raton). Associate editors: Richard Brualdi, Anne Greenbaum and Roy Mathias. Chapman & Hall/CRC, Boca Raton (2007) · Zbl 0850.62538
[40] Holland, J.H.: Adaptation in Natural and Artificial Systems. An introductory analysis with applications to biology, control, and artificial intelligence. University of Michigan Press, Ann Arbor (1975) · Zbl 0317.68006
[41] Horst, R., Pardalos, P.M. (eds.): Handbook of global optimization. Nonconvex optimization and its applications. Kluwer Academic Publishers, Dordrecht (1995)
[42] Hoyle, R. (ed.): Structural Equation Modeling. Sage Publ, Thousand Oaks (1995)
[43] Husmeier, D; Werhli, A, Bayesian integration of biological prior knowledge into the reconstruction of gene regulatory networks with Bayesian networks, Comput. Syst. Bioinform., 6, 85-95, (2007) · Zbl 1166.62373
[44] Huynh-Thu, VA; Irrthum, A; Wehenkel, L; Geurts, P, Inferring regulatory networks from expression data using tree-based methods, PLoS ONE, 5, e12776, (2010)
[45] Jones, DR; Perttunen, CD; Stuckman, BE, Lipschitzian optimization without the Lipschitz constant, J. Optim. Theory Appl., 79, 157-181, (1993) · Zbl 0796.49032
[46] Kahn, AB, Topological sorting of large networks, Commun. ACM, 5, 558-562, (1962) · Zbl 0106.32602
[47] Kahre, K; Oh, S; Rajaratnam, B, A convex pseudolikelihood framework for high dimensional partial correlation estimation with convergence guarantees, JRSS B, 77, 803-825, (2015) · Zbl 1414.62183
[48] Kalisch, M; Bühlmann, P, Estimating high-dimensional directed acyclic graphs with the PC-algorithm, J. Mach. Learn. Res., 8, 613-636, (2007) · Zbl 1222.68229
[49] Koivisto, M; Sood, K, Exact Bayesian structure discovery in Bayesian networks, J. Mach. Learn. Res., 5, 549-573, (2004) · Zbl 1222.68234
[50] Ledoit, O; Wolf, M, Spectrum estimation: a unified framework for covariance matrix estimation and pca in large dimensions, J. Multivariate Anal., 139, 360-384, (2015) · Zbl 1328.62340
[51] Liaw, A; Wiener, M, Classification and regression by randomforest, R News, 2, 18-22, (2002)
[52] Liu, H; Wang, L; Zhao, T, Sparse covariance matrix estimation with eigenvalue constraints, J Comput Graph Stat, 23, 439-459, (2014)
[53] Lounici, K., Pontil, M., Tsybakov, A., van de Geer, S.: Taking advantage of sparsity in multi-task learning. In: Proceedings of the 22nd Conference on Learning Theory (2009)
[54] Maathuis, M; Colombo, D; Kalisch, M; Bühlmann, P, Predicting causal effects in large-scale systems from observational data, Nat. Methods, 7, 247-248, (2010)
[55] Marbach, D; Schaffter, T; Mattiussi, C; Floreano, D, Generating realistic in silico gene networks for performance assessment of reverse engineering methods, J. Comput. Biol., 16, 229-239, (2009)
[56] Margolin, A; Nemenman, I; Basso, K; Wiggins, C; Stolovitzky, G; Dalla Favera, R; Califano, A, Aracne: an algorithm for the reconstruction of gene regulatory networks in a Mammalian cellular context, BMC Bioinform., 7, s7, (2006)
[57] Meinshausen, N; Bühlmann, P, High-dimensional graphs and variable selection with the lasso, Ann. Stat., 34, 1436-1462, (2006) · Zbl 1113.62082
[58] Mestre, X, Improved estimation of eigenvalues and eigenvectors of covariance matrices using their sample estimates, IEEE Trans. Inform. Theory, 54, 5113-5129, (2008) · Zbl 1318.62191
[59] Michalewicz, Z.: Genetic Algorithms \(+\) data structures \(=\) Evolution Programs, 2nd edn. Springer, Berlin (1994) · Zbl 0818.68017
[60] Mordelet, F; Vert, J, Sirene: supervised inference of regulatory networks, Bioinformatics, 24, i76-i82, (2008)
[61] Newman, M, The structure and function of complex networks, SIAM Rev., 45, 157-256, (2003) · Zbl 1029.68010
[62] Pearl, J.: Causality, Models, Reasoning, and Inference, 2nd edn. Cambridge University Press, Cambridge (2009) · Zbl 1188.68291
[63] Perrin, B., Ralaivola, L., Mazurie, A., Bottani, S., Mallet, J., d’Alché Buc, F.: Gene networks inference using dynamic bayesian networks. Bioinformatics 19Supp2, ii138-ii148 (2003) · Zbl 1222.68229
[64] Peters, J., Mooij, J. M., Janzing, D., Schölkopf, B.: Identifiability of causal graphs using functional models. In: 27th Conference on Uncertainty in Artificial Intelligence (UAI 2011) (2011)
[65] Peters, J; Mooij, J; Janzing, D; Schölkopf, B, Causal discovery with continuous additive noise models, J. Mach. Learn. Res., 15, 2009-2053, (2014) · Zbl 1318.68151
[66] Piszcz, A., Soule, T.: Genetic programming: Optimal population sizes for varying complexity problem. In: Proceedings of the Genetic and Evolutionary Computation Conference (2006) · Zbl 1437.62415
[67] Core Team, R.: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2017)
[68] Rau, A; Jaffrézic, F; Nuel, G, Joint estimation of causal effects from observational and intervention gene expression data, BMC. Syst. Biol., 7, 111, (2013)
[69] Richardson, T, A characterization of Markov equivalence for directed cyclic graphs, Int. J. Approx. Reason., 17, 107-162, (1997) · Zbl 0939.68095
[70] Ridge, E.: Design of Experiments for the Tuning of Optimisation Algorithm. Ph.D. thesis, The University of York, Department of Computer Science (2007) · Zbl 1144.68305
[71] Robinson, R.W.: Counting labeled acyclic digraphs, pp. 239-273 (1973) · Zbl 0259.05116
[72] Sachs, K., Itani, S., Fitzegerald, J., Wille, L., Schoeberl, B., Dahleh, M., Nolan, G.: Learning cyclic signaling pathway structures while minimizing data requirements. In: Proceedings of the Pacific Symposium on Biocomputing (2009) · Zbl 05564531
[73] Schaffer, J.D., Caruana, R., Eshelman, L.J., Das, R.: A study of control parameters affecting online performance of genetic algorithms for function optimization. In: Proceedings of the Third International Conference on Genetic Algorithms (1989) · Zbl 1033.68104
[74] Schwarz, GE, Estimating the dimension of a model, Ann. Stat., 6, 461-464, (1978) · Zbl 0379.62005
[75] Sergeyev, Y, An information global optimization algorithm with local tuning, SIAM J. Optim., 5, 858-870, (1995) · Zbl 0847.90128
[76] Sergeyev, Y; Kvasov, D, Global search based on efficient diagonal partitions and a set of Lipschitz constants, SIAM J. Optim., 16, 910-937, (2006) · Zbl 1097.65068
[77] Shojaie, A; Jauhiainen, A; Kallitsis, M; Michailidis, G, Inferring regulatory networks by combining perturbation screens and steady state gene expression profiles, PLoS ONE, 9, e82393, (2014)
[78] Shojaie, A; Michailidis, G, Penalized likelihood methods for estimation of sparse high-dimensional directed acyclic graphs, Biometrika, 97, 519-538, (2010) · Zbl 1195.62090
[79] Silander, T., Myllymäki, T.: A simple approach for finding the globally optimal bayesian network structure. In: Proceedings of the Twenty-second Conference on Uncertainty in Artificial Intelligence (2006)
[80] Souma, W., Fujiwara, Y., Aoyama, H.: Heterogeneous economic networks. In The complex networks of economic interactions, vol. 567 of Lecture Notes in Economics and Mathematics Systems, pp. 79-92. Springer, Berlin (2006) · Zbl 1183.91139
[81] Spirtes, P.: Directed cyclic graphical representations of feedback models. In: Proceedings of the 11th Annual Conference on Uncertainty in Artificial Intelligence (1995)
[82] Spirtes, P., Glymour, C., Scheines, R.: Causation, prediction, and search. Adaptive Computation and Machine Learning, 2nd edn. With additional material by David Heckerman, Christopher Meek, Gregory F. Cooper and Thomas Richardson, A Bradford Book. Cambridge: MIT Press (2000) · Zbl 0806.62001
[83] Tibshirani, R, Regression shrinkage and selection via the lasso, J. R. Stat. Soc. Ser. B, 58, 267-288, (1996) · Zbl 0850.62538
[84] Tsamardinos, I; Brown, L; Aliferis, C, The MAX-MIN Hill-climbing Bayesian network structure learning algorithm, Mach. Learn., 65, 31-78, (2006)
[85] Geer, S; Bühlmann, P, \(ℓ _0\)-penalized maximum likelihood for sparse directed acyclic graphs, Ann. Stat., 41, 536-567, (2013) · Zbl 1267.62037
[86] Verma, T., Pearl, J.: Equivalence and synthesis of causal models. In: Proceedings of the 6th Annual Conference on Uncertainty in Artificial Intelligence (1991)
[87] Verma, T., Araújo, N.A.M., Herrmann, H.J.: Revealing the structure of the world airline network. Sci. Rep. 4, 5638 (2014) · Zbl 1029.68010
[88] Verzelen, N, Minimax risks for sparse regressions: ultra-high dimensional phenomenons, Electron. J. Stat., 6, 38-90, (2012) · Zbl 1334.62120
[89] Wainwright, M, Information-theoretic limits on sparsity recovery in the high-dimensional and noisy setting, IEEE Trans. Inform. Theory, 55, 5728-5741, (2009) · Zbl 1367.94106
[90] Witten, D; Freidman, J; Simon, N, New insights and faster computations for the graphical lasso, J. Comput. Graph. Stat., 20, 892-900, (2011)
[91] Wright, S, Correlation and causation, J. Agric. Res., 20, 558-585, (1921)
[92] Yuan, M; Lin, Y, Model selection and estimation in the Gaussian graphical model, Biometrika, 94, 19-35, (2007) · Zbl 1142.62408
[93] Zhou, Q, Multi-domain sampling with applications to structural inference of Bayesian networks, J. Am. Stat. Assoc., 106, 1317-1330, (2011) · Zbl 1233.62050
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.