Sparse nonparametric graphical models. (English) Zbl 1331.62219

Summary: We present some nonparametric methods for graphical modeling. In the discrete case, where the data are binary or drawn from a finite alphabet, Markov random fields are already essentially nonparametric, since the cliques can take only a finite number of values. Continuous data are different. The Gaussian graphical model is the standard parametric model for continuous data, but it makes distributional assumptions that are often unrealistic. We discuss two approaches to building more flexible graphical models. One allows arbitrary graphs and a nonparametric extension of the Gaussian; the other uses kernel density estimation and restricts the graphs to trees and forests. Examples of both methods are presented. We also discuss possible future research directions for nonparametric graphical modeling.
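The first approach mentioned above (a nonparametric extension of the Gaussian, combined with a sparse precision-matrix estimator such as glasso) can be sketched as follows. This is a minimal illustration, not the authors' implementation: each variable is mapped through a Winsorized empirical CDF and the standard normal quantile function, and the transformed data are passed to a graphical-lasso solver (here scikit-learn's `GraphicalLasso` stands in for the glasso software listed above). The helper name `nonparanormal_transform` and the choice of truncation level `delta` are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm
from sklearn.covariance import GraphicalLasso

def nonparanormal_transform(X, delta=None):
    """Illustrative helper: map each column through a Winsorized
    empirical CDF, then the standard normal quantile function."""
    n, d = X.shape
    if delta is None:
        # One common truncation level; the exact choice is a tuning decision.
        delta = 1.0 / (4.0 * n**0.25 * np.sqrt(np.pi * np.log(n)))
    Z = np.empty((n, d))
    for j in range(d):
        ranks = np.argsort(np.argsort(X[:, j])) + 1  # ranks 1..n
        F = ranks / (n + 1.0)                        # empirical CDF values
        F = np.clip(F, delta, 1.0 - delta)           # Winsorize the tails
        Z[:, j] = norm.ppf(F)                        # Gaussianize each margin
    return Z

# Synthetic continuous data with non-Gaussian (lognormal) margins.
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.0]])
X = np.exp(rng.multivariate_normal(np.zeros(3), cov, size=500))

Z = nonparanormal_transform(X)
model = GraphicalLasso(alpha=0.05).fit(Z)
precision = model.precision_  # zeros in the precision matrix encode missing edges
```

The estimated graph is read off the sparsity pattern of `precision`: variables i and j are conditionally independent (no edge) when the (i, j) entry is zero, exactly as in the Gaussian case, but without assuming Gaussian marginals.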

MSC:

62G07 Density estimation
62A09 Graphical methods in statistics
62H05 Characterization and structure theory for multivariate probability distributions; copulas

Software:

glasso
