zbMATH — the first resource for mathematics

A unified framework for structured graph learning via spectral constraints. (English) Zbl 07255053
Summary: Graph learning from data is a canonical problem that has received substantial attention in the literature. Learning a structured graph is essential for interpretability and identification of the relationships among data. In general, learning a graph with a specific structure is an NP-hard combinatorial problem and thus designing a general tractable algorithm is challenging. Some useful structured graphs include connected, sparse, multi-component, bipartite, and regular graphs. In this paper, we introduce a unified framework for structured graph learning that combines Gaussian graphical model and spectral graph theory. We propose to convert combinatorial structural constraints into spectral constraints on graph matrices and develop an optimization framework based on block majorization-minimization to solve structured graph learning problem. The proposed algorithms are provably convergent and practically amenable for a number of graph based applications such as data clustering. Extensive numerical experiments with both synthetic and real data sets illustrate the effectiveness of the proposed algorithms. An open source R package containing the code for all the experiments is available at https://CRAN.R-project.org/package=spectralGraphTopology.
68T05 Learning and adaptive systems in artificial intelligence
Full Text: Link
[1] P.-A. Absil, R. Mahony, and R. Sepulchre.Optimization algorithms on matrix manifolds. Princeton University Press, 2009. · Zbl 1147.65043
[2] C. Ambroise, J. Chiquet, C. Matias, et al. Inferring sparse gaussian graphical models with latent structure.Electronic Journal of Statistics, 3:205-238, 2009. · Zbl 1326.62011
[3] A. Anandkumar, V. Y. Tan, F. Huang, and A. S. Willsky. High-dimensional gaussian graphical model selection: Walk summability and local separation criterion.Journal of Machine Learning Research, 13(Aug):2293-2337, 2012. · Zbl 1433.68322
[4] O. Banerjee, L. E. Ghaoui, and A. d’Aspremont. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data.Journal of Machine Learning Research, 9(Mar):485-516, 2008. · Zbl 1225.68149
[5] A.-L. Barab´asi et al.Network Sciene. Cambridge University Press, 2016.
[6] R. E. Barlow and H. D. Brunk. The isotonic regression problem and its dual.Journal of the American Statistical Association, 67(337):140-147, 1972. · Zbl 0236.62050
[7] D. J. Bartholomew. Isotonic inference.Encyclopedia of Statistical Scienes, 6, 2004.
[8] S. Basu, A. Shojaie, and G. Michailidis. Network granger causality with inherent grouping structure.Journal of Machine Learning Research, 16(1):417-453, 2015. · Zbl 1360.62312
[9] K. Benidis, Y. Sun, P. Babu, and D. P. Palomar. Orthogonal sparse pca and covariance estimation via procrustes reformulation.IEEE Transactions on Signal Processing, 64(23): 6211-6226, 2016. · Zbl 1414.94069
[10] A. Berman and M. Farber. A lower bound for the second largest laplacian eigenvalue of weighted graphs.Electronic Journal of Linear Algebra, 22(1):79, 2011. · Zbl 1252.05111
[11] M. J. Best and N. Chakravarti. Active set algorithms for isotonic regression; a unifying framework.Mathematical Programming, 47(1-3):425-439, 1990. · Zbl 0715.90085
[12] A. Bogdanov, E. Mossel, and S. Vadhan. The complexity of distinguishing markov random fields. InApproximation, Randomization and Combinatorial Optimization. Algorithms and Techniques, pages 331-342. Springer, 2008. · Zbl 1159.68042
[13] T. T. Cai, H. Li, W. Liu, and J. Xie. Joint estimation of multiple high-dimensional precision matrices.Statistica Sinica, 26(2):445, 2016. · Zbl 1356.62066
[14] E. J. Cand‘es, M. B. Wakin, and S. P. Boyd. Enhancing sparsity by reweighted‘1minimization.Journal of Fourier Analysis and Applications, 14(5):877-905, 2008. · Zbl 1176.94014
[15] V. Chandrasekaran, P. A. Parrilo, and A. S. Willsky. Latent variable graphical model selection via convex optimization. In2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 1610-1613. IEEE, 2010.
[16] C. Chow and C. Liu. Approximating discrete probability distributions with dependence trees.IEEE Transactions on Information Theory, 14(3):462-467, 1968. · Zbl 0165.22305
[17] Y.-T. Chow, W. Shi, T. Wu, and W. Yin. Expander graph and communication-efficient decentralized optimization. InSignals, Systems and Computers, 2016 50th Asilomar Conference on, pages 1715-1720. IEEE, 2016.
[18] F. R. Chung.Spectral graph theory. Number 92. American Mathematical Soc., 1997.
[19] M. Coutino, E. Isufi, T. Maehara, and G. Leus. State-space network topology identification from partial observations.arXiv preprint arXiv:1906.10471, 2019.
[20] P. Danaher, P. Wang, and D. M. Witten. The joint graphical lasso for inverse covariance estimation across multiple classes.Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(2):373-397, 2014.
[21] K. C. Das and R. Bapat. A sharp upper bound on the largest laplacian eigenvalue of weighted graphs.Linear Algebra and its Applications, 409:153-165, 2005. · Zbl 1072.05039
[22] M. Defferrard, X. Bresson, and P. Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. InAdvances in Neural Information Processing Systems, pages 3844-3852, 2016.
[23] A. P. Dempster. Covariance selection.Biometrics, pages 157-175, 1972.
[24] D. den Hertog, C. Roos, and T. Terlaky. The linear complimentarity problem, sufficient matrices, and the criss-cross method.Linear Algebra and its Applications, 187:1-14, 1993. · Zbl 0778.65044
[25] D. Dheeru and E. Karra Taniskidou. UCI machine learning repository, 2017. URLhttps: //archive.ics.uci.edu/ml/datasets/gene+expression+cancer+RNA-Seq#.
[26] I. S. Dhillon. Co-clustering documents and words using bipartite spectral graph partitioning. InProceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 269-274. ACM, 2001.
[27] I. S. Dhillon and J. A. Tropp. Matrix nearness problems with bregman divergences.SIAM Journal on Matrix Analysis and Applications, 29(4):1120-1146, 2007. · Zbl 1153.65044
[28] M. Drton and T. S. Richardson. Graphical methods for efficient likelihood inference in gaussian covariance models.Journal of Machine Learning Research, 9(May):893-914, 2008. · Zbl 1225.62031
[29] J. Duchi, S. Gould, and D. Koller. Projected subgradient methods for learning sparse gaussians. InProceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence, page 153-160, 2008.
[30] H. E. Egilmez, E. Pavez, and A. Ortega. Graph learning from data under laplacian and structural constrints.IEEE Journal of Selected Topics in Signal Processing, 11(6):825- 841, 2017.
[31] M. Farber and I. Kaminer. Upper bound for the laplacian eigenvalues of a graph.arXiv preprint arXiv:1106.0769, 2011.
[32] S. Fattahi and S. Sojoudi. Graphical lasso and thresholding: Equivalence and closed-form solutions.Journal of Machine Learning Research, 20(10):1-44, 2019. · Zbl 07049729
[33] M. A. T. Figueiredo and A. K. Jain. Unsupervised learning of finite mixture models.IEEE Transactions on Pattern Analysis and Machine Intelligence, (3):381-396, 2002.
[34] M. Fiori, P. Mus´e, and G. Sapiro. Topology constraints in graphical models. InAdvances in Neural Information Processing Systems, pages 791-799, 2012.
[35] C. Fraley and A. E. Raftery. Bayesian regularization for normal mixture estimation and model-based clustering.Journal of Classification, 24(2):155-181, 2007. · Zbl 1159.62302
[36] J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso.Biostatistics, 9(3):432-441, 2008. · Zbl 1143.62076
[37] A. Fu, B. Narasimhan, and S. Boyd. CVXR: An R package for disciplined convex optimization. Technical report, Stanford University, 2017a. URLhttps://web.stanford.edu/  boyd/papers/cvxr_paper.html.
[38] X. Fu, K. Huang, M. Hong, N. D. Sidiropoulos, and A. M.-C. So. Scalable and flexible multiview max-var canonical correlation analysis.IEEE Transactions on Signal Processing, 65(16):4150-4165, 2017b. · Zbl 1414.94204
[39] M. Gavish, B. Nadler, and R. R. Coifman. Multiscale wavelets on trees, graphs and high dimensional data: Theory and applications to semi supervised learning. InInternational Conference on Machine Learning, pages 367-374, 2010.
[40] C. D. Godsil and B. McKay. Constructing cospectral graphs.Aequationes Mathematicae, 25(1):257-268, 1982. · Zbl 0527.05051
[41] M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming.Journal of the ACM (JACM), 42(6):1115-1145, 1995. · Zbl 0885.68088
[42] M. Grant and S. Boyd. CVX: Matlab software for disciplined convex programming, version 2.1, 2014.
[43] J. Guo, E. Levina, G. Michailidis, and J. Zhu. Joint estimation of multiple graphical models. Biometrika, 98(1):1-15, 2011. · Zbl 1214.62058
[44] B. Hao, W. W. Sun, Y. Liu, and G. Cheng. Simultaneous clustering and estimation of heterogeneous graphical models.Journal of Machine Learning Research, 18(217):1-58, 2018. · Zbl 06982973
[45] O. Hein¨avaara, J. Lepp¨a-Aho, J. Corander, and A. Honkela. On the inconsistency of‘1penalised sparse precision matrix estimation.BMC Bioinformatics, 17(16):448, 2016.
[46] B. Huang and T. Jebara. Maximum likelihood graph structure estimation with degree distributions. InAnalyzing Graphs: Theory and Applications, NIPS Workshop, volume 14, 2008.
[47] P. Jeziorski and I. Segal. What makes them click: Empirical analysis of consumer demand for search advertising.American Economic Journal: Microeconomics, 7(3):24-53, 2015.
[48] V. Kalofolias. How to learn a graph from smooth signals. InArtificial Intelligence and Statistics, pages 920-929, 2016.
[49] A. Karatzoglou, A. Smola, K. Hornik, and A. Zeileis. kernlab-an s4 package for kernel methods in r.Journal of Statistical Software, 11(9):1-20, 2004.
[50] J. K. Kim and S. Choi. Clustering with r-regular graphs.Pattern Recognition, 42(9): 2020-2028, 2009. · Zbl 1191.68583
[51] I. Koutis, G. L. Miller, and D. Tolliver. Combinatorial preconditioners and multilevel solvers for problems in computer vision and image processing.Computer Vision and Image Understanding, 115(12):1638-1646, 2011.
[52] S. Kumar, J. Ying, J. V. d. M. Cardoso, and D. P. Palomar. Structured graph learning via Laplacian spectral constraints. InAdvances in Neural Information Processing Systems, 2019. URLarXivpreprintarXiv:1909.11594.
[53] B. Lake and J. Tenenbaum. Discovering structure by learning sparse graphs. InProceedings of the 33rd Annual Cognitive Science Conference., 2010.
[54] C. Lam and J. Fan. Sparsistency and rates of convergence in large covariance matrix estimation.The Annals of statistics, 37(6B):4254, 2009. · Zbl 1191.62101
[55] S. L. Lauritzen.Graphical models. Oxford Statistical Science Series 17. Oxford University Press, 1996. · Zbl 0907.62001
[56] Y. LeCun. The mnist database of handwritten digits. 1998. URLhttp://yann.lecun. com/exdb/mnist/.
[57] C.-I. C. Lee. The quadratic loss of isotonic regression under normality.The Annals of Statistics, 9(3):686-688, 1981. · Zbl 0477.62015
[58] W. Lee and Y. Liu. Joint estimation of multiple precision matrices with common structures. Journal of Machine Learning Research, 16(1):1035-1062, 2015. · Zbl 1360.62274
[59] H. Lin, R. Liu, and J. Shu. Some results on the largest and least eigenvalues of graphs. Electronic Journal of Linear Algebra, 27(1):259, 2014. · Zbl 1320.05077
[60] Q. Liu and A. Ihler. Learning scale free networks by reweighted l1 regularization. InProceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 40-48, 2011.
[61] O. E. Livne and A. Brandt. Lean algebraic multigrid (lamg): Fast graph laplacian linear solver.SIAM Journal on Scientific Computing, 34(4):B499-B522, 2012. · Zbl 1253.65045
[62] A. Loukas and P. Vandergheynst. Spectrally approximating large graphs with smaller graphs. InProceedings of the 35th International Conference on Machine Learning, volume 80, pages 3237-3246, 2018.
[63] L. Lov´asz. Eigenvalues of graphs. Technical report, Eotvos Lorand University, 2007.
[64] R. Luss and S. Rosset. Generalized isotonic regression.Journal of Computational and Graphical Statistics, 23(1):192-210, 2014.
[65] H. P. Maretic, D. Thanou, and P. Frossard. Graph learning under sparsity priors. InIEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pages 6523-6527. IEEE, 2017.
[66] B. M. Marlin and K. P. Murphy. Sparse gaussian graphical models with unknown block structure. InProceedings of the 26th Annual International Conference on Machine Learning, pages 705-712. ACM, 2009.
[67] R. Mazumder and T. Hastie. The graphical lasso: New insights and alternatives.Electronic Journal of Statistics, 6:2125, 2012. · Zbl 1295.62066
[68] N. Meinshausen and P. B¨uhlmann. High-dimensional graphs and variable selection with the lasso.The Annals of Statistics, 34(3):1436-1462, 2006. · Zbl 1113.62082
[69] Z. Meng, B. Eriksson, and A. Hero. Learning latent variable gaussian graphical models. In International Conference on Machine Learning, pages 1269-1277, 2014.
[70] K. Mohan, P. London, M. Fazel, D. Witten, and S.-I. Lee. Node-based learning of multiple gaussian graphical models.Journal of Machine Learning Research, 15(1):445-488, 2014. · Zbl 1318.62181
[71] B. Mohar. Some applications of Laplace eigenvalues of graphs. InGraph Symmetry: Algebraic Methods and Applications., pages 225-275. Springer, 1997. · Zbl 0883.05096
[72] S. K. Narang and A. Ortega. Perfect reconstruction two-channel wavelet filter banks for graph structured data.IEEE Transactions on Signal Processing, 60(6):2786-2799, 2012. · Zbl 1393.94373
[73] A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. InAdvances in Neural Information Processing Systems, pages 849-856, 2002.
[74] F. Nie, X. Wang, M. I. Jordan, and H. Huang. The constrained laplacian rank algorithm for graph-based clustering. InThirtieth AAAI Conference on Artificial Intelligence, 2016.
[75] F. Nie, X. Wang, C. Deng, and H. Huang. Learning a structured optimal bipartite graph for co-clustering. InAdvances in Neural Information Processing Systems, pages 4132-4141, 2017.
[76] M. Nikolova and M. K. Ng. Analysis of half-quadratic minimization methods for signal and image recovery.SIAM Journal on Scientific Computing, 27(3):937-966, 2005. · Zbl 1141.49318
[77] A. Ortega, P. Frossard, J. Kovavcevi´c, J. M. Moura, and P. Vandergheynst. Graph signal processing: Overview, challenges, and applications.Proceedings of the IEEE, 106(5): 808-828, 2018.
[78] D. N. Osherson, J. Stern, O. Wilkie, M. Stob, and E. E. Smith. Default probability. Cognitive Science, 15(2):251-269, 1991.
[79] E. Pavez and A. Ortega. Generalized laplacian precision matrix estimation for graph signal processing. InAcoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on, pages 6350-6354. IEEE, 2016.
[80] E. Pavez, H. E. Egilmez, and A. Ortega. Learning graphs with monotone topology properties and multiple connected components.IEEE Transactions on Signal Processing, 66(9): 2399-2413, 2018. · Zbl 1415.68174
[81] G. A. Pavlopoulos, P. I. Kontou, A. Pavlopoulou, C. Bouyioukos, E. Markou, and P. G. Bagos. Bipartite graphs in systems biology and medicine: a survey of methods and applications.GigaScience, 7(4):giy014, 2018.
[82] A. Prabhu, G. Varma, and A. Namboodiri. Deep expander networks: Efficient deep networks from graph theory. InProceedings of the European Conference on Computer Vision (ECCV), pages 20-35, 2018.
[83] K. Rajawat and S. Kumar. Stochastic multidimensional scaling.IEEE Transactions on Signal and Information Processing over Networks, 3(2):360-375, 2017.
[84] P. Ravikumar, M. J. Wainwright, and J. D. Lafferty. High-dimensional ising model selection using‘1-regularized logistic regression.The Annals of Statistics, 38(3):1287-1319, 2010. · Zbl 1189.62115
[85] M. Razaviyayn, M. Hong, and Z.-Q. Luo. A unified convergence analysis of block successive minimization methods for nonsmooth optimization.SIAM Journal on Optimization, 23 (2):1126-1153, 2013. · Zbl 1273.90123
[86] H. Rue and L. Held.Gaussian Markov random fields: theory and applications. CRC press, 2005. · Zbl 1093.60003
[87] S. Sardellitti, S. Barbarossa, and P. Di Lorenzo. Graph topology inference based on sparsifying transform learning.IEEE Transactions on Signal Processing, 67(7):1712-1727, 2019. · Zbl 07045391
[88] S. E. Schaeffer. Graph clustering.Computer Science Review, 1(1):27-64, 2007. · Zbl 1302.68237
[89] U. Schulte. Constructing trees in bipartite graphs.Discrete Mathematics, 154(1-3):317-320, 1996. · Zbl 0855.05044
[90] S. Segarra, A. G. Marques, G. Mateos, and A. Ribeiro. Network topology inference from spectral templates.IEEE Transactions on Signal and Information Processing over Networks, 3(3):467-483, 2017.
[91] G. Shen and A. Ortega. Transform-based distributed data gathering.IEEE Transactions on Signal Processing, 58(7):3802-3815, 2010. · Zbl 1392.94447
[92] X. Shen, W. Pan, and Y. Zhu. Likelihood-based selection and sharp parameter estimation. Journal of the American Statistical Association, 107(497):223-232, 2012. · Zbl 1261.62020
[93] A. Shojaie and G. Michailidis. Penalized likelihood methods for estimation of sparse highdimensional directed acyclic graphs.Biometrika, 97(3):519-538, 2010a. · Zbl 1195.62090
[94] A. Shojaie and G. Michailidis. Penalized principal component regression on graphs for analysis of subnetworks. InAdvances in Neural Information Processing Systems, pages 2155-2163, 2010b.
[95] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains.IEEE Signal Processing Magazine, 30(3):83-98, 2013.
[96] M. Slawski and M. Hein. Estimation of positive definite m-matrices and structure learning for attractive gaussian markov random fields.Linear Algebra and its Applications, 473: 145-179, 2015. · Zbl 1312.62070
[97] A. J. Smola and R. Kondor. Kernels and regularization on graphs. InLearning Theory and Kernel Machines, pages 144-158. Springer, 2003. · Zbl 1274.68351
[98] J. Song, P. Babu, and D. P. Palomar. Sparse generalized eigenvalue problem via smooth optimization.IEEE Transactions on Signal Processing, 63(7):1627-1642, 2015. · Zbl 1394.94551
[99] D. A. Spielman and S.-H. Teng. Spectral sparsification of graphs.SIAM Journal on Computing, 40(4):981-1025, 2011. · Zbl 1228.68040
[100] S. Sun, H. Wang, and J. Xu. Inferring block structure of graphical models in exponential families. InArtificial Intelligence and Statistics, pages 939-947, 2015.
[101] Y. Sun, P. Babu, and D. P. Palomar. Majorization-minimization algorithms in signal processing, communications, and machine learning.IEEE Transactions on Signal Processing, 65(3):794-816, Feb. 2016. · Zbl 1414.94595
[102] M. Sundin, A. Venkitaraman, M. Jansson, and S. Chatterjee. A connectedness constraint for learning sparse graphs. InSignal Processing Conference (EUSIPCO), 2017 25th European, pages 151-155. IEEE, 2017.
[103] K. M. Tan, D. Witten, and A. Shojaie. The cluster graphical lasso for improved estimation of gaussian graphical models.Computational Statistics and Data Analysis, 85:23-36, 2015. · Zbl 06984152
[104] D. A. Tarzanagh and G. Michailidis. Estimation of graphical models through structured norm minimization.Journal of Machine Learning Research, 18(1):7692-7739, 2017.
[105] O. Teke and P. Vaidyanathan. Uncertainty principles and sparse eigenvectors of graphs. IEEE Transactions on Signal Processing, 65(20):5406-5420, 2017. · Zbl 1415.94252
[106] P. Van Mieghem.Graph spectra for complex networks. Cambridge University Press, 2010. · Zbl 1256.05143
[107] B. Wang, A. Pourshafeie, M. Zitnik, J. Zhu, C. D. Bustamante, S. Batzoglou, and J. Leskovec. Network enhancement as a general method to denoise weighted biological networks.Nature Communications, 9(1):3108, 2018.
[108] J. Wang. Joint estimation of sparse multivariate regression and conditional graphical models.Statistica Sinica, pages 831-851, 2015. · Zbl 1415.62051
[109] J. N. Weinstein, E. A. Collisson, G. B. Mills, K. R. M. Shaw, B. A. Ozenberger, K. Ellrott, I. Shmulevich, C. Sander, J. M. Stuart, C. G. A. R. Network, et al. The cancer genome atlas pan-cancer analysis project.Nature Genetics, 45(10):1113, 2013.
[110] H. Yang, G. Hu, and Y. Hong. Bounds of spectral radii of weighted trees.Tsinghua Science and Technology, 8(5):517-520, 2003. · Zbl 1140.05313
[111] J. Ying, J.-F. Cai, D. Guo, G. Tang, Z. Chen, and X. Qu. Vandermonde factorization of hankel matrix for complex exponential signal recovery — application in fastnmr spectroscopy.IEEE Transactions on Signal Processing, 66(21):5520-5533, 2018. · Zbl 1415.94293
[112] A. Yu and M. Lu. Lower bounds on the (laplacian) spectral radius of weighted graphs. Chinese Annals of Mathematics, Series B, 35(4):669-678, 2014. · Zbl 1297.05152
[113] M. Yuan and Y. Lin. Model selection and estimation in the gaussian graphical model. Biometrika, 94(1):19-35, 2007. · Zbl 1142.62408
[114] H. Zha, X. He, C. Ding, H. Simon, and M. Gu. Bipartite graph partitioning and data clustering. InProceedings of the 10th International Conference on Information and knowledge Management, pages 25-32. ACM, 2001.
[115] L. Zhao, Y. Wang, S. Kumar, and D. P. Palomar. Optimization algorithms for graph laplacian estimation via admm and mm.IEEE Transactions on Signal Processing, 67 (16):4231-4244, 2019. · Zbl 07123357
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.