
An efficient parallel block coordinate descent algorithm for large-scale precision matrix estimation using graphics processing units. (English) Zbl 1505.62104

Summary: Large-scale sparse precision matrix estimation has attracted wide interest from the statistics community. The convex partial correlation selection method (CONCORD) developed by K. Khare et al. [J. R. Stat. Soc., Ser. B, Stat. Methodol. 77, No. 4, 803–825 (2015; Zbl 1414.62183)] has recently been credited with several theoretical properties for estimating sparse precision matrices. CONCORD obtains its solution by a coordinate descent algorithm (CONCORD-CD) based on the convexity of the objective function. However, since the coordinate-wise updates in CONCORD-CD are inherently serial, scaling it up is nontrivial. In this paper, we propose a novel parallelization of CONCORD-CD, namely CONCORD-PCD. CONCORD-PCD partitions the off-diagonal elements into several groups and updates each group simultaneously without harming the computational convergence of CONCORD-CD. We guarantee this by employing the notion of edge coloring from graph theory. Specifically, we establish a nontrivial correspondence between scheduling the updates of the off-diagonal elements in CONCORD-CD and coloring the edges of a complete graph: CONCORD-PCD simultaneously updates the off-diagonal elements whose associated edges share the same color. As a result, the number of steps required for updating the off-diagonal elements is reduced from \(p(p-1)/2\) to \(p-1\) (for even \(p\)) or \(p\) (for odd \(p\)), where \(p\) denotes the number of variables. We prove that this number of steps is irreducible. In addition, CONCORD-PCD is tailored to single-instruction multiple-data (SIMD) parallelism. A numerical study shows that the SIMD-parallelized PCD algorithm, implemented on graphics processing units, speeds up the CONCORD-CD algorithm severalfold. The method is available in the R package pcdconcord.
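The scheduling described above is equivalent to a minimum edge coloring of the complete graph \(K_p\): each off-diagonal element \((i,j)\) is an edge, and all edges of one color form a matching whose updates can run in parallel. As an illustration only (the paper's actual construction may differ), such a coloring can be produced by the classical circle method for round-robin tournaments, which yields exactly \(p-1\) rounds for even \(p\) and \(p\) rounds for odd \(p\); the function name below is hypothetical:

```python
def round_robin_rounds(p):
    """Partition the p(p-1)/2 index pairs (i, j), i < j, into rounds of
    pairwise-disjoint pairs via the circle method (a proper edge coloring
    of the complete graph K_p). For even p this gives p - 1 rounds of p/2
    pairs; for odd p a dummy index is added, giving p rounds in which one
    index sits out per round."""
    idx = list(range(p))
    if p % 2 == 1:
        idx.append(None)  # dummy vertex: its partner idles that round
    n = len(idx)
    rounds = []
    for _ in range(n - 1):
        # Pair the first half of the circle with the reversed second half.
        pairs = [(idx[i], idx[n - 1 - i]) for i in range(n // 2)]
        rounds.append([(a, b) for a, b in pairs
                       if a is not None and b is not None])
        # Rotate every position except the fixed first element.
        idx = [idx[0]] + [idx[-1]] + idx[1:-1]
    return rounds
```

Within each round the pairs touch disjoint rows and columns, so the corresponding coordinate updates do not conflict and can be dispatched as one SIMD batch; iterating over the rounds reproduces a full sweep over all \(p(p-1)/2\) off-diagonal elements.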

MSC:

62-08 Computational methods for problems pertaining to statistics
62H12 Estimation in multivariate analysis
62H22 Probabilistic graphical models

Citations:

Zbl 1414.62183

References:

[1] Barabási, A-L; Albert, R., Emergence of scaling in random networks, Science, 286, 5439, 509-512 (1999) · Zbl 1226.05223 · doi:10.1126/science.286.5439.509
[2] Bradley JK, Kyrola A, Bickson D, Guestrin C (2011) Parallel coordinate descent for L1-regularized loss minimization. In: Proceedings of the 28th international conference on machine learning, ICML 2011, pp 321-328
[3] Cai, T.; Liu, W.; Luo, X., A constrained l1 minimization approach to sparse precision matrix estimation, J Am Stat Assoc, 106, 494, 594-607 (2011) · Zbl 1232.62087 · doi:10.1198/jasa.2011.tm10155
[4] Cai, TT; Liu, W.; Zhou, HH, Estimating sparse precision matrix: optimal rates of convergence and adaptive estimation, Ann Stat, 44, 2, 455-488 (2016) · Zbl 1341.62115
[5] Danaher, P.; Wang, P.; Witten, DM, The joint graphical lasso for inverse covariance estimation across multiple classes, J R Stat Soc Ser B Stat Methodol, 76, 2, 373-397 (2014) · Zbl 07555455 · doi:10.1111/rssb.12033
[6] Dinitz JH, Froncek D, Lamken ER, Wallis WD (2006) Scheduling a tournament. In: Handbook of combinatorial designs, chapter VI.51, 2nd edn. Chapman & Hall/CRC, pp 591-606
[7] Formanowicz, P.; Tanaś, K., A survey of graph coloring—its types, methods and applications, Found Comput Decis Sci, 37, 3, 223-238 (2012) · Zbl 1360.05057 · doi:10.2478/v10209-011-0012-y
[8] Friedman, J.; Hastie, T.; Tibshirani, R., Sparse inverse covariance estimation with the graphical lasso, Biostatistics, 9, 3, 432-441 (2008) · Zbl 1143.62076 · doi:10.1093/biostatistics/kxm045
[9] Hsieh, C-J, QUIC: quadratic approximation for sparse inverse covariance estimation, J Mach Learn Res, 15, 2911-2947 (2014) · Zbl 1319.65048
[10] Hsieh, C-J; Sustik, MA; Dhillon, IS; Ravikumar, PK; Poldrack, R.; Burges, CJC; Bottou, L.; Welling, M.; Ghahramani, Z.; Weinberger, KQ, BIG & QUIC: sparse inverse covariance estimation for a million variables, Advances in neural information processing systems, 3165-3173 (2013), Red Hook: Curran Associates Inc, Red Hook
[11] Khare, K.; Oh, S-Y; Rajaratnam, B., A convex pseudolikelihood framework for high dimensional partial correlation estimation with convergence guarantees, J R Stat Soc Ser B (Stat Methodol), 77, 4, 803-825 (2015) · Zbl 1414.62183 · doi:10.1111/rssb.12088
[12] Lawson, C.; Hanson, R.; Kincaid, D.; Krogh, F., Algorithm 539: basic linear algebra subprograms for Fortran usage, ACM Trans Math Softw, 5, 3, 308-323 (1979) · Zbl 0412.65022 · doi:10.1145/355841.355847
[13] Mazumder, R.; Hastie, T., The graphical Lasso: new insights and alternatives, Electron J Stat, 6, August, 2125-2149 (2012) · Zbl 1295.62066
[14] Meinshausen, N.; Bühlmann, P., High-dimensional graphs and variable selection with the Lasso, Ann Stat, 34, 3, 1436-1462 (2006) · Zbl 1113.62082 · doi:10.1214/009053606000000281
[15] Nakano S-I, Zhou X, Nishizeki T (1995) Edge-coloring algorithms. In: Computer science today. Lecture notes in computer science. Springer, Berlin, vol 1000, pp 172-183 · Zbl 0875.00060
[16] Newman, MEJ, The structure and function of complex networks, SIAM Rev, 45, 2, 167-256 (2003) · Zbl 1029.68010 · doi:10.1137/S003614450342480
[17] Pang, H.; Liu, H.; Vanderbei, R., The FASTCLIME package for linear programming and large-scale precision matrix estimation in R, J Mach Learn Res, 15, 489-493 (2014) · Zbl 1318.90002
[18] Peng, J.; Wang, P.; Zhou, N.; Zhu, J., Partial correlation estimation by joint sparse regression models, J Am Stat Assoc, 104, 486, 735-746 (2009) · Zbl 1388.62046 · doi:10.1198/jasa.2009.0126
[19] Richtárik P, Takáč M (2016) Parallel coordinate descent methods for big data optimization. Math Program 156 · Zbl 1342.90102
[20] Sun, T.; Zhang, CH, Sparse matrix inversion with scaled Lasso, J Mach Learn Res, 14, 3385-3418 (2013) · Zbl 1318.62184
[21] Tseng, P., Convergence of a block coordinate descent method for nondifferentiable minimization, J Optim Theory Appl, 109, 3, 475-494 (2001) · Zbl 1006.65062 · doi:10.1023/A:1017501703105
[22] Wang, H.; Banerjee, A.; Hsieh, C-J; Ravikumar, PK; Dhillon, IS; Burges, CJC; Bottou, L.; Welling, M.; Ghahramani, Z.; Weinberger, KQ, Large scale distributed sparse precision estimation, Advances in neural information processing systems, 584-592 (2013), Red Hook: Curran Associates Inc, Red Hook
[23] Witten, DM; Friedman, JH; Simon, N., New insights and faster computations for the graphical lasso, J Comput Graph Stat, 20, 4, 892-900 (2011) · doi:10.1198/jcgs.2011.11051a
[24] Yu, D.; Lee, SH; Lim, J.; Xiao, G.; Craddock, RC; Biswal, BB, Fused lasso regression for identifying differential correlations in brain connectome graphs, Stat Anal Data Min ASA Data Sci J, 11, 5, 203-226 (2018) · Zbl 07260740 · doi:10.1002/sam.11382
[25] Yuan, M.; Lin, Y., Model selection and estimation in the Gaussian graphical model, Biometrika, 94, 19-35 (2007) · Zbl 1142.62408 · doi:10.1093/biomet/asm018