×

zbMATH — the first resource for mathematics

Certifiably optimal sparse inverse covariance estimation. (English) Zbl 07263701
Summary: We consider the maximum likelihood estimation of sparse inverse covariance matrices. We demonstrate that current heuristic approaches primarily encourage robustness, instead of the desired sparsity. We give a novel approach that solves the cardinality constrained likelihood problem to certifiable optimality. The approach uses techniques from mixed-integer optimization and convex optimization, and provides a high-quality solution with a guarantee on its suboptimality, even if the algorithm is terminated early. Using a variety of synthetic and real datasets, we demonstrate that our approach can solve problems where the dimension of the inverse covariance matrix is up to 1000 s. We also demonstrate that our approach produces significantly sparser solutions than Glasso and other popular learning procedures, makes less false discoveries, while still maintaining state-of-the-art accuracy.
MSC:
90C11 Mixed integer programming
90C22 Semidefinite programming
62H12 Estimation in multivariate analysis
PDF BibTeX XML Cite
Full Text: DOI
References:
[1] Atamtürk, A.; Narayanan, V., Conic mixed-integer rounding cuts, Math. Program., 122, 1, 1-20 (2010) · Zbl 1184.90112
[2] Atchadé, Y.F., Mazumder, R., Chen, J.: Scalable computation of regularized precision matrices via stochastic optimization (2015). arXiv:1509.00426
[3] Banerjee, O.; El Ghaoui, L.; d’Aspremont, A., Model selection through sparse maximum likelihood estimation for multivariate gaussian or binary data, J. Mach. Learn. Res., 9, Mar, 485-516 (2008) · Zbl 1225.68149
[4] Ben-Tal, A.; El Ghaoui, L.; Nemirovski, A., Robust Optimization (2009), Princeton: Princeton University Press, Princeton
[5] Bertsimas, D.; Copenhaver, MS, Characterization of the equivalence of robustification and regularization in linear and matrix regression, Eur. J. Oper. Res., 270, 93142 (2018)
[6] Bertsimas, D.; Mazumder, R., Least quantile regression via modern optimization, Ann. Stat., 42, 2494-2525 (2014) · Zbl 1302.62154
[7] Bertsimas, D., Van Parys, B.: Sparse high-dimensional regression: Exact scalable algorithms and phase transitions (2017). arXiv:1709.10029 · Zbl 1444.62094
[8] Bertsimas, D.; Brown, DB; Caramanis, C., Theory and applications of robust optimization, SIAM Rev., 53, 3, 464-501 (2011) · Zbl 1233.90259
[9] Bertsimas, D.; King, A.; Mazumder, R., Best subset selection via a modern optimization lens, Ann. Stat., 44, 2, 813-852 (2016) · Zbl 1335.62115
[10] Bickel, PJ; Levina, E., Covariance regularization by thresholding, Ann. Stat., 36, 6, 2577-2604 (2008) · Zbl 1196.62062
[11] Boyd, S.; Vandenberghe, L., Convex Optimization (2004), Cambridge: Cambridge University Press, Cambridge · Zbl 1058.90049
[12] Cai, T.; Liu, W.; Luo, X., A constrained \(\ell_1\) minimization approach to sparse precision matrix estimation, J. Am. Stat. Assoc., 106, 494, 594-607 (2011) · Zbl 1232.62087
[13] Çezik, MT; Iyengar, G., Cuts for mixed 0-1 conic programming, Math. Program., 104, 1, 179-202 (2005) · Zbl 1159.90457
[14] Chickering, D.M.: Learning Bayesian Networks is NP-Complete. In: Fisher, D., Lenz, HJ. (eds.) Learning from Data. Lecture Notes in Statistics, vol 112. Springer, New York, NY (1996)
[15] Chow, C.; Liu, C., Approximating discrete probability distributions with dependence trees, IEEE Trans. Inf. Theory, 14, 3, 462-467 (1968) · Zbl 0165.22305
[16] Dahl, J.; Vandenberghe, L.; Roychowdhury, V., Covariance selection for nonchordal graphs via chordal embedding, Optim. Methods Softw., 23, 4, 501-520 (2008) · Zbl 1151.90514
[17] Dempster, AP, Covariance selection, Biometrics, 28, 157-175 (1972)
[18] Drton, M.; Maathuis, MH, Structure learning in graphical modeling, Ann. Rev. Stat. Appl., 4, 365-393 (2017)
[19] Duran, MA; Grossmann, IE, An outer-approximation algorithm for a class of mixed-integer nonlinear programs, Math. Programm., 36, 3, 307-339 (1986) · Zbl 0619.90052
[20] El Karoui, N., High-dimensionality effects in the markowitz problem and other quadratic programs with linear constraints: risk underestimation, Ann. Stat., 38, 6, 3487-3566 (2010) · Zbl 1274.62365
[21] Fan, J.; Li, R., Variable selection via nonconcave penalized likelihood and its oracle properties, J. Am. Stat. Assoc., 96, 456, 1348-1360 (2001) · Zbl 1073.62547
[22] Fan, J.; Fan, Y.; Lv, J., High dimensional covariance matrix estimation using a factor model, J. Econom., 147, 1, 186-197 (2008) · Zbl 1429.62185
[23] Fan, J.; Feng, Y.; Yichao, W., Network exploration via the adaptive lasso and scad penalties, Ann. Appl. Stat., 3, 2, 521 (2009) · Zbl 1166.62040
[24] Fan, J.; Zhang, J.; Ke, Y., Vast portfolio selection with gross-exposure constraints, J. Am. Stat. Assoc., 107, 498, 592-606 (2012) · Zbl 1261.62091
[25] Fan, J.; Han, F.; Liu, H., Challenges of big data analysis, Natl. Sci. Rev., 1, 2, 293-314 (2014)
[26] Fan, J.; Liao, Y.; Liu, H., An overview of the estimation of large covariance and precision matrices, Econom. J., 19, 1, C1-C32 (2016)
[27] Fattahi, S., Sojoudi, S.: Graphical lasso and thresholding: equivalence and closed-form solutions (2017). arXiv:1708.09479 · Zbl 07049729
[28] Foygel, R., Drton, M.: Extended Bayesian information criteria for Gaussian graphical models. In: Lafferty, J.D., Williams, C.K.I., Shawe-Taylor, J., Zemel, R.S., Culotta, A. (eds.) Advances in Neural Information Processing Systems 23, pp. 604-612. Curran Associates, Inc. (2010)
[29] Friedman, J.; Hastie, T.; Tibshirani, R., Sparse inverse covariance estimation with the graphical lasso, Biostatistics, 9, 3, 432-441 (2008) · Zbl 1143.62076
[30] Gally, T.; Pfetsch, ME; Ulbrich, S., A framework for solving mixed-integer semidefinite programs, Optim. Methods Softw., 33, 3, 594-632 (2018) · Zbl 1398.90109
[31] Geoffrion, AM, Generalized benders decomposition, J. Optim. Theory Appl., 10, 4, 237-260 (1972) · Zbl 0229.90024
[32] Gurobi Optimization Inc. Gurobi Optimizer Reference Manual (2015). http://www.gurobi.com
[33] Hastie, T.; Tibshirani, R.; Wainwright, M., Statistical Learning with Sparsity: The Lasso and Generalizations (2015), Boca Raton: CRC Press, Boca Raton · Zbl 1319.68003
[34] Helmberg, C.; Rendl, F., Solving quadratic (0, 1)-problems by semidefinite programs and cutting planes, Math. Program., 82, 3, 291-315 (1998) · Zbl 0919.90112
[35] Hess, KR; Keith Anderson, W.; Symmans, F.; Valero, V.; Ibrahim, N.; Mejia, JA; Booser, D.; Theriault, RL; Buzdar, AU; Dempsey, PJ, Pharmacogenomic predictor of sensitivity to preoperative chemotherapy with paclitaxel and fluorouracil, doxorubicin, and cyclophosphamide in breast cancer, J. Clin. Oncol., 24, 26, 4236-4244 (2006)
[36] Hsieh, C.-J., Dhillon, I.S., Ravikumar, P.K., Sustik, M.A.: Sparse inverse covariance matrix estimation using quadratic approximation. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24, pp. 2330-2338. Curran Associates, Inc. (2011)
[37] Hsieh, C.-J., Sustik, M.A., Dhillon, I.S., Ravikumar, P.K., Poldrack, R.: Big & quic: sparse inverse covariance estimation for a million variables. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 26, pp. 3165-3173. Curran Associates, Inc. (2013)
[38] IBM ILOG. Cplex optimizer (2012). http://www-01.ibm.com/software/commerce/optimization/cplex-optimizer
[39] Krishnamurthy, V., Ahipasaoglu, S.D., d’Aspremont, A.: A pathwise algorithm for covariance selection. Optim. Mach. Learn., p. 479 (2011). arXiv:0908.0143v2
[40] Lam, C.; Fan, J., Sparsistency and rates of convergence in large covariance matrix estimation, Ann. Stat., 37, 6, 4254 (2009) · Zbl 1191.62101
[41] Lauritzen, SL, Graphical Models (1996), Oxford: Clarendon Press, Oxford
[42] Liu, Z.; Lin, S.; Deng, N.; McGovern, DPB; Piantadosi, S., Sparse inverse covariance estimation with \(\ell_0\) penalty for network construction with omics data, J. Comput. Biol., 23, 3, 192-202 (2016)
[43] Lubin, M.; Dunning, I., Computing in operations research using julia, INFORMS J. Comput., 27, 2, 238-248 (2015) · Zbl 1331.90001
[44] Lubin, M.; Yamangil, E.; Bent, R.; Vielma, JP, Polyhedral approximation in mixed-integer convex optimization, Math. Program., 172, 1-2, 139-168 (2018) · Zbl 1401.90158
[45] Ma, J.; Zhao, F.; Xu, J.; Meila, M.; Heskes, T., Structure learning constrained by node-specific degree distribution, Uncertainty in Artificial Intelligence 33, 533-541 (2015), Oregon: AUAI Press Corvalis, Oregon
[46] Marjanovic, G.; Hero, AO, \( \ell_0\) sparse inverse covariance estimation, IEEE Trans. Signal Process., 63, 12, 3218-3231 (2015) · Zbl 1394.62069
[47] Kipp Martin, R., Using separation algorithms to generate mixed integer model reformulations, Oper. Res. Lett., 10, 3, 119-128 (1991) · Zbl 0747.90071
[48] Mazumder, R.; Hastie, T., Exact covariance thresholding into connected components for large-scale graphical lasso, J. Mach. Learn. Res., 13, Mar, 781-794 (2012) · Zbl 1283.62148
[49] Mazumder, R.; Hastie, T., The graphical lasso: new insights and alternatives, Electron. J. Stat., 6, 2125 (2012) · Zbl 1295.62066
[50] Meinshausen, N.; Bühlmann, P., High-dimensional graphs and variable selection with the lasso, Ann. Stat., 34, 3, 1436-1462 (2006) · Zbl 1113.62082
[51] Meyer, C.D.: Matrix analysis and applied linear algebra, vol. 71. Siam (2000)
[52] Oztoprak, F., Nocedal, J., Rennie, S., Olsen, P.A.: Newton-like methods for sparse inverse covariance estimation. In: Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 25, pp. 755-763. Curran Associates, Inc. (2012)
[53] Rendl, F.; Rinaldi, G.; Wiegele, A., Solving max-cut to optimality by intersecting semidefinite and polyhedral relaxations, Math. Program., 121, 2, 307 (2010) · Zbl 1184.90118
[54] Rigollet, P., Tsybakov, A.: Estimation of covariance matrices under sparsity constraints (2012). arXiv:1205.1210 · Zbl 1295.62057
[55] Rothman, AJ; Bickel, PJ; Levina, E.; Zhu, J., Sparse permutation invariant covariance estimation, Electron. J. Stat., 2, 494-515 (2008) · Zbl 1320.62135
[56] Santhanam, NP; Wainwright, MJ, Information-theoretic limits of selecting binary graphical models in high dimensions, IEEE Trans. Inf. Theory, 58, 7, 4117-4134 (2012) · Zbl 1365.62212
[57] Scheinberg, K., Rish, I.: SINCO-a greedy coordinate ascent method for sparse inverse covariance selection problem. IBM T. J. Watson Research Center, Yorktown Heights, NY, 10598, July (2009)
[58] Scheinberg, K., Ma, S., Goldfarb, D.: Sparse inverse covariance selection via alternating linearization methods. In: Advances in Neural Information Processing Systems, pp. 2101-2109 (2010)
[59] Shao, J., Linear model selection by cross-validation, J. Am. Stat. Assoc., 88, 422, 486-494 (1993) · Zbl 0773.62051
[60] Sotirov, R.: SDP relaxations for some combinatorial optimization problems. In: Handbook on Semidefinite, Conic and Polynomial Optimization, pp. 795-819. Springer (2012) · Zbl 1334.90116
[61] Tan, KM; London, P.; Mohan, K.; Lee, S-I; Fazel, M.; Witten, D., Learning graphical models with hubs, J. Mach. Learn. Res., 15, 1, 3297-3331 (2014) · Zbl 1318.68155
[62] Tokuda, T., Goodrich, B., Van Mechelen, I., Gelman, A., Tuerlinckx, F.: Visualizing distributions of covariance matrices. Columbia University, New York, USA, Technical Report, p. 18 (2011)
[63] Vandenberghe, L.; Andersen, MS, Chordal graphs and semidefinite optimization, Found. Trends Optim., 1, 4, 241-433 (2015)
[64] Xu, H., Caramanis, C., Mannor, S.: Robust regression and lasso. In: Advances in Neural Information Processing Systems, pp. 1801-1808 (2009) · Zbl 1366.62147
[65] Yonekura, K.; Kanno, Y., Global optimization of robust truss topology via mixed integer semidefinite programming, Optim. Eng., 11, 3, 355-379 (2010) · Zbl 1273.74393
[66] Yuan, M., High dimensional inverse covariance matrix estimation via linear programming, J. Mach. Learn. Res., 11, Aug, 2261-2286 (2010) · Zbl 1242.62043
[67] Yuan, M.; Lin, Y., Model selection and estimation in the gaussian graphical model, Biometrika, 94, 1, 19-35 (2007) · Zbl 1142.62408
[68] Zhang, C-H, Nearly unbiased variable selection under minimax concave penalty, Ann. Stat., 38, 2, 894-942 (2010) · Zbl 1183.62120
[69] Zhang, F., The Schur Complement and Its Applications (2006), Berlin: Springer, Berlin
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.