
Sparse regression with exact clustering. (English) Zbl 1329.62327

Summary: This paper studies a generic sparse regression problem with a customizable sparsity pattern matrix, motivated by, but not limited to, a supervised gene clustering problem in microarray data analysis. The clustered lasso method is proposed, with \(l_{1}\)-type penalties imposed on both the coefficients and their pairwise differences. Somewhat surprisingly, it behaves differently from the lasso or the fused lasso: the exact clustering effect expected from the \(l_{1}\) penalization is rarely seen in applications. An asymptotic study is performed to investigate the power and limitations of the \(l_{1}\) penalty in sparse regression. We propose combining data augmentation and weights to improve the \(l_{1}\) technique. To address the computational issues in high dimensions, we successfully generalize a popular iterative algorithm both in practice and in theory and propose an ‘annealing’ algorithm applicable to generic sparse regressions (including the fused/clustered lasso). Some effective acceleration techniques are further investigated to boost convergence. The accelerated annealing (AA) algorithm, which involves only matrix multiplications and thresholdings, can handle a large design matrix as well as a large sparsity pattern matrix.
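In this setup, the clustered lasso can be written (up to tuning constants) as \(\min_{\beta}\frac{1}{2}\|y-X\beta\|_2^2+\lambda_1\sum_j|\beta_j|+\lambda_2\sum_{j<k}|\beta_j-\beta_k|\), where the pairwise-difference terms correspond to rows of the sparsity pattern matrix. For concreteness, below is a minimal Python sketch of the two ingredients the summary mentions: the pattern matrix encoding pairwise differences, and an iteration built purely from matrix multiplications and soft-thresholdings, illustrated here on the plain lasso via standard iterative soft-thresholding (cf. Daubechies et al. [10] and She [24] in the references). This is an illustrative sketch, not the paper's AA algorithm; the function names and step-size choice are assumptions made for the example.

```python
import numpy as np

def soft_threshold(z, lam):
    """Elementwise soft-thresholding S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def pattern_matrix(p):
    """Sparsity pattern matrix whose rows e_j - e_k (j < k) encode the
    pairwise differences penalized by the clustered lasso."""
    rows = []
    for j in range(p):
        for k in range(j + 1, p):
            r = np.zeros(p)
            r[j], r[k] = 1.0, -1.0
            rows.append(r)
    return np.asarray(rows)

def ista_lasso(X, y, lam, n_iter=500):
    """Plain-lasso iterative soft-thresholding: each sweep is one matrix
    multiplication followed by one thresholding, the primitive that
    thresholding-based algorithms for generic sparse regression build on."""
    L = np.linalg.norm(X, 2) ** 2       # Lipschitz constant of the gradient
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)     # gradient of 0.5 * ||y - X beta||^2
        beta = soft_threshold(beta - grad / L, lam / L)
    return beta

# Toy usage with clustered true coefficients.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))
beta_true = np.array([2.0, 2.0, 2.0, 0.0, 0.0, 0.0, -1.5, -1.5])
y = X @ beta_true + 0.1 * rng.standard_normal(50)
print(ista_lasso(X, y, lam=1.0))        # sparse estimate
print(pattern_matrix(4).astype(int))    # difference rows for p = 4
```

The clustered penalty itself is a fused-type \(l_{1}\) penalty on the products of the pattern matrix with \(\beta\); the paper's annealing algorithm handles such generic penalties with the same multiply-and-threshold primitive.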

MSC:

62J07 Ridge regression; shrinkage estimators (Lasso)
62H30 Classification and discrimination; cluster analysis (statistical aspects)

References:

[1] Amaldi, E. and Kann, V. (1998), “On the approximability of minimizing nonzero variables or unsatisfied relations in linear systems,”, Theoretical Computer Science , 209, 237-260. · Zbl 0915.68072 · doi:10.1016/S0304-3975(97)00115-1
[2] Anderson, T. (1971), The Statistical Analysis of Time Series , New York: Wiley. · Zbl 0225.62108
[3] Bazaraa, M. S. and Shetty, C. M. (1979), Nonlinear Programming: Theory and Algorithms , John Wiley & Sons. · Zbl 0476.90035
[4] Bertsekas, D. (1999), Nonlinear Programming , Athena Scientific. · Zbl 1015.90077
[5] Bondell, H. D. and Reich, B. J. (2008), “Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR,”, Biometrics , 64, 115-123. · Zbl 1146.62051 · doi:10.1111/j.1541-0420.2007.00843.x
[6] Boyd, S. and Vandenberghe, L. (2004), Convex Optimization , Cambridge: Cambridge University Press. · Zbl 1058.90049
[7] Browder, F. E. and Petryshyn, W. V. (1967), “Construction of fixed points of nonlinear mappings in Hilbert space,”, Journal of Mathematical Analysis and Applications , 20, 197-228. · Zbl 0153.45701 · doi:10.1016/0022-247X(67)90085-6
[8] Bunea, F., Tsybakov, A. B., and Wegkamp, M. (2007), “Sparsity oracle inequalities for the lasso,”, Electronic Journal of Statistics , 1, 169-194. · Zbl 1146.62028 · doi:10.1214/07-EJS008
[9] Candès, E. and Tao, T. (2007), “The Dantzig selector: statistical estimation when p is much larger than n,”, Annals of Statistics , 35, 2313-2351. · Zbl 1139.62019 · doi:10.1214/009053606000001523
[10] Daubechies, I., Defrise, M., and De Mol, C. (2004), “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint,”, Communications on Pure and Applied Mathematics , 57, 1413-1457. · Zbl 1077.65055 · doi:10.1002/cpa.20042
[11] Dettling, M. and Bühlmann, P. (2004), “Finding Predictive Gene Groups from Microarray Data,”, Journal of Multivariate Analysis , 90, 106-131. · Zbl 1047.62103 · doi:10.1016/j.jmva.2004.02.012
[12] Dotson Jr., W. (1970), “On the Mann iterative process,”, Trans. Amer. Math. Soc. , 149, 65-73. · Zbl 0203.14801 · doi:10.2307/1995659
[13] Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. (2004), “Least Angle Regression,”, Annals of Statistics , 32, 407-499. · Zbl 1091.62054 · doi:10.1214/009053604000000067
[14] Friedman, J., Hastie, T., Höfling, H., and Tibshirani, R. (2007), “Pathwise coordinate optimization,”, Annals of Applied Statistics , 1, 302-332. · Zbl 1378.90064 · doi:10.1214/07-AOAS131
[15] Fu, W. (1998), “Penalized regressions: the bridge vs the lasso,”, JCGS , 7, 397-416.
[16] Geyer, C. J. (1996), “On the Asymptotics of Convex Stochastic Optimization.”
[17] Jörnsten, R. and Yu, B. (2003), “Simultaneous Gene Clustering and Subset Selection for Sample Classification via MDL,”, Bioinformatics , 19, 1100-1109.
[18] Knight, K. and Fu, W. (2000), “Asymptotics for lasso-type estimators,”, Annals of Statistics , 28, 1356-1378. · Zbl 1105.62357 · doi:10.1214/aos/1015957397
[19] Mann, W. R. (1953), “Mean value methods in iteration,”, Proc. Amer. Math. Soc. , 4, 506-510. · Zbl 0050.11603 · doi:10.2307/2032162
[20] Nolte, A. and Schrader, R. (2000), “A note on the finite time behavior of simulated annealing,”, Math. Operat. Res. , 25, 476-484. · Zbl 1073.90579 · doi:10.1287/moor.25.3.476.12211
[21] Opial, Z. (1967), “Weak convergence of the sequence of successive approximations for nonexpansive mappings,”, Bull. Amer. Math. Soc. , 73, 591-597. · Zbl 0179.19902 · doi:10.1090/S0002-9904-1967-11761-0
[22] Osborne, M., Presnell, B., and Turlach, B. (2000), “On the LASSO and its Dual,”, J. Comput. Graph. Statist. , 9, 319-337.
[23] She, Y. (2008), “Sparse Regression with Exact Clustering,” Ph.D. thesis, Stanford University. · Zbl 1329.62327
[24] She, Y. (2009), “Thresholding-based Iterative Selection Procedures for Model Selection and Shrinkage,”, Electronic Journal of Statistics , 3, 384-415. · Zbl 1326.62158 · doi:10.1214/08-EJS348
[25] Shimizu, K., Ishizuka, Y., and Bard, J. (1997), Nondifferentiable and Two-Level Mathematical Programming , Kluwer Academic Publishers. · Zbl 0878.90088
[26] Tibshirani, R. (1996), “Regression Shrinkage and Selection via the Lasso,”, JRSSB , 58, 267-288. · Zbl 0850.62538
[27] Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., and Knight, K. (2005), “Sparsity and smoothness via the fused lasso,”, JRSSB , 67, 91-108. · Zbl 1060.62049 · doi:10.1111/j.1467-9868.2005.00490.x
[28] Winkler, G. (1990), “An Ergodic \(L^{2}\)-Theorem for Simulated Annealing in Bayesian Image Reconstruction,”, Journal of Applied Probability , 27, 779-791. · Zbl 0721.60038 · doi:10.2307/3214822
[29] Wu, T. and Lange, K. (2008), “Coordinate Descent Algorithms for Lasso Penalized Regression,”, Ann. Appl. Stat. , 2, 224-244. · Zbl 1137.62045 · doi:10.1214/07-AOAS174SUPP
[30] Yuan, M. and Lin, Y. (2006), “Model selection and estimation in regression with grouped variables,”, JRSSB , 68, 49-67. · Zbl 1141.62030 · doi:10.1111/j.1467-9868.2005.00532.x
[31] Zhang, C.-H. and Huang, J. (2008), “The sparsity and bias of the Lasso selection in high-dimensional linear regression,”, Ann. Statist. , 36, 1567-1594. · Zbl 1142.62044 · doi:10.1214/07-AOS520
[32] Zhao, P. and Yu, B. (2006a), “Grouped and Hierarchical Model Selection through Composite Absolute Penalties,” Tech. rep., Dept. of Statistics, University of California, Berkeley.
[33] Zhao, P. and Yu, B. (2006b), “On Model Selection Consistency of Lasso,”, Journal of Machine Learning Research , 7, 2541-2563. · Zbl 1222.62008
[34] Zou, H. (2006), “The Adaptive Lasso and Its Oracle Properties,”, JASA , 101, 1418-1429. · Zbl 1171.62326 · doi:10.1198/016214506000000735
[35] Zou, H. and Hastie, T. (2005), “Regularization and Variable Selection via the Elastic Net,”, JRSSB , 67, 301-320. · Zbl 1069.62054 · doi:10.1111/j.1467-9868.2005.00503.x
[36] Zou, H. and Li, R. (2008), “One-step Sparse Estimates in Nonconcave Penalized Likelihood Models,”, Annals of Statistics , 36, 1509-1533. · Zbl 1142.62027 · doi:10.1214/009053607000000802