Sparsity with sign-coherent groups of variables via the cooperative-Lasso. (English) Zbl 1243.62101

Summary: We consider the problems of estimation and selection of parameters endowed with a known group structure, when the groups are assumed to be sign-coherent, that is, gathering either non-negative, non-positive or null parameters. To tackle this problem, we propose the cooperative-Lasso penalty. We derive optimality conditions defining the cooperative-Lasso estimate for generalized linear models, and propose an efficient active set algorithm suited to high-dimensional problems. We study the asymptotic consistency of the estimator in the linear regression setup and derive its irrepresentable conditions, which are milder than the ones of the group-Lasso regarding the matching of groups with the sparsity pattern of the true parameters. We also address the problem of model selection in linear regression by deriving an approximation of the degrees of freedom of the cooperative-Lasso estimator. Simulations comparing the proposed estimator to the group and sparse group-Lasso comply with our theoretical results, showing consistent improvements in support recovery for sign-coherent groups. We finally propose two examples illustrating the wide applicability of the cooperative-Lasso: first to the processing of ordinal variables, where the penalty acts as a monotonicity prior; second to the processing of genomic data, where the set of differentially expressed probes is enriched by incorporating all the probes of the microarray that are related to the corresponding genes.
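The summary does not write out the penalty itself. As a minimal sketch, assume the form usually given for the cooperative-Lasso with unit group weights: for each group, the Euclidean norm of the positive parts of the coefficients plus the Euclidean norm of their negative parts. The function names and the toy coefficient vectors below are illustrative only and are not taken from the paper.

```python
import math


def group_lasso_penalty(beta, groups):
    """Sum over groups of the Euclidean norm of the group's coefficients."""
    return sum(math.sqrt(sum(beta[j] ** 2 for j in g)) for g in groups)


def coop_lasso_penalty(beta, groups):
    """Sum over groups of ||beta_g^+||_2 + ||beta_g^-||_2.

    Assumption: unit weight for every group; beta_g^+ and beta_g^- are the
    componentwise positive and negative parts of the group's coefficients.
    """
    total = 0.0
    for g in groups:
        pos = math.sqrt(sum(max(beta[j], 0.0) ** 2 for j in g))
        neg = math.sqrt(sum(max(-beta[j], 0.0) ** 2 for j in g))
        total += pos + neg
    return total


# Two groups of two variables each (hypothetical toy example).
groups = [[0, 1], [2, 3]]
coherent = [3.0, 4.0, 0.0, 0.0]   # first group is sign-coherent
mixed = [3.0, -4.0, 0.0, 0.0]     # first group mixes signs

gl_coherent = group_lasso_penalty(coherent, groups)
cl_coherent = coop_lasso_penalty(coherent, groups)
gl_mixed = group_lasso_penalty(mixed, groups)
cl_mixed = coop_lasso_penalty(mixed, groups)
```

Under this assumed form, the two penalties coincide on a sign-coherent group, while a group with mixed signs is charged more by the cooperative version. This is one way to read the summary's remark that the penalty favours sign-coherent groups.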

MSC:

62J05 Linear regression; mixed models
62J12 Generalized linear models (logistic models)
62F12 Asymptotic properties of parametric estimators
65C60 Computational problems in statistics (MSC2010)
62A09 Graphical methods in statistics

Software:

UCI-ml

References:

[1] Bach, F. R. (2008). Consistency of the group lasso and multiple kernel learning. J. Mach. Learn. Res. 9 1179-1225. · Zbl 1225.68147
[2] Bakin, S. (1999). Adaptive regression and model selection in data mining problems. Ph.D. thesis, Australian National Univ., Canberra.
[3] Beck, A. and Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2 183-202. · Zbl 1175.94009 · doi:10.1137/080716542
[4] Breiman, L. (1995). Better subset regression using the nonnegative garrote. Technometrics 37 373-384. · Zbl 0862.62059 · doi:10.2307/1269730
[5] Breiman, L. (1996). Heuristics of instability and stabilization in model selection. Ann. Statist. 24 2350-2383. · Zbl 0867.62055 · doi:10.1214/aos/1032181158
[6] Breiman, L., Friedman, J. H., Olshen, R. and Stone, C. J. (1984). Classification and Regression Trees. Wadsworth, Belmont, CA. · Zbl 0541.62042
[7] Chiquet, J., Grandvalet, Y. and Ambroise, C. (2011). Inferring multiple graphical structures. Statistics and Computing 21 537-553. · Zbl 1221.62085 · doi:10.1007/s11222-010-9191-2
[8] Efron, B. (2004). The estimation of prediction error: Covariance penalties and cross-validation. J. Amer. Statist. Assoc. 99 619-642. · Zbl 1117.62324 · doi:10.1198/016214504000000692
[9] Eisen, M. B., Spellman, P. T., Brown, P. O. and Botstein, D. (1998). Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95 14863-14868.
[10] Foygel, R. and Drton, M. (2010). Exact block-wise optimization in group lasso for linear regression. Technical report. Available at arXiv:1010.3320.
[11] Frank, A. and Asuncion, A. (2010). UCI machine learning repository.
[12] Friedman, J., Hastie, T. and Tibshirani, R. (2010). A note on the group Lasso and a sparse group Lasso. Technical report. Available at arXiv:1001.0736.
[13] Gertheiss, J. and Tutz, G. (2009). Penalized regression with ordinal predictors. International Statistical Review 77 345-365.
[14] Gertheiss, J. and Tutz, G. (2010). Sparse modeling of categorial explanatory variables. Ann. Appl. Stat. 4 2150-2180. · Zbl 1220.62092 · doi:10.1214/10-AOAS355
[15] Grandvalet, Y. and Canu, S. (1999). Outcomes of the equivalence of adaptive ridge with least absolute shrinkage. In Advances in Neural Information Processing Systems 11 (NIPS 1998) 445-451.
[16] Hess, K. R., Anderson, K., Symmans, W. F., Valero, V., Ibrahim, N., Mejia, J. A., Booser, D., Theriault, R. L., Buzdar, U., Dempsey, P. J., Rouzier, R., Sneige, N., Ross, J. S., Vidaurre, T., Gómez, H. L., Hortobagyi, G. N. and Pusztai, L. (2006). Pharmacogenomic predictor of sensitivity to preoperative chemotherapy with Paclitaxel and Fluorouracil, Doxorubicin, and Cyclophosphamide in breast cancer. Journal of Clinical Oncology 24 4236-4244.
[17] Hesterberg, T., Choi, N. H., Meier, L. and Fraley, C. (2008). Least angle and \(l_{1}\) penalized regression: A review. Stat. Surv. 2 61-93. · Zbl 1189.62070 · doi:10.1214/08-SS035
[18] Huang, J. and Zhang, T. (2010). The benefit of group sparsity. Ann. Statist. 38 1978-2004. · Zbl 1202.62052 · doi:10.1214/09-AOS778
[19] Jeanmougin, M., Guedj, M. and Ambroise, C. (2011). Defining a robust biological prior from pathway analysis to drive network inference. J. SFdS 152 97-110. · Zbl 1316.92050
[20] Knight, K. and Fu, W. (2000). Asymptotics for lasso-type estimators. Ann. Statist. 28 1356-1378. · Zbl 1105.62357 · doi:10.1214/aos/1015957397
[21] Ma, S., Song, X. and Huang, J. (2007). Supervised group Lasso with applications to microarray data analysis. BMC Bioinformatics 8 60.
[22] Meier, L., van de Geer, S. and Bühlmann, P. (2008). The group Lasso for logistic regression. J. R. Stat. Soc. Ser. B Stat. Methodol. 70 53-71. · Zbl 1400.62276 · doi:10.1111/j.1467-9868.2007.00627.x
[23] Nardi, Y. and Rinaldo, A. (2008). On the asymptotic properties of the group lasso estimator for linear models. Electron. J. Stat. 2 605-633. · Zbl 1320.62167 · doi:10.1214/08-EJS200
[24] Osborne, M. R., Presnell, B. and Turlach, B. A. (2000). On the LASSO and its dual. J. Comput. Graph. Statist. 9 319-337.
[25] Park, M. Y., Hastie, T. and Tibshirani, R. (2007). Averaged gene expressions for regression. Biostatistics 8 212-227. · Zbl 1144.62357 · doi:10.1093/biostatistics/kxl002
[26] Roth, V. and Fischer, B. (2008). The group-Lasso for generalized linear models: Uniqueness of solutions and efficient algorithms. In ICML ’08: Proceedings of the 25th International Conference on Machine Learning 848-855.
[27] Rufibach, K. (2010). An active set algorithm to estimate parameters in generalized linear models with ordered predictors. Comput. Statist. Data Anal. 54 1442-1456. · Zbl 1284.62466
[28] Serlin, R. C. and Levin, J. R. (1985). Teaching how to derive directly interpretable coding schemes for multiple regression analysis. Journal of Educational Statistics 10 223-238.
[29] Stein, C. M. (1981). Estimation of the mean of a multivariate normal distribution. Ann. Statist. 9 1135-1151. · Zbl 0476.62035 · doi:10.1214/aos/1176345632
[30] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267-288. · Zbl 0850.62538
[31] van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics 3. Cambridge Univ. Press, Cambridge. · Zbl 0910.62001 · doi:10.1017/CBO9780511802256
[32] Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 68 49-67. · Zbl 1141.62030 · doi:10.1111/j.1467-9868.2005.00532.x
[33] Yuan, M. and Lin, Y. (2007). On the non-negative garrote estimator. J. R. Stat. Soc. Ser. B Stat. Methodol. 69 143-161. · Zbl 1120.62052 · doi:10.1111/j.1467-9868.2007.00581.x
[34] Zhao, P., Rocha, G. and Yu, B. (2009). The composite absolute penalties family for grouped and hierarchical variable selection. Ann. Statist. 37 3468-3497. · Zbl 1369.62164 · doi:10.1214/07-AOS584
[35] Zhou, H., Sehl, M. E., Sinsheimer, J. S. and Lange, K. (2010). Association screening of common and rare genetic variants by penalized regression. Bioinformatics 26 2375-2382.
[36] Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 67 301-320. · Zbl 1069.62054 · doi:10.1111/j.1467-9868.2005.00503.x
[37] Zou, H., Hastie, T. and Tibshirani, R. (2007). On the “degrees of freedom” of the lasso. Ann. Statist. 35 2173-2192. · Zbl 1126.62061 · doi:10.1214/009053607000000127
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or perfect matching.