The sparse Laplacian shrinkage estimator for high-dimensional regression. (English) Zbl 1227.62049
Summary: We propose a new penalized method for variable selection and estimation that explicitly incorporates the correlation patterns among predictors. The method uses as its penalty a combination of the minimax concave penalty (MCP) and a Laplacian quadratic associated with a graph; we call it the sparse Laplacian shrinkage (SLS) method. The SLS uses the MCP to encourage sparsity and the Laplacian quadratic penalty to promote smoothness among the coefficients of correlated predictors. The SLS has a generalized grouping property with respect to the graph represented by the Laplacian quadratic. We show that the SLS possesses an oracle property in the sense that it is selection consistent and equal to the oracle Laplacian shrinkage estimator with high probability. This result holds in sparse, high-dimensional settings with \(p \gg n\) under reasonable conditions. We derive a coordinate descent algorithm for computing the SLS estimates. Simulation studies are conducted to evaluate the performance of the SLS method, and a real data example illustrates its application.
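In generic form (an illustrative reconstruction from the summary above, not a quotation from the paper), the SLS criterion adds the two penalties to a least-squares loss. Assuming the unnormalized graph Laplacian \(L = D - A\) built from nonnegative edge weights \(a_{jk}\), it can be written as

\[
\hat\beta = \operatorname*{arg\,min}_{\beta \in \mathbb{R}^p} \; \frac{1}{2n}\|y - X\beta\|_2^2 + \sum_{j=1}^p \rho(|\beta_j|; \lambda_1, \gamma) + \frac{\lambda_2}{2}\,\beta^{\top} L \beta, \qquad \beta^{\top} L \beta = \sum_{j<k} a_{jk}\,(\beta_j - \beta_k)^2,
\]

where \(\rho(t; \lambda_1, \gamma) = \lambda_1 \int_0^t (1 - x/(\gamma\lambda_1))_+ \, dx\) is the minimax concave penalty of [33]. The quadratic term pulls the coefficients of predictors joined by an edge toward each other, which is the grouping property the summary refers to.

Both penalty terms separate coordinate-wise once the other coefficients are held fixed, so each coordinate descent step reduces to a scaled MCP thresholding. The following is a minimal sketch in Python of such an update for the generic criterion above, assuming standardized predictors (\(x_j^{\top} x_j / n = 1\)); the function names and defaults are illustrative, not taken from the paper.

import numpy as np

def soft_threshold(z, lam):
    # S(z, lam) = sign(z) * max(|z| - lam, 0)
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def sls_coordinate_descent(X, y, L, lam1, lam2, gamma=3.0, max_iter=200, tol=1e-6):
    # Sketch: minimizes (1/2n)||y - X b||^2 + sum_j MCP(|b_j|; lam1, gamma)
    # + (lam2/2) b'Lb, assuming columns of X standardized so x_j'x_j / n = 1.
    n, p = X.shape
    beta = np.zeros(p)
    r = y.astype(float).copy()              # residual for beta = 0
    for _ in range(max_iter):
        beta_prev = beta.copy()
        for j in range(p):
            v = 1.0 + lam2 * L[j, j]        # curvature of the j-th univariate problem
            # least-squares part of the linear coefficient ...
            z = X[:, j] @ r / n + beta[j]
            # ... minus the Laplacian coupling with the other coordinates
            z -= lam2 * (L[j] @ beta - L[j, j] * beta[j])
            # univariate MCP solution (well defined since gamma * v > 1 here)
            if abs(z) <= v * gamma * lam1:
                b_new = soft_threshold(z, lam1) / (v - 1.0 / gamma)
            else:
                b_new = z / v
            r += X[:, j] * (beta[j] - b_new)  # keep residual in sync, O(n)
            beta[j] = b_new
        if np.max(np.abs(beta - beta_prev)) < tol:
            break
    return beta

The thresholding-and-rescaling branch applies on the concave part of the MCP (\(|z| \le v\gamma\lambda_1\)); beyond that point the penalty is flat and the update is the unpenalized solution \(z/v\), where \(v\) carries the extra curvature contributed by the Laplacian quadratic.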

MSC:
62J07 Ridge regression; shrinkage estimators (Lasso)
62J05 Linear regression; mixed models
62H12 Estimation in multivariate analysis
65C60 Computational problems in statistics (MSC2010)
Software:
PDCO; OSCAR; glasso; sparsenet
References:
[1] Bolstad, B. M., Irizarry, R. A., Astrand, M. and Speed, T. P. (2003). A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19 185-193.
[2] Bondell, H. D. and Reich, B. J. (2008). Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR. Biometrics 64 115-123, 322-323. · Zbl 1146.62051
[3] Breheny, P. and Huang, J. (2011). Coordinate descent algorithms for nonconvex penalized regression methods. Ann. Appl. Stat. 5 232-253. · Zbl 1220.62095
[4] Chen, S. S., Donoho, D. L. and Saunders, M. A. (1998). Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20 33-61. · Zbl 0919.94002
[5] Chiang, A. P., Beck, J. S., Yen, H. J., Tayeh, M. K., Scheetz, T. E., Swiderski, R., Nishimura, D., Braun, T. A., Kim, K. Y., Huang, J., Elbedour, K., Carmi, R., Slusarski, D. C., Casavant, T. L., Stone, E. M. and Sheffield, V. C. (2006). Homozygosity mapping with SNP arrays identifies a novel gene for Bardet-Biedl Syndrome (BBS10). Proc. Natl. Acad. Sci. USA 103 6287-6292.
[6] Chung, F. R. K. (1997). Spectral Graph Theory. CBMS Regional Conference Series in Mathematics 92 . Conf. Board Math. Sci., Washington, DC. · Zbl 0867.05046
[7] Chung, F. and Lu, L. (2006). Complex Graphs and Networks. CBMS Regional Conference Series in Mathematics 107 . Conf. Board Math. Sci., Washington, DC. · Zbl 1114.90071
[8] Daye, Z. J. and Jeng, X. J. (2009). Shrinkage and model selection with correlated variables via weighted fusion. Comput. Statist. Data Anal. 53 1284-1298. · Zbl 1452.62049
[9] Fan, J. (1997). Comments on “Wavelets in statistics: A review” by A. Antoniadis. J. Italian Statist. Assoc. 6 131-138.
[10] Fan, J., Feng, Y. and Wu, Y. (2009). Network exploration via the adaptive lasso and SCAD penalties. Ann. Appl. Stat. 3 521-541. · Zbl 1166.62040
[11] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348-1360. · Zbl 1073.62547
[12] Frank, I. E. and Friedman, J. H. (1993). A statistical view of some chemometrics regression tools (with discussion). Technometrics 35 109-148. · Zbl 0775.62288
[13] Friedman, J., Hastie, T. and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatist. 9 432-441. · Zbl 1143.62076
[14] Friedman, J., Hastie, T., Höfling, H. and Tibshirani, R. (2007). Pathwise coordinate optimization. Ann. Appl. Stat. 1 302-332. · Zbl 1378.90064
[15] Fu, W. J. (1998). Penalized regressions: The bridge versus the lasso. J. Comput. Graph. Statist. 7 397-416.
[16] Genkin, A., Lewis, D. D. and Madigan, D. (2004). Large-scale Bayesian logistic regression for text categorization. Technical report, DIMACS, Rutgers Univ.
[17] Hebiri, M. and van de Geer, S. (2010). The smooth-Lasso and other \(\ell_1 + \ell_2\)-penalized methods. Preprint. · Zbl 1274.62443
[18] Huang, J., Breheny, P., Ma, S. and Zhang, C. H. (2010a). The Mnet method for variable selection. Technical Report # 402, Dept. Statistics and Actuarial Science, Univ. Iowa. · Zbl 1356.62091
[19] Huang, J., Ma, S., Li, H. and Zhang, C. H. (2010b). The sparse Laplacian shrinkage estimator for high-dimensional regression. Technical Report # 403, Dept. Statistics and Actuarial Science, Univ. Iowa. · Zbl 1227.62049
[20] Irizarry, R. A., Hobbs, B., Collin, F., Beazer-Barclay, Y. D., Antonellis, K. J., Scherf, U. and Speed, T. P. (2003). Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatist. 4 249-264. · Zbl 1141.62348
[21] Jia, J. and Yu, B. (2010). On model selection consistency of the elastic net when \(p \gg n\). Statist. Sinica 20 595-611. · Zbl 1187.62125
[22] Li, C. and Li, H. (2008). Network-constrained regularization and variable selection for analysis of genomic data. Bioinformatics 24 1175-1182. · Zbl 1022.68519
[23] Li, C. and Li, H. (2010). Variable selection and regression analysis for covariates with graphical structure. Ann. Appl. Stat. 4 1498-1516. · Zbl 1202.62157
[24] Mazumder, R., Friedman, J. and Hastie, T. (2009). SparseNet: Coordinate descent with non-convex penalties. Technical report, Dept. Statistics, Stanford Univ. · Zbl 1229.62091
[25] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436-1462. · Zbl 1113.62082
[26] Pan, W., Xie, B. and Shen, X. (2011). Incorporating predictor network in penalized regression with application to microarray data. Biometrics. · Zbl 1192.62235
[27] Scheetz, T. E., Kim, K. Y. A., Swiderski, R. E., Philp, A. R., Braun, T. A., Knudtson, K. L., Dorrance, A. M., DiBona, G. F., Huang, J., Casavant, T. L., Sheffield, V. C. and Stone, E. M. (2006). Regulation of gene expression in the mammalian eye and its relevance to eye disease. Proc. Natl. Acad. Sci. USA 103 14429-14434.
[28] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 58 267-288. · Zbl 0850.62538
[29] Tutz, G. and Ulbricht, J. (2009). Penalized regression with correlation-based penalty. Stat. Comput. 19 239-253.
[30] Wu, T. T. and Lange, K. (2008). Coordinate descent algorithms for lasso penalized regression. Ann. Appl. Stat. 2 224-244. · Zbl 1137.62045
[31] Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 68 49-67. · Zbl 1141.62030
[32] Yuan, M. and Lin, Y. (2007). Model selection and estimation in the Gaussian graphical model. Biometrika 94 19-35. · Zbl 1142.62408
[33] Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. Ann. Statist. 38 894-942. · Zbl 1183.62120
[34] Zhang, B. and Horvath, S. (2005). A general framework for weighted gene co-expression network analysis. Stat. Appl. Genet. Mol. Biol. 4 Art. 17, 45 pp. (electronic). · Zbl 1077.92042
[35] Zhang, C.-H. and Huang, J. (2008). The sparsity and bias of the LASSO selection in high-dimensional linear regression. Ann. Statist. 36 1567-1594. · Zbl 1142.62044
[36] Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 67 301-320. · Zbl 1069.62054
[37] Zou, H. and Zhang, H. H. (2009). On the adaptive elastic-net with a diverging number of parameters. Ann. Statist. 37 1733-1751. · Zbl 1168.62064
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.