Group variable selection for data with dependent structures. (English) Zbl 1431.62296

Summary: Variable selection methods are widely used in the analysis of high-dimensional data, for example, gene expression microarray data and single nucleotide polymorphism data. A special feature of genomic data is that genes participating in a common metabolic pathway or sharing a similar biological function tend to be highly correlated. The collinearity naturally embedded in these data requires special handling, which existing variable selection methods cannot provide. In this paper, we propose a set of new methods for selecting variables in correlated data. The new methods follow the forward selection procedure of least angle regression (LARS) but perform grouping and selection at the same time. They are especially useful when no prior information on the group structure of the data is available. Simulations and real examples show that the proposed methods often outperform existing variable selection methods, including LARS and the elastic net, in terms of both reducing prediction error and preserving sparsity of representation.
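
The idea of grouping and selecting in the same forward pass can be illustrated with a small Python sketch. This is not the authors' algorithm, which the summary does not specify in detail. It is a simplified greedy stand-in that assumes a fixed correlation threshold for forming groups and takes a full least-squares step along each group's averaged direction. The function name `grouped_forward_selection`, the parameter `group_threshold`, and the toy data are all hypothetical and chosen only for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def standardize(col):
    """Center a column and scale it to unit Euclidean norm."""
    mean = sum(col) / len(col)
    centered = [x - mean for x in col]
    norm = math.sqrt(dot(centered, centered))
    return [x / norm for x in centered]


def grouped_forward_selection(columns, y, n_steps=2, group_threshold=0.8):
    """Greedy forward selection that admits correlated predictors as a group.

    Illustrative assumption, not the method of the paper: a predictor joins
    the group of the current lead variable when the absolute correlation
    between the two is at least group_threshold.
    Returns the groups (lists of column indices) in order of entry.
    """
    X = [standardize(c) for c in columns]
    y_mean = sum(y) / len(y)
    residual = [v - y_mean for v in y]
    remaining = set(range(len(X)))
    groups = []
    for _ in range(n_steps):
        if not remaining:
            break
        # Lead variable: the one most correlated with the current residual.
        lead = max(remaining, key=lambda j: abs(dot(X[j], residual)))
        # Group: every remaining variable highly correlated with the lead.
        group = sorted(j for j in remaining
                       if abs(dot(X[j], X[lead])) >= group_threshold)
        # Group direction: sign-aligned average of the group's columns.
        signs = {j: 1.0 if dot(X[j], X[lead]) >= 0 else -1.0 for j in group}
        direction = [sum(signs[j] * X[j][i] for j in group) / len(group)
                     for i in range(len(y))]
        # Least-squares step of the residual along the group direction.
        step = dot(direction, residual) / dot(direction, direction)
        residual = [r - step * d for r, d in zip(residual, direction)]
        groups.append(group)
        remaining -= set(group)
    return groups


# Toy design: two latent signals z1 and z2, each observed twice with a small
# orthogonal perturbation, plus one predictor unrelated to the response.
z1 = [1, 1, 1, 1, -1, -1, -1, -1]
z2 = [1, 1, -1, -1, 1, 1, -1, -1]
z3 = [1, -1, 1, -1, 1, -1, 1, -1]
p = [b * c for b, c in zip(z2, z3)]
q = [a * c for a, c in zip(z1, z3)]
columns = [
    [a + 0.2 * e for a, e in zip(z1, p)],  # x0, correlated with x1
    [a - 0.2 * e for a, e in zip(z1, p)],  # x1
    [b + 0.2 * e for b, e in zip(z2, q)],  # x2, correlated with x3
    [b - 0.2 * e for b, e in zip(z2, q)],  # x3
    list(z3),                              # x4, unrelated to y
]
y = [3 * a + 1.5 * b for a, b in zip(z1, z2)]

groups = grouped_forward_selection(columns, y)
print(groups)  # [[0, 1], [2, 3]]
```

In this toy design the two correlated pairs enter as groups and the unrelated predictor is never selected. A selector that admits one variable at a time would instead have to choose arbitrarily between near-duplicate predictors such as x0 and x1.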

MSC:

62J05 Linear regression; mixed models
62J07 Ridge regression; shrinkage estimators (Lasso)

Software:

OSCAR
