Zeng, Lingmin; Xie, Jun
Group variable selection for data with dependent structures. (English) Zbl 1431.62296
J. Stat. Comput. Simulation 82, No. 1, 95-106 (2012).

Summary: Variable selection methods have been widely used in the analysis of high-dimensional data, for example, gene expression microarray data and single nucleotide polymorphism data. A special feature of genomic data is that genes participating in a common metabolic pathway or sharing a similar biological function tend to be highly correlated. The collinearity naturally embedded in these data requires special handling, which cannot be provided by existing variable selection methods. In this paper, we propose a set of new methods to select variables in correlated data. The new methods follow the forward selection procedure of least angle regression (LARS) but conduct grouping and selecting at the same time. In particular, the methods work when no prior information on the group structure of the data is available. Simulations and real examples show that our proposed methods often outperform the existing variable selection methods, including LARS and elastic net, in terms of both reducing prediction error and preserving sparsity of representation.

Cited in 7 Documents

MSC:
62J05 Linear regression; mixed models
62J07 Ridge regression; shrinkage estimators (Lasso)

Keywords: hard thresholding; least angle regression; variable selection

Software: OSCAR