Simultaneous grouping pursuit and feature selection over an undirected graph. (English) Zbl 06195973

Summary: In high-dimensional regression, grouping pursuit and feature selection each have their own merits and complement each other in battling the curse of dimensionality. To seek a parsimonious model, we perform simultaneous grouping pursuit and feature selection over an arbitrary undirected graph in which each node corresponds to one predictor. Regression coefficients whose absolute values are the same or close can be grouped when the corresponding nodes are reachable from each other over the graph. This is motivated by gene network analysis, where genes tend to work in groups according to their biological functionalities. Using a nonconvex penalty, we develop a computational strategy and analyze the proposed method. Theoretical analysis indicates that the proposed method reconstructs the oracle estimator, that is, the unbiased least-squares estimator given the true grouping, leading to consistent reconstruction of grouping structures and informative features, as well as to optimal parameter estimation. Simulation studies suggest that the method combines the benefit of grouping pursuit with that of feature selection and compares favorably against its competitors in selection accuracy and predictive performance. An application to eQTL data illustrates the methodology, where a network is incorporated into the analysis through an undirected graph.
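
The grouping rule stated in the summary (coefficients may share a group when their nodes are reachable from each other and their absolute values agree) can be illustrated with a small sketch. This is not the paper's estimator and involves no penalty or optimization: it only enumerates candidate groups for a given coefficient vector. The function names `reachable_sets` and `candidate_groups`, the rounding parameter `decimals`, and the example data are all assumptions made for illustration. Bucketing by rounded magnitude within a connected component is a simplification that does not re-check reachability inside each bucket.

```python
from collections import defaultdict


def reachable_sets(num_nodes, edges):
    """Connected components of an undirected graph, each as a sorted list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    seen = set()
    components = []
    for start in range(num_nodes):
        if start in seen:
            continue
        # Depth-first search from an unvisited node.
        stack = [start]
        seen.add(start)
        comp = []
        while stack:
            node = stack.pop()
            comp.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(sorted(comp))
    return components


def candidate_groups(beta, edges, decimals=6):
    """Hypothetical helper: split predictors into candidate groups.

    Within each connected component, nonzero coefficients with the same
    rounded absolute value share a group; zero coefficients are reported
    separately as unselected features.
    """
    groups = []
    unselected = []
    for comp in reachable_sets(len(beta), edges):
        buckets = defaultdict(list)
        for j in comp:
            magnitude = round(abs(beta[j]), decimals)
            if magnitude == 0:
                unselected.append(j)
            else:
                buckets[magnitude].append(j)
        groups.extend(buckets[m] for m in sorted(buckets))
    return groups, sorted(unselected)


# Illustrative data (assumed): five predictors, two connected components.
beta = [1.0, -1.0, 0.0, 2.0, 2.0]
edges = [(0, 1), (1, 2), (3, 4)]
groups, unselected = candidate_groups(beta, edges)
print(groups, unselected)
```

Predictors 0 and 1 are grouped despite their opposite signs because the rule compares absolute values, while predictor 2 is reachable from them but has a zero coefficient and so is set aside as unselected.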

MSC:

62-XX Statistics
