
Group regularized estimation under structural hierarchy. (English) Zbl 1398.62138

Summary: Variable selection for models that include interactions between explanatory variables often needs to obey certain hierarchical constraints. Under weak structural hierarchy, an interaction term may be present in the model only if at least one of its associated main effects is; under strong hierarchy, both main effects must be present. This problem has lately attracted much attention, but existing computational algorithms converge slowly even with a moderate number of predictors. Moreover, in contrast to the rich literature on ordinary variable selection, statistical theory showing reasonably low error rates for hierarchical variable selection is lacking. This work investigates a new class of estimators that use multiple group penalties to capture structural parsimony. We show that the proposed estimators enjoy sharp rate oracle inequalities, and we give minimax lower bounds for strong and weak hierarchical variable selection. A general-purpose algorithm with guaranteed convergence and global optimality is developed. Simulations and real-data experiments demonstrate the efficiency and efficacy of the proposed approach.
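In the standard notation of this literature (a sketch, not a quotation of the paper: consider a quadratic model with main-effect coefficients \(\beta_j\) and interaction coefficients \(\beta_{jk}\)), the two constraints read
\[
\text{strong hierarchy: } \beta_{jk}\neq 0 \;\Rightarrow\; \beta_j\neq 0 \text{ and } \beta_k\neq 0, \qquad
\text{weak hierarchy: } \beta_{jk}\neq 0 \;\Rightarrow\; \beta_j\neq 0 \text{ or } \beta_k\neq 0.
\]
A familiar way for group penalties to encode such parsimony, and one possible reading of the "multiple group penalties" above, is through overlapping nested groups: penalizing, for each predictor \(j\), the group \(G_j=\{\beta_j\}\cup\{\beta_{jk}: k\neq j\}\) together with the individual interaction coefficients makes the zero pattern of the solution a union of groups, so \(\beta_j=0\) forces \(\beta_{jk}=0\) for every \(k\), i.e., strong hierarchy holds in the \(j\)-th coordinate.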

MSC:

62H12 Estimation in multivariate analysis
62J02 General nonlinear regression
