A lasso for hierarchical interactions. (English) Zbl 1292.62109

Summary: We add a set of convex constraints to the lasso to produce sparse interaction models that honor the hierarchy restriction that an interaction only be included in a model if one or both variables are marginally important. We give a precise characterization of the effect of this hierarchy constraint, prove that hierarchy holds with probability one and derive an unbiased estimate for the degrees of freedom of our estimator. A bound on this estimate reveals the amount of fitting “saved” by the hierarchy constraint.
We distinguish between parameter sparsity – the number of nonzero coefficients – and practical sparsity – the number of raw variables one must measure to make a new prediction. Hierarchy focuses on the latter, which is more closely tied to important data collection concerns such as cost, time and effort. We develop an algorithm, available in the R package hierNet, and perform an empirical study of our method.
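The two notions of sparsity, and the hierarchy restriction itself, can be illustrated with a minimal Python sketch. It is not taken from the paper or from the hierNet package. The function names, the representation of interactions as a symmetric matrix `theta`, the counting of each interaction once per unordered pair, and the `require_both` flag (interaction allowed only if both variables have nonzero main effects, versus at least one) are all assumptions made for illustration.

```python
# Illustrative sketch only: names and conventions here are assumptions,
# not the hierNet API.

def support(beta, theta, tol=1e-12):
    """Return (nonzero main-effect indices, nonzero interaction pairs j < k)."""
    p = len(beta)
    mains = {j for j in range(p) if abs(beta[j]) > tol}
    pairs = {
        (j, k)
        for j in range(p)
        for k in range(j + 1, p)
        if abs(theta[j][k]) > tol or abs(theta[k][j]) > tol
    }
    return mains, pairs


def parameter_sparsity(beta, theta, tol=1e-12):
    """Number of nonzero coefficients (each interaction pair counted once)."""
    mains, pairs = support(beta, theta, tol)
    return len(mains) + len(pairs)


def practical_sparsity(beta, theta, tol=1e-12):
    """Number of raw variables that must be measured to make a prediction."""
    mains, pairs = support(beta, theta, tol)
    used = set(mains)
    for j, k in pairs:
        used.update((j, k))
    return len(used)


def satisfies_hierarchy(beta, theta, require_both=True, tol=1e-12):
    """Check that every interaction has both (or at least one) main effect."""
    mains, pairs = support(beta, theta, tol)
    if require_both:
        return all(j in mains and k in mains for j, k in pairs)
    return all(j in mains or k in mains for j, k in pairs)


def symmetric(p, entries):
    """Build a symmetric p x p interaction matrix from {(j, k): value}."""
    theta = [[0.0] * p for _ in range(p)]
    for (j, k), value in entries.items():
        theta[j][k] = value
        theta[k][j] = value
    return theta


# Model A: two main effects and their interaction (hierarchical).
beta_a = [1.0, 0.5, 0.0, 0.0]
theta_a = symmetric(4, {(0, 1): 0.3})

# Model B: one main effect and two interactions among other variables.
beta_b = [1.0, 0.0, 0.0, 0.0]
theta_b = symmetric(4, {(1, 2): 0.3, (2, 3): 0.2})

for name, beta, theta in [("A", beta_a, theta_a), ("B", beta_b, theta_b)]:
    print(
        name,
        parameter_sparsity(beta, theta),
        practical_sparsity(beta, theta),
        satisfies_hierarchy(beta, theta),
    )
```

Under these conventions both models have three nonzero coefficients, but the hierarchical model A needs only two raw variables measured while model B needs all four, which is the distinction the summary draws.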


MSC: 62J07 Ridge regression; shrinkage estimators (Lasso)


Software: R; glmnet; hierNet

