Penalized interaction estimation for ultrahigh dimensional quadratic regression. (English) Zbl 1478.62189

This paper develops a novel estimation method for ultrahigh-dimensional quadratic regression that estimates the main effects and the interactions separately. The method has explicit formulae for both the main effects and the interactions, derived under moment conditions on the ultrahigh-dimensional covariates that are satisfied if normality is assumed. It does not require heredity assumptions. Estimating the main effects through a separate working linear model ensures that the resulting estimate satisfies the invariance principle at the population level. The estimation of the interactions is robust to the estimation of the main effects. The interactions are estimated in matrix form under a penalized convex loss function, which yields a sparse solution. The resulting estimates are shown to be consistent even when the covariate dimension grows exponentially with the sample size. An alternating direction method of multipliers (ADMM) algorithm is developed to implement the penalized estimation. The method is evaluated in simulations and in an application to a Portuguese wine data set.
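The idea of estimating main effects and interactions separately from moments can be illustrated in a low-dimensional toy setting. The sketch below is not the paper's estimator. It assumes Gaussian covariates, for which Stein's lemma gives \(\operatorname{Cov}(X, Y) = \Sigma\beta\) and \(E[(Y - EY)(X - \mu)(X - \mu)^\top] = 2\Sigma B \Sigma\) in the model \(Y = X^\top\beta + X^\top B X + \varepsilon\) with symmetric \(B\). It inverts the sample covariance directly, which is only feasible when the dimension is small, and it replaces the penalized convex loss and ADMM solver with plain entrywise soft-thresholding. The function names `soft_threshold` and `fit_quadratic_moments`, the tuning value `lam=0.2`, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def soft_threshold(M, lam):
    """Entrywise soft-thresholding, the proximal map of lam * ||M||_1."""
    return np.sign(M) * np.maximum(np.abs(M) - lam, 0.0)


def fit_quadratic_moments(X, y, lam):
    """Toy moment-based fit of y = x'beta + x'Bx + noise (Gaussian x assumed).

    Low-dimensional illustration only: the sample covariance is inverted
    directly, which is not possible when p exceeds n.
    """
    n, _ = X.shape
    Xc = X - X.mean(axis=0)           # centered covariates
    yc = y - y.mean()                 # centered response
    S = Xc.T @ Xc / n                 # sample covariance
    S_inv = np.linalg.inv(S)

    # Main effects from a working linear model: beta = Sigma^{-1} Cov(X, Y).
    beta_hat = S_inv @ (Xc.T @ yc / n)

    # Response-weighted second moment; its population value is 2 * Sigma B Sigma.
    Lam = (Xc * yc[:, None]).T @ Xc / n
    B_raw = 0.5 * S_inv @ Lam @ S_inv

    # Sparsify by soft-thresholding (a stand-in for a penalized loss).
    B_sparse = soft_threshold(B_raw, lam)
    return beta_hat, B_raw, B_sparse


# Simulated example: one main effect and one interaction, no heredity imposed.
rng = np.random.default_rng(0)
n, p = 50_000, 5
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[0] = 1.0
B = np.zeros((p, p))
B[0, 1] = B[1, 0] = 1.0               # symmetric interaction matrix
noise = 0.5 * rng.standard_normal(n)
y = X @ beta + np.einsum("ij,jk,ik->i", X, B, X) + noise

beta_hat, B_raw, B_sparse = fit_quadratic_moments(X, y, lam=0.2)
print(np.round(beta_hat, 2))
print(np.round(B_sparse, 2))
```

Note that the interaction estimate uses only the centered response, not the fitted main effects, which mirrors in a crude way the abstract's remark that the interaction estimation is robust to how the main effects are estimated. Soft-thresholding shrinks the surviving entries toward zero by `lam`, so the nonzero entries of `B_sparse` are biased downward relative to `B_raw`.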


62J02 General nonlinear regression
62J07 Ridge regression; shrinkage estimators (Lasso)
62P30 Applications of statistics in engineering and industry; control charts


[1] Bai, Z. and Saranadasa, H. (1996). Effect of high dimension: By an example of a two sample problem. Statistica Sinica 6, 311-329. · Zbl 0848.62030
[2] Bien, J., Simon, N. and Tibshirani, R. (2015). Convex hierarchical testing of interactions. The Annals of Applied Statistics 9, 27-42. · Zbl 1454.62311
[3] Bien, J., Taylor, J. and Tibshirani, R. (2013). A Lasso for hierarchical interactions. The Annals of Statistics 41, 1111-1141. · Zbl 1292.62109
[4] Boyd, S., Parikh, N., Chu, E., Peleato, B. and Eckstein, J. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning 3, 1-122. · Zbl 1229.90122
[5] Chen, L., Sun, D. and Toh, K.-C. (2017). A note on the convergence of ADMM for linearly constrained convex optimization problems. Computational Optimization and Applications 66, 327-343. · Zbl 1367.90083
[6] Chen, S., Zhang, L. and Zhong, P. (2010). Tests for high-dimensional covariance matrices. Journal of the American Statistical Association 105, 810-819. · Zbl 1321.62086
[7] Cheng, Q. and Zhu, L. (2017). On relative efficiency of principal Hessian directions. Statistics & Probability Letters 126, 108-113. · Zbl 1381.62142
[8] Choi, N. H., Li, W. and Zhu, J. (2010). Variable selection with the strong heredity constraint and its oracle property. Journal of the American Statistical Association 105, 354-364. · Zbl 1320.62171
[9] Cordell, H. J. (2009). Detecting gene-gene interactions that underlie human diseases. Nature Reviews. Genetics 10, 392.
[10] Cortez, P., Cerdeira, A., Almeida, F., Matos, T. and Reis, J. (2009). Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems 47, 547-553.
[11] Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. The Annals of Statistics 32, 407-499. · Zbl 1091.62054
[12] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96, 1348-1360. · Zbl 1073.62547
[13] Fan, Y., Kong, Y., Li, D. and Zheng, Z. (2015). Innovated interaction screening for high-dimensional nonlinear classification. The Annals of Statistics 43, 1243-1272. · Zbl 1328.62383
[14] Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33, 1-22.
[15] Hamada, M. and Wu, C. J. (1992). Analysis of designed experiments with complex aliasing. Journal of Quality Technology 24, 130-137.
[16] Hao, N., Feng, Y. and Zhang, H. H. (2018). Model selection for high-dimensional quadratic regression via regularization. Journal of the American Statistical Association 113, 615-625. · Zbl 1398.62176
[17] Hao, N. and Zhang, H. H. (2014). Interaction screening for ultrahigh-dimensional data. Journal of the American Statistical Association 109, 1285-1301. · Zbl 1368.62193
[18] Hao, N. and Zhang, H. H. (2017). A note on high-dimensional linear regression with interactions. The American Statistician 71, 291-297.
[19] Haris, A., Witten, D. and Simon, N. (2016). Convex modeling of interactions with strong heredity. Journal of Computational and Graphical Statistics 25, 981-1004.
[20] Hong, M. and Luo, Z.-Q. (2017). On the linear convergence of the alternating direction method of multipliers. Mathematical Programming 162, 165-199. · Zbl 1362.90313
[21] Jiang, B. and Liu, J. S. (2014). Variable selection for general index models via sliced inverse regression. The Annals of Statistics 42, 1751-1786. · Zbl 1305.62234
[22] Kong, Y., Li, D., Fan, Y. and Lv, J. (2017). Interaction pursuit in high-dimensional multi-response regression via distance correlation. The Annals of Statistics 45, 897-922. · Zbl 1368.62140
[23] Li, K.-C. (1992). On principal Hessian directions for data visualization and dimension reduction: Another application of Stein’s lemma. Journal of the American Statistical Association 87, 1025-1039. · Zbl 0765.62003
[24] Lim, M. and Hastie, T. (2015). Learning interactions via hierarchical group-Lasso regularization. Journal of Computational and Graphical Statistics 24, 627-654.
[25] Liu, W. and Luo, X. (2015). Fast and adaptive sparse precision matrix estimation in high dimensions. Journal of Multivariate Analysis 135, 153-162. · Zbl 1307.62148
[26] Nelder, J. A. (1977). A reformulation of linear models. Journal of the Royal Statistical Society, Series A (General) 140, 48-77.
[27] Nishihara, R., Lessard, L., Recht, B., Packard, A. and Jordan, M. I. (2015). A general analysis of the convergence of ADMM. arXiv preprint arXiv:1502.02009.
[28] Radchenko, P. and James, G. (2010). Variable selection using adaptive nonlinear interaction structures in high dimensions. Journal of the American Statistical Association 105, 1541-1553. · Zbl 1388.62212
[29] Ravikumar, P., Wainwright, M., Raskutti, G. and Yu, B. (2011). High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. Electronic Journal of Statistics 5, 935-980. · Zbl 1274.62190
[30] Ritchie, M. D., Hahn, L. W., Roodi, N., Bailey, L. R., Dupont, W. D., Parl, F. F. et al. (2001). Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. The American Journal of Human Genetics 69, 138-147.
[31] Simon, N. and Tibshirani, R. (2015). A permutation approach to testing interactions for binary response by comparing correlations between classes. Journal of the American Statistical Association 110, 1707-1716. · Zbl 1373.62278
[32] Stein, C. M. (1981). Estimation of the mean of a multivariate normal distribution. The Annals of Statistics 9, 1135-1151. · Zbl 0476.62035
[33] Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B (Methodological) 58, 267-288. · Zbl 0850.62538
[34] Yuan, M., Joseph, R. and Zou, H. (2009). Structured variable selection and estimation. The Annals of Applied Statistics 3, 1738-1757. · Zbl 1184.62032
[35] Zhang, T. and Zou, H. (2014). Sparse precision matrix estimation via Lasso penalized D-trace loss. Biometrika 101, 103-120. · Zbl 1285.62063
[36] Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. Journal of Machine Learning Research 7, 2541-2563. · Zbl 1222.62008
[37] Zou, H. (2006). The adaptive Lasso and its oracle properties. Journal of the American Statistical Association 101, 1418-1429. · Zbl 1171.62326
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.