
A fast and consistent variable selection method for high-dimensional multivariate linear regression with a large number of explanatory variables. (English) Zbl 1439.62166

Summary: We put forward a variable selection method for selecting explanatory variables in a normality-assumed multivariate linear regression. It is cumbersome to calculate variable selection criteria for all subsets of explanatory variables when the number of explanatory variables is large. Therefore, we propose a fast and consistent variable selection method based on a generalized \(C_p\) criterion. The consistency of the method is established under a high-dimensional asymptotic framework in which the sample size tends to infinity and the ratio of the sum of the dimensions of the response and explanatory vectors to the sample size tends to a positive constant less than one. Numerical simulations show that the proposed method selects the true subset of explanatory variables with high probability and remains fast for a moderate sample size even when the number of dimensions is large.
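The summary describes the method only at a high level. As a rough illustration, the sketch below (Python, written for this review; it is not the authors' code) shows how a generalized \(C_p\)-type criterion of the form \(\operatorname{tr}(W_J S_\omega^{-1}) + \alpha p k_J\) could be computed for a candidate subset \(J\) with residual sum-of-squares-and-products matrix \(W_J\), full-model residual covariance \(S_\omega\), response dimension \(p\), subset size \(k_J\) and penalty weight \(\alpha\), and how a one-variable-at-a-time comparison with the full model avoids enumerating all subsets. The exact criterion, the screening rule, and the names residual_ssp, generalized_cp and fast_select are assumptions made for this example.

import numpy as np

def residual_ssp(Y, X):
    """Residual sum-of-squares-and-products matrix of the least-squares fit of Y on X."""
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    E = Y - X @ B
    return E.T @ E

def generalized_cp(Y, X, subset, S_full_inv, alpha=2.0):
    """Generalized C_p-type value for the model using only the columns in `subset`.

    Illustrative form: tr(W_J S_omega^{-1}) + alpha * p * |J|; alpha = 2 gives the
    usual multivariate C_p, and additive constants common to all subsets are dropped.
    """
    p = Y.shape[1]
    W = residual_ssp(Y, X[:, subset])
    return np.trace(W @ S_full_inv) + alpha * p * len(subset)

def fast_select(Y, X, alpha=2.0):
    """One-pass screening: keep column j if dropping it from the full model increases the criterion."""
    n, K = X.shape
    S_full = residual_ssp(Y, X) / (n - K)          # full-model residual covariance
    S_full_inv = np.linalg.inv(S_full)
    full = list(range(K))
    cp_full = generalized_cp(Y, X, full, S_full_inv, alpha)
    selected = []
    for j in range(K):
        without_j = [k for k in full if k != j]
        if generalized_cp(Y, X, without_j, S_full_inv, alpha) > cp_full:
            selected.append(j)                     # removing j worsens the fit/penalty trade-off
    return selected

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, K, p = 200, 20, 5
    X = rng.standard_normal((n, K))
    B = np.zeros((K, p))
    B[:4] = rng.standard_normal((4, p))            # only the first four variables are active
    Y = X @ B + rng.standard_normal((n, p))
    print(fast_select(Y, X))                       # typically recovers [0, 1, 2, 3]

A one-at-a-time comparison of this kind costs about \(K+1\) least-squares fits instead of \(2^K\), which is the sort of computational saving the summary refers to; whether this selection rule matches the paper's exact criterion would have to be checked against the original article.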

MSC:

62J05 Linear regression; mixed models
62H12 Estimation in multivariate analysis
62E20 Asymptotic distribution theory in statistics

Software:

glmnet
