
Simultaneous dimension reduction and variable selection in modeling high dimensional data. (English) Zbl 1464.62113

Summary: High-dimensional predictors in regression analysis are often associated with multicollinearity, along with other estimation problems. These problems can be mitigated by a constrained optimization method that simultaneously induces dimension reduction and variable selection while maintaining a high level of predictive ability in the fitted model. Simulation studies show that the method may outperform sparse principal component regression, the least absolute shrinkage and selection operator (LASSO), and elastic net procedures in terms of predictive ability and optimal selection of inputs. Furthermore, the method yields reduced models with smaller prediction errors than the estimated full models from principal component regression or principal covariates regression.
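The proposed constrained optimization estimator is not specified in this summary, so it is not reproduced here; the following is a minimal Python sketch of the baseline comparisons named above, assuming scikit-learn and a synthetic factor-driven design (both illustrative choices, not taken from the paper). It fits the LASSO, the elastic net, and principal component regression on multicollinear data and compares their test-set prediction errors.

```python
# Minimal sketch of the baseline comparisons only -- LASSO, elastic net, and
# principal component regression -- NOT the paper's constrained optimization
# method. The data design and all settings below are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import ElasticNetCV, LassoCV, LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# A few latent factors drive many observed predictors, inducing the
# multicollinearity described in the summary.
n, p, k = 200, 50, 5
factors = rng.standard_normal((n, k))
loadings = rng.standard_normal((k, p))
X = factors @ loadings + 0.1 * rng.standard_normal((n, p))

# Sparse truth: only the first 10 predictors carry signal, so variable
# selection matters as well as dimension reduction.
beta = np.zeros(p)
beta[:10] = rng.uniform(1.0, 3.0, size=10)
y = X @ beta + rng.standard_normal(n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "LASSO": make_pipeline(StandardScaler(), LassoCV(cv=5)),
    "elastic net": make_pipeline(StandardScaler(),
                                 ElasticNetCV(cv=5, l1_ratio=0.5)),
    # Principal component regression: project onto the leading PCs, then OLS.
    "PCR": make_pipeline(StandardScaler(), PCA(n_components=k),
                         LinearRegression()),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    mse = mean_squared_error(y_te, model.predict(X_te))
    print(f"{name:12s} test MSE = {mse:.3f}")
```

Each pipeline standardizes the inputs first, since both the penalized methods and PCA are scale-sensitive; the paper's method would enter this comparison as a fourth pipeline.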

MSC:

62-08 Computational methods for problems pertaining to statistics
62J05 Linear regression; mixed models
62H25 Factor analysis and principal components; correspondence analysis

Software:

BartPy; BayesTree; AS 223

References:

[1] Candès, E.; Tao, T., The Dantzig selector: statistical estimation when p is much larger than n, Ann. Statist., 35, 6, 2313-2351, (2007) · Zbl 1139.62019
[2] Chatterjee, S.; Hadi, A., (Regression Analysis by Example, Wiley Series in Probability and Statistics, (2006), John Wiley and Sons, Inc. Hoboken, New Jersey) · Zbl 1250.62035
[3] Chen, L.; Huang, J., Sparse reduced-rank regression for simultaneous dimension reduction and variable selection, J. Amer. Statist. Assoc., 107, 500, 1533-1545, (2012) · Zbl 1258.62075
[4] Chipman, H.; George, E.; McCulloch, R., BART: Bayesian additive regression trees, Ann. Appl. Stat., 4, 1, 266-298, (2010) · Zbl 1189.62066
[5] Chipman, H.; Gu, G., Interpretable dimension reduction, J. Appl. Stat., 32, 9, 969-987, (2005) · Zbl 1121.62347
[6] Chun, H.; Keleş, S., Sparse partial least squares regression for simultaneous dimension reduction and variable selection, J. R. Stat. Soc. Ser. B Stat. Methodol., 72, 1, 3-25, (2010)
[7] Cook, R. D., Fisher lecture: dimension reduction in regression (with discussion), Statist. Sci., 22, 1-43, (2007) · Zbl 1246.62149
[8] De Jong, S.; Kiers, H. A. L., Principal covariates regression, Chemometr. Intell. Lab. Syst., 14, 155-164, (1992)
[9] Draper, N.; Smith, H., (Applied Regression Analysis, Wiley Series in Probability and Statistics, (1998), John Wiley and Sons, Inc. Hoboken, New Jersey) · Zbl 0158.17101
[10] Eckart, C.; Young, G., The approximation of one matrix by another of lower rank, Psychometrika, 1, 3, 211-218, (1936) · JFM 62.1075.02
[11] Filzmoser, P.; Croux, C., A projection algorithm for regression with collinearity, (Jajuga, K.; Sokolowski, A.; Bock, H.-H., Classification, Clustering, and Data Analysis, (2002), Springer-Verlag Berlin), 227-234
[12] Foucart, T., A decision rule for discarding principal components in regression, J. Statist. Plann. Inference, 89, 1, 187-195, (2000) · Zbl 0954.62081
[13] Garson, G. D., Multiple regression, (2012), Statistical Associates Publishers Asheboro, NC
[14] George, E. I.; Oman, S. D., Multiple-shrinkage principal component regression, Statistician, 45, 1, 111-124, (1996)
[15] Goldenshluger, A.; Tsybakov, A., Adaptive prediction and estimation in linear regression with infinitely many parameters, Ann. Statist., 29, 6, 1601-1619, (2001) · Zbl 1043.62076
[16] Hoerl, A. E.; Kennard, R. W., Ridge regression: biased estimation for nonorthogonal problems, Technometrics, 12, 55-67, (1970) · Zbl 0202.17205
[17] Hwang, J.; Nettleton, D., Principal components regression with data-chosen components and related methods, Technometrics, 45, 70-79, (2003)
[18] Izenman, A. J., Reduced-rank regression for the multivariate linear model, J. Multivariate Anal., 5, 248-264, (1975) · Zbl 0313.62042
[19] Jolliffe, I., A note on the use of principal components in regression, J. R. Stat. Soc. Ser. C, 31, 3, 300-303, (1982)
[20] Klinger, A., Inference in high dimensional generalized linear models based on soft thresholding, J. R. Stat. Soc. Ser. B Stat. Methodol., 63, 2, 377-392, (2001) · Zbl 0980.62053
[21] Kosfeld, R.; Lauridsen, J., Factor analysis regression, Statist. Papers, 49, 4, 653-667, (2008) · Zbl 1312.62079
[22] Lee, T. S., Algorithm AS 223: optimum ridge parameter selection, J. R. Stat. Soc. Ser. C, 36, 1, 112-118, (1987) · Zbl 0613.62092
[23] McDonald, G. C.; Galarneau, D. I., A Monte Carlo evaluation of some ridge-type estimators, J. Amer. Statist. Assoc., 70, 350, 407-416, (1975)
[24] Ravikumar, P.; Liu, H.; Lafferty, J.; Wasserman, L., SpAM: sparse additive models, (Platt, J.; Koller, D.; Singer, Y.; Roweis, S., Advances in Neural Information Processing Systems, Vol. 20, (2007), MIT Press Cambridge, MA), 1201-1208
[25] Reinsel, G. C.; Velu, R. P., Multivariate reduced-rank regression: theory and applications, (1998), Springer New York · Zbl 0909.62066
[26] Schwarz, G., Estimating the dimension of a model, Ann. Statist., 6, 2, 461-464, (1978) · Zbl 0379.62005
[27] Tibshirani, R., Regression shrinkage and selection via the LASSO, J. R. Stat. Soc. Ser. B Stat. Methodol., 58, 1, 267-288, (1996) · Zbl 0850.62538
[28] WHO, WHO website, (2016), available at: http://www.who.int/gho/en/
[29] Wold, H., Estimation of principal components and related models by iterative least squares, (Krishnaiah, P. R., Multivariate Analysis, (1966), Academic Press New York), 391-420 · Zbl 0214.46103
[30] Zou, H.; Hastie, T., Regularization and variable selection via the elastic net, J. R. Stat. Soc. Ser. B Stat. Methodol., 67, 2, 301-320, (2005) · Zbl 1069.62054
[31] Zou, H.; Hastie, T.; Tibshirani, R., Sparse principal component analysis, J. Comput. Graph. Statist., 15, 2, 265-286, (2006)
[32] Zou, H.; Hastie, T.; Tibshirani, R., On the “degrees of freedom” of the LASSO, Ann. Statist., 35, 5, 2173-2192, (2007) · Zbl 1126.62061