## Extended differential geometric LARS for high-dimensional GLMs with general dispersion parameter. (English) Zbl 1384.62270

Summary: A large class of modeling and prediction problems involves outcomes that belong to an exponential family distribution. Generalized linear models (GLMs) are a standard way of dealing with such situations, and they can be extended to high-dimensional feature spaces. Penalized inference approaches, such as the $$\ell_1$$ penalty or SCAD, and extensions of least angle regression, such as dgLARS, have been proposed for GLMs with high-dimensional feature spaces. Although the theory underlying these methods is in principle generic, implementations have remained restricted to dispersion-free models, such as the Poisson and logistic regression models. The aim of this manuscript is to extend the differential geometric least angle regression method for high-dimensional GLMs to arbitrary exponential dispersion family distributions with arbitrary link functions. This entails, first, extending the predictor-corrector (PC) algorithm to arbitrary distributions and link functions and, second, proposing an efficient estimator of the dispersion parameter. Furthermore, improvements to the computational algorithm lead to an important speed-up of the PC algorithm. Simulations provide supporting evidence for the proposed algorithms for estimating the coefficients and the dispersion parameter. The resulting method has been implemented in our R package (which will be merged with the original dglars package) and is shown to be an effective method for inference for arbitrary classes of GLMs.
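To make the role of the dispersion parameter concrete, the following is a minimal sketch of the classical Pearson estimator, $$\hat\phi = \frac{1}{n-p}\sum_i (y_i-\hat\mu_i)^2 / V(\hat\mu_i)$$. It is a textbook estimator shown only for illustration and is not claimed to be the estimator proposed in the paper. The function names, the toy data, and the choice of the Gamma variance function $$V(\mu)=\mu^2$$ are assumptions made for this example.

```python
from typing import Callable, Sequence


def gamma_variance(mu: float) -> float:
    """Variance function of the Gamma family, V(mu) = mu^2 (illustrative choice)."""
    return mu * mu


def pearson_dispersion(
    y: Sequence[float],
    mu: Sequence[float],
    variance: Callable[[float], float],
    n_params: int,
) -> float:
    """Classical Pearson estimate of the GLM dispersion parameter.

    y        : observed responses
    mu       : fitted means from some already-fitted GLM (assumed given)
    variance : variance function V(mu) of the chosen family
    n_params : number of estimated regression coefficients (hypothetical input)
    """
    n = len(y)
    if len(mu) != n:
        raise ValueError("y and mu must have the same length")
    dof = n - n_params
    if dof <= 0:
        # With at least as many coefficients as observations the estimator
        # is undefined, which is one reason it is delicate when p is large.
        raise ValueError("need more observations than estimated coefficients")
    # Pearson chi-square statistic: squared residuals scaled by V(mu).
    chi2 = sum((yi - mi) ** 2 / variance(mi) for yi, mi in zip(y, mu))
    return chi2 / dof


# Toy data (made up for the example), two estimated coefficients.
y_obs = [1.0, 2.0, 3.0, 6.0]
mu_hat = [2.0, 2.0, 4.0, 4.0]
phi_hat = pearson_dispersion(y_obs, mu_hat, gamma_variance, n_params=2)
print(phi_hat)
```

For dispersion-free families such as the Poisson and binomial, $$\phi$$ is fixed at 1 and no such step is needed; for families like the Gamma or inverse Gaussian it has to be estimated alongside the coefficients.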

### MSC:

 62J12 Generalized linear models (logistic models)

### Software:

glmpath; ElemStatLearn; lars; gamair; spikeslab; R; dglars; SAS; glmnet
