## Sure independence screening in generalized linear models with NP-dimensionality.
*(English)*
Zbl 1206.68157

Summary: Ultrahigh-dimensional variable selection plays an increasingly important role in contemporary scientific discoveries and statistical research. Among others, J. Fan and J. Lv [J. R. Stat. Soc., Ser. B, Stat. Methodol. 70, No. 5, 849–911 (2008)] proposed an independent screening framework based on ranking the marginal correlations. They showed that the correlation ranking procedure possesses a sure independence screening property within the context of the linear model with Gaussian covariates and responses.

In this paper, we propose a more general version of independent learning that ranks the maximum marginal likelihood estimates, or the maximum marginal likelihood itself, in generalized linear models. We show that the proposed methods, with [loc. cit.] as a very special case, also possess the sure screening property with vanishing false selection rate. The conditions under which independence learning possesses the sure screening property are surprisingly simple. This justifies the applicability of such a simple method to a wide spectrum of problems. We quantify explicitly the extent to which the dimensionality can be reduced by independence screening, which depends on the interactions of the covariance matrix of covariates and true parameters. Simulation studies are used to illustrate the utility of the proposed approaches. In addition, we establish an exponential inequality for the quasi-maximum likelihood estimator, which is useful for high-dimensional statistical learning.
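The ranking idea described in the summary can be illustrated with a minimal sketch. It assumes a logistic model as the generalized linear model, fits one marginal regression per covariate by Newton-Raphson, and keeps the `d` covariates with the largest absolute marginal slope. The function names, the standardization step, and the fixed model size `d` are illustrative assumptions, not the authors' implementation or their thresholding rule.

```python
import numpy as np

def marginal_logistic_mle(x, y, n_iter=25, tol=1e-8):
    """Fit logit P(y=1 | x) = b0 + b1 * x by Newton-Raphson and return b1.

    Hypothetical helper: a plain two-parameter logistic fit for one covariate.
    """
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))      # fitted probabilities
        w = mu * (1.0 - mu)                          # logistic variance weights
        grad = X.T @ (y - mu)                        # score vector
        hess = X.T @ (X * w[:, None]) + 1e-10 * np.eye(2)  # information matrix
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta[1]

def screen_by_marginal_mle(X, y, d):
    """Rank covariates by |marginal slope| and return the top-d indices.

    Covariates are standardized first so that slopes are comparable
    (an assumption of this sketch). Returns (selected_indices, scores).
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    scores = np.abs([marginal_logistic_mle(Xs[:, j], y)
                     for j in range(Xs.shape[1])])
    selected = np.argsort(scores)[::-1][:d]
    return selected, scores
```

Calling `screen_by_marginal_mle(X, y, d)` on an `n`-by-`p` design matrix `X` and a 0/1 response `y` returns the retained column indices together with all `p` scores. Ranking by the marginal likelihood itself, the second variant mentioned in the summary, would replace the slope by the improvement in log-likelihood over the intercept-only fit.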

### MSC:

68Q32 | Computational learning theory |

60F10 | Large deviations |

62J12 | Generalized linear models (logistic models) |

62E99 | Statistical distribution theory |

### Keywords:

generalized linear models; independent learning; sure independent screening; variable selection

*J. Fan* and *R. Song*, Ann. Stat. 38, No. 6, 3567–3604 (2010; Zbl 1206.68157)

### References:

[1] | Bickel, P. J. and Doksum, K. A. (1981). An analysis of transformations revisited. J. Amer. Statist. Assoc. 76 296-311. · Zbl 0464.62058 |

[2] | Bickel, P. J. and Doksum, K. A. (2001). Mathematical Statistics: Basic Ideas and Selected Topics, 2nd ed. Prentice Hall, Upper Saddle River, NJ. · Zbl 0403.62001 |

[3] | Box, G. E. P. and Cox, D. R. (1964). An analysis of transformations. J. R. Stat. Soc. Ser. B 26 211-246. · Zbl 0156.40104 |

[4] | Candes, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n (with discussion). Ann. Statist. 35 2313-2404. · Zbl 1139.62019 |

[5] | Cox, D. R. (1972). Regression models and life-tables (with discussion). J. R. Stat. Soc. Ser. B Stat. Methodol. 34 187-220. · Zbl 0243.62041 |

[6] | Fahrmeir, L. and Kaufmann, H. (1985). Consistency and asymptotic normality of the maximum likelihood estimator in generalized linear models. Ann. Statist. 13 342-368. · Zbl 0594.62058 |

[7] | Fan, J. and Fan, Y. (2008). High-dimensional classification using features annealed independence rules. Ann. Statist. 36 2605-2637. · Zbl 1360.62327 |

[8] | Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348-1360. · Zbl 1073.62547 |

[9] | Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. Ser. B Stat. Methodol. 70 849-911. |

[10] | Fan, J., Samworth, R. and Wu, Y. (2009). Ultra-dimensional variable selection via independent learning: Beyond the linear model. J. Mach. Learn. Res. 10 1829-1853. |

[11] | Frank, I. E. and Friedman, J. H. (1993). A statistical view of some chemometrics regression tools (with discussion). Technometrics 35 109-148. · Zbl 0775.62288 |

[12] | Friedman, J. H. and Stuetzle, W. (1981). Projection pursuit regression. J. Amer. Statist. Assoc. 76 817-823. |

[13] | Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10 971-988. · Zbl 1055.62078 |

[14] | Hall, P. and Miller, H. (2009). Using generalised correlation to effect variable selection in very high dimensional problems. J. Comput. Graph. Statist. 18 533. |

[15] | Hall, P., Titterington, D. M. and Xue, J.-H. (2009). Tilting methods for assessing the influence of components in a classifier. J. R. Stat. Soc. Ser. B Stat. Methodol. 71 783-803. · Zbl 1248.62102 |

[16] | Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58 13-30. · Zbl 0127.10602 |

[17] | Huang, J., Horowitz, J. and Ma, S. (2008). Asymptotic properties of bridge estimators in sparse high-dimensional regression models. Ann. Statist. 36 587-613. · Zbl 1133.62048 |

[18] | Kosorok, M. R., Lee, B. L. and Fine, J. P. (2004). Robust inference for univariate proportional hazards frailty regression models. Ann. Statist. 32 1448-1491. · Zbl 1047.62090 |

[19] | Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces: Isoperimetry and Processes. Springer, Berlin. · Zbl 0748.60004 |

[20] | Massart, P. (2000). About the constants in Talagrand's concentration inequalities for empirical processes. Ann. Probab. 28 863-884. · Zbl 1140.60310 |

[21] | Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 58 267-288. · Zbl 0850.62538 |

[22] | van de Geer, S. (2002). M-estimation using penalties or sieves. J. Statist. Plann. Inference 108 55-69. · Zbl 1030.62026 |

[23] | van de Geer, S. (2008). High-dimensional generalized linear models and the Lasso. Ann. Statist. 36 614-645. · Zbl 1138.62323 |

[24] | van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes. Springer, New York. · Zbl 0862.60002 |

[25] | White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica 50 1-26. · Zbl 0478.62088 |

[26] | Zeng, D. and Lin, D. Y. (2007). Maximum likelihood estimation in semiparametric regression models with censored data. J. R. Stat. Soc. Ser. B Stat. Methodol. 69 507-564. |

[27] | Zou, H. (2006). The adaptive Lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418-1429. · Zbl 1171.62326 |

[28] | Zou, H. and Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models. Ann. Statist. 36 1509-1533. · Zbl 1142.62027 |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.