Quantile-adaptive model-free variable screening for high-dimensional heterogeneous data. (English) Zbl 1295.62053

Ann. Stat. 41, No. 1, 342-369 (2013); correction ibid. 41, No. 5, 2699 (2013).
Analyzing ultra-high-dimensional data, in which the number of features increases at an exponential rate relative to the sample size, is a challenging problem. A natural first step is a fast screening procedure that reduces the dimension to a moderate scale. The authors propose a quantile-adaptive, model-free screening procedure for this purpose. The approach allows the set of active variables to differ across conditional quantiles and is effective for high-dimensional data characterized by heteroscedasticity. The marginal quantile regressions are estimated nonparametrically by means of B-spline approximations.
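
The idea of quantile-specific active sets can be illustrated with a minimal sketch, which is not the authors' implementation. It swaps the general B-spline basis for the simplest case, order-one (piecewise-constant) splines with knots at empirical quantiles of the covariate, because the marginal quantile regression fit then reduces to the sample quantile of the response within each bin. The ranking statistic (the average squared gap between the fitted marginal quantile and the unconditional quantile), the function names `sample_quantile`, `screening_utility`, `screen` and `simulate`, the number of bins, and the toy data design are all assumptions of this sketch.

```python
import math
import random


def sample_quantile(values, alpha):
    """Empirical alpha-quantile: the ceil(alpha * n)-th order statistic."""
    ordered = sorted(values)
    k = max(math.ceil(alpha * len(ordered)) - 1, 0)
    return ordered[k]


def screening_utility(x, y, alpha, n_bins=4):
    """Average squared gap between the bin-wise and the overall alpha-quantile of y.

    Bins hold equal counts of observations ordered by x, so the bin edges act
    as knots at empirical quantiles of x (an assumption of this sketch).
    """
    n = len(y)
    q_all = sample_quantile(y, alpha)
    order = sorted(range(n), key=lambda i: x[i])
    total = 0.0
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        # Minimising the check loss over a constant gives the sample quantile.
        q_bin = sample_quantile([y[i] for i in idx], alpha)
        total += len(idx) * (q_bin - q_all) ** 2
    return total / n


def screen(columns, y, alpha, keep):
    """Rank covariates by utility and return the indices of the top `keep`."""
    utilities = [screening_utility(col, y, alpha) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: utilities[j], reverse=True)
    return ranked[:keep], utilities


def simulate(n=2000, p=20, seed=0):
    """Toy heteroscedastic data: x0 shifts the location, x1 only the scale."""
    rng = random.Random(seed)
    columns = [[rng.random() for _ in range(n)] for _ in range(p)]
    y = [3.0 * columns[0][i] + (0.2 + 2.0 * columns[1][i]) * rng.gauss(0.0, 1.0)
         for i in range(n)]
    return columns, y


columns, y = simulate()
top_median, util_median = screen(columns, y, alpha=0.5, keep=1)
top_upper, util_upper = screen(columns, y, alpha=0.9, keep=2)
print("selected at alpha=0.5:", top_median)
print("selected at alpha=0.9:", sorted(top_upper))
```

In this toy design the location covariate influences every conditional quantile, whereas the scale-only covariate leaves the conditional median unchanged and matters only away from it. This is the sense in which the active set may depend on the quantile level under heteroscedasticity.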

MSC:

62G99 Nonparametric inference
62G08 Nonparametric regression and quantile regression
62N05 Reliability and life testing
62-07 Data analysis (statistics) (MSC2010)
