zbMATH — the first resource for mathematics

Variable selection for high dimensional Gaussian copula regression model: an adaptive hypothesis testing procedure. (English) Zbl 06920884
Summary: In this paper we consider variable selection for the high dimensional Gaussian copula regression model and recast it as a multiple testing problem. Compared with existing methods that rely on regularization or a stepwise algorithm, our approach avoids both the ambiguous relationship between the regularization parameter and the number of falsely discovered variables and the need to choose a stopping rule. We exploit nonparametric rank-based correlation coefficient estimators to construct test statistics that are robust and adaptive to the unknown monotone marginal transformations. We show that our multiple testing procedure asymptotically controls the false discovery rate (FDR) or the average number of falsely discovered variables (FDV). We also propose a screening multiple testing procedure for the extremely high dimensional setting. Beyond the theoretical analysis, we conduct numerical simulations comparing the variable selection performance of our method with several state-of-the-art methods, and we apply the proposed method to a communities and crime unnormalized data set to illustrate its empirical usefulness.
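The general recipe the summary describes — rank-based estimation of the latent correlations under a Gaussian copula, marginal test statistics, then an FDR-controlling step — can be sketched as follows. This is a simplified illustration only, not the authors' actual procedure: it uses Kendall's tau with the sine transform, a Fisher-z approximation for the p-values, and the standard Benjamini-Hochberg step-up rule; the function name `rank_based_selection` and all tuning details are assumptions for the sketch.

```python
import numpy as np
from scipy import stats

def rank_based_selection(X, y, alpha=0.1):
    """Toy sketch: rank-based marginal association tests with BH FDR control.

    Illustrates the general idea only (Kendall's tau with the sine
    transform for a Gaussian copula, then Benjamini-Hochberg); the
    paper's actual test statistics and thresholds differ.
    """
    n, p = X.shape
    pvals = np.empty(p)
    for j in range(p):
        tau, _ = stats.kendalltau(X[:, j], y)
        # Sine transform: under a Gaussian copula, rho = sin(pi/2 * tau),
        # invariant to unknown monotone marginal transformations.
        rho = np.sin(np.pi / 2 * tau)
        # Approximate z-statistic via Fisher's z of the estimated latent correlation.
        z = np.sqrt(n - 3) * np.arctanh(np.clip(rho, -0.999, 0.999))
        pvals[j] = 2 * stats.norm.sf(abs(z))
    # Benjamini-Hochberg step-up procedure controlling FDR at level alpha.
    order = np.argsort(pvals)
    thresh = alpha * np.arange(1, p + 1) / p
    below = pvals[order] <= thresh
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    return np.sort(order[:k])
```

Because the statistics depend on the data only through ranks, applying any strictly increasing transformation to a predictor or to the response leaves the selected set unchanged, which is the robustness property the summary emphasizes.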

62 Statistics
CoSaMP; gcmr; hgam
Full Text: DOI
[1] Anderson, T. W., An introduction to multivariate statistical analysis, (2003), Wiley-Interscience · Zbl 1039.62044
[2] Bickel, P. J.; Ritov, Y.; Tsybakov, A. B., Simultaneous analysis of lasso and Dantzig selector, Ann. Statist., 37, 1705-1732, (2009) · Zbl 1173.62022
[3] Buczak, A. L.; Gifford, C. M., Fuzzy association rule mining for community crime pattern discovery, (ACM SIGKDD Workshop on Intelligence and Security Informatics, (2010), ACM), 2
[4] Cai, T. T.; Zhang, L., High-dimensional Gaussian copula regression: adaptive estimation and statistical inference, Statist. Sinica, 28, 2, 963-993, (2018) · Zbl 1390.62099
[5] Efron, B.; Hastie, T.; Johnstone, I.; Tibshirani, R., Least angle regression, Ann. Statist., 32, 2, 407-499, (2004) · Zbl 1091.62054
[6] Fan, J.; Feng, Y.; Song, R., Nonparametric independence screening in sparse ultra-high-dimensional additive models, J. Amer. Statist. Assoc., 106, 494, 544-557, (2011)
[7] Fan, J.; Li, R., Variable selection via nonconcave penalized likelihood and its oracle properties, J. Amer. Statist. Assoc., 96, 456, 1348-1360, (2001) · Zbl 1073.62547
[8] Fan, J.; Lv, J., Sure independence screening for ultrahigh dimensional feature space, J. R. Stat. Soc. Ser. B Stat. Methodol., 70, 5, 849-911, (2008)
[9] Foster, J. C.; Taylor, J. M.; Nan, B., Variable selection in monotone single-index models via the adaptive lasso, Stat. Med., 32, 22, 3944-3954, (2013)
[10] Han, F.; Zhao, T.; Liu, H., Coda: high dimensional copula discriminant analysis, J. Mach. Learn. Res., 14, 1, 629-671, (2013) · Zbl 1320.62145
[11] He, Y.; Zhang, X.; Wang, P., Discriminant analysis on high dimensional Gaussian copula model, Statist. Probab. Lett., 117, 100-112, (2016) · Zbl 1398.62164
[12] He, Y.; Zhang, X.; Wang, P.; Zhang, L., High dimensional Gaussian copula graphical model with FDR control, Comput. Statist. Data Anal., 113, 457-474, (2017) · Zbl 06917625
[13] Javanmard, A.; Montanari, A., Nearly optimal sample size in hypothesis testing for high-dimensional regression, (2013 51st Annual Allerton Conference on Communication, Control, and Computing, (Allerton), (2013), IEEE), 1427-1434
[14] Javanmard, A.; Montanari, A., Confidence intervals and hypothesis testing for high-dimensional regression, J. Mach. Learn. Res., 15, 1, 2869-2909, (2014) · Zbl 1319.62145
[15] Li, G.; Peng, H.; Zhang, J.; Zhu, L., Robust rank correlation based screening, Ann. Statist., 40, 3, 1846-1877, (2012) · Zbl 1257.62067
[16] Li, G.; Peng, H.; Zhu, L., Nonconcave penalized m-estimation with a diverging number of parameters, Statist. Sinica, 21, 391-419, (2011) · Zbl 1206.62036
[17] Liu, W., Gaussian graphical model estimation with false discovery rate control, Ann. Statist., 41, 6, 2948-2978, (2013) · Zbl 1288.62094
[18] Liu, H.; Han, F.; Yuan, M.; Lafferty, J.; Wasserman, L., High-dimensional semiparametric Gaussian copula graphical models, Ann. Statist., 40, 4, 2293-2326, (2012) · Zbl 1297.62073
[19] Liu, W., Luo, S., 2014. Hypothesis testing for high-dimensional regression models.
[20] Luo, S., Ghosal, S., 2015. Forward selection and estimation in high dimensional single index model.
[21] Masarotto, G.; Varin, C., Gaussian copula marginal regression, Electron. J. Stat., 6, 1517-1549, (2012) · Zbl 1336.62152
[22] Meier, L.; Van de Geer, S.; Bühlmann, P., High-dimensional additive modeling, Ann. Statist., 37, 6B, 3779-3821, (2009) · Zbl 1360.62186
[23] Needell, D.; Tropp, J. A., CoSaMP: iterative signal recovery from incomplete and inaccurate samples, Appl. Comput. Harmon. Anal., 26, 3, 301-321, (2009) · Zbl 1163.94003
[24] Noh, H.; Ghouch, A. E.; Bouezmarni, T., Copula-based regression estimation and inference, J. Amer. Statist. Assoc., 108, 502, 676-688, (2013) · Zbl 06195970
[25] Pitt, M.; Chan, D.; Kohn, R., Efficient Bayesian inference for Gaussian copula regression models, Biometrika, 93, 3, 537-554, (2006) · Zbl 1108.62027
[26] Radchenko, P., High dimensional single index models, J. Multivariate Anal., 139, 266-282, (2015) · Zbl 1328.62482
[27] Radchenko, P.; James, G. M., Improved variable selection with forward-lasso adaptive shrinkage, Ann. Appl. Stat., 5, 1, 427-448, (2011) · Zbl 1220.62089
[28] Ravikumar, P.; Lafferty, J.; Liu, H.; Wasserman, L., Sparse additive models, J. R. Stat. Soc. Ser. B Stat. Methodol., 71, 5, 1009-1030, (2009)
[29] Song, R.; Lu, W.; Ma, S.; Jeng, X. J., Censored rank independence screening for high-dimensional survival data, Biometrika, 101, 4, 799-814, (2014) · Zbl 1306.62207
[30] Tibshirani, R., Regression shrinkage and selection via the lasso, J. R. Stat. Soc. Ser. B Stat. Methodol., 58, 1, 267-288, (1996) · Zbl 0850.62538
[31] Van de Geer, S.; Bühlmann, P.; Ritov, Y.; Dezeure, R., On asymptotically optimal confidence regions and tests for high-dimensional models, Ann. Statist., 42, 3, 1166-1202, (2014) · Zbl 1305.62259
[32] Yuan, M., Zhou, D.-X., 2015. Minimax optimal rates of estimation in high dimensional additive models: Universal phase transition. arXiv preprint arXiv:1503.02817.
[33] Zhang, C.-H., Nearly unbiased variable selection under minimax concave penalty, Ann. Statist., 38, 2, 894-942, (2010) · Zbl 1183.62120
[34] Zhu, L.-P.; Zhu, L.-X., Nonconcave penalized inverse regression in single-index models with high dimensional predictors, J. Multivariate Anal., 100, 5, 862-875, (2009) · Zbl 1157.62037
[35] Zou, H., The adaptive lasso and its oracle properties, J. Amer. Statist. Assoc., 101, 476, 1418-1429, (2006) · Zbl 1171.62326