
Goodness-of-fit tests for high-dimensional Gaussian linear models. (English) Zbl 1183.62074

Summary: Let \((Y, (X_i)_{1\leq i\leq p})\) be a real zero-mean Gaussian vector and let \(V\) be a subset of \(\{1, \dots , p\}\). Suppose we are given \(n\) i.i.d. replications of this vector. We propose a new procedure for testing that \(Y\) is independent of \((X_i)_{i\in \{1, \dots , p\}\setminus V}\) conditionally on \((X_i)_{i\in V}\) against the general alternative that it is not. The procedure requires no prior information on the covariance of \(X\) or the variance of \(Y\), and it applies in a high-dimensional setting. It extends straightforwardly to testing the neighborhood of a Gaussian graphical model. The procedure is based on a model of Gaussian regression with random Gaussian covariates. We give nonasymptotic properties of the test and prove that, under some additional assumptions, it is rate optimal [up to a possible \(\log (n)\) factor] over various classes of alternatives. Moreover, it allows us to derive nonasymptotic minimax rates of testing in this random design setting. Finally, we carry out a simulation study to evaluate the performance of the procedure.
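
To make the hypothesis concrete: for a zero-mean Gaussian vector with nonsingular covariance of \(X\), one can write \(Y = \sum_{i=1}^p \theta_i X_i + \varepsilon\) with \(\varepsilon\) Gaussian and independent of \(X\), and the null hypothesis then amounts to \(\theta_i = 0\) for all \(i \notin V\). The sketch below is not the procedure of the paper. It is the classical partial Fisher test of nested regressions, which is only usable when \(p < n\), shown here as a low-dimensional illustration of the kind of hypothesis being tested. All function names, the simulated data, and the Monte Carlo calibration of the p-value are assumptions made for this example.

```python
import numpy as np


def partial_f_statistic(y, X, V):
    """Classical Fisher statistic for H0: coefficients outside V are zero.

    Assumes zero-mean data (no intercept) and n > p. Illustrative only;
    this is not the high-dimensional test proposed in the paper.
    """
    n, p = X.shape
    V = sorted(V)
    d = p - len(V)  # number of covariates being tested
    if d <= 0 or n <= p:
        raise ValueError("need at least one tested covariate and n > p")

    def rss(A):
        # Residual sum of squares of the least-squares fit of y on A.
        if A.shape[1] == 0:
            return float(y @ y)
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        r = y - A @ coef
        return float(r @ r)

    rss_null = rss(X[:, V])  # regression on the covariates in V only
    rss_full = rss(X)        # regression on all p covariates
    stat = ((rss_null - rss_full) / d) / (rss_full / (n - p))
    return stat, d, n - p


def monte_carlo_p_value(stat, dfnum, dfden, n_draws=100_000, seed=0):
    """Approximate P(F >= stat) for F ~ F(dfnum, dfden) by simulation."""
    rng = np.random.default_rng(seed)
    draws = rng.f(dfnum, dfden, size=n_draws)
    return float(np.mean(draws >= stat))


def simulate(n, theta, seed):
    """Draw n i.i.d. copies of (Y, X) with standard Gaussian covariates."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, len(theta)))
    y = X @ theta + rng.standard_normal(n)
    return y, X


V = [0, 1]  # hypothetical conditioning set (0-based column indices)
theta_null = np.array([1.0, -1.0, 0.0, 0.0, 0.0, 0.0])  # supported on V
theta_alt = np.array([1.0, -1.0, 0.8, 0.0, 0.0, 0.0])   # column 2 also matters

y0, X0 = simulate(200, theta_null, seed=1)
y1, X1 = simulate(200, theta_alt, seed=2)

stat_null, d0, m0 = partial_f_statistic(y0, X0, V)
stat_alt, d1, m1 = partial_f_statistic(y1, X1, V)
p_null = monte_carlo_p_value(stat_null, d0, m0)
p_alt = monte_carlo_p_value(stat_alt, d1, m1)

print(f"null:        F = {stat_null:.2f}, p-value ~ {p_null:.3f}")
print(f"alternative: F = {stat_alt:.2f}, p-value ~ {p_alt:.3f}")
```

Under the null, the statistic follows an exact Fisher distribution with \((p - |V|, n - p)\) degrees of freedom conditionally on the design, which is why the simulated p-value is a valid calibration here. This construction breaks down as soon as \(p \geq n\), which is the high-dimensional regime the paper addresses.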

MSC:

62G10 Nonparametric hypothesis testing
62J05 Linear regression; mixed models
62H20 Measures of association (correlation, canonical correlation, etc.)
05C90 Applications of graph theory
62H15 Hypothesis testing in multivariate analysis

Software:

GMRFLib
