Gaussian model selection with an unknown variance. (English) Zbl 1162.62051

Summary: Let \(Y\) be a Gaussian vector whose components are independent with a common unknown variance. We consider the problem of estimating the mean \(\mu\) of \(Y\) by model selection. More precisely, we start with a collection \(\mathcal S = \{S_m,m\in \mathcal M\}\) of linear subspaces of \(\mathbb R^n\) and associate with each of them the least-squares estimator of \(\mu\) on \(S_m\). Then, we use a data-driven penalized criterion to select one estimator among these.
Our first objective is to analyze the performance of estimators associated with classical criteria such as FPE, AIC, BIC and AMDL. Our second objective is to propose better penalties that are versatile enough to take into account both the complexity of the collection \(\mathcal S\) and the sample size. We then apply these penalties to various statistical problems such as variable selection, change-point detection and signal estimation, among others. Our results are based on a nonasymptotic risk bound with respect to the Euclidean loss for the selected estimator. Some analogous results are also established for the Kullback loss.
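
The selection procedure described in the summary can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses the textbook unknown-variance forms of AIC and BIC, namely \(n\log(\mathrm{RSS}_m/n) + 2D_m\) and \(n\log(\mathrm{RSS}_m/n) + D_m\log n\), whose normalization may differ from the one studied in the paper. The function name `select_model` and the toy data are hypothetical.

```python
import numpy as np


def select_model(y, designs, criterion="BIC"):
    """Return the index of the candidate subspace minimizing a penalized criterion.

    y        : observed vector of length n.
    designs  : list of (n, D_m) arrays; the columns of each span a subspace S_m.
    criterion: "AIC" or "BIC" (textbook forms for unknown variance; an assumption).
    """
    n = y.shape[0]
    best_index, best_score = None, np.inf
    for index, design in enumerate(designs):
        dim = np.linalg.matrix_rank(design)
        # Least-squares estimator of the mean on S_m (projection of y onto S_m).
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        rss = float(np.sum((y - design @ coef) ** 2))
        if criterion == "AIC":
            penalty = 2.0 * dim
        elif criterion == "BIC":
            penalty = dim * np.log(n)
        else:
            raise ValueError(f"unknown criterion: {criterion}")
        # The variance is unknown, so the residual sum of squares enters through its log.
        score = n * np.log(rss / n) + penalty
        if score < best_score:
            best_index, best_score = index, score
    return best_index


# Toy data (hypothetical): two large coordinates followed by small fluctuations.
y = np.array([10.0, -8.0, 0.5, -0.4, 0.3, 0.6, -0.5, 0.4])
# Nested candidate subspaces spanned by the first d canonical basis vectors, d = 1..5.
designs = [np.eye(8)[:, :d] for d in range(1, 6)]
print(select_model(y, designs, criterion="BIC"))
```

The returned index refers to a position in `designs`; the selected estimator is then the least-squares fit on that subspace. Penalties that also account for the complexity of the collection, as the summary proposes, would replace the `penalty` term.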


62H12 Estimation in multivariate analysis
62G08 Nonparametric regression and quantile regression
62J05 Linear regression; mixed models

