Birgé, Lucien; Massart, Pascal
Minimal penalties for Gaussian model selection. (English) Zbl 1112.62082
Probab. Theory Relat. Fields 138, No. 1-2, 33-73 (2007).

Summary: This paper is mainly devoted to a precise analysis of what kind of penalties should be used in order to perform model selection via the minimization of a penalized least-squares type criterion within some general Gaussian framework including the classical ones. As compared to our previous paper on this topic [J. Eur. Math. Soc. (JEMS) 3, No. 3, 203–268 (2001; Zbl 1037.62001)], more elaborate forms of the penalties are given which are shown to be, in some sense, optimal. We indeed provide more precise upper bounds for the risk of the penalized estimators and lower bounds for the penalty terms, showing that the use of smaller penalties may lead to disastrous results. These lower bounds may also be used to design a practical strategy that allows one to estimate the penalty from the data when the amount of noise is unknown. We provide an illustration of the method for the problem of estimating a piecewise constant signal in Gaussian noise when neither the number nor the locations of the change points are known.

Cited in 6 Reviews
Cited in 116 Documents

MSC:
62M09 Non-Markovian processes: estimation
62G05 Nonparametric estimation
62G08 Nonparametric regression and quantile regression
62J05 Linear regression; mixed models
46N30 Applications of functional analysis in probability theory and statistics

Keywords: Gaussian linear regression; variable selection; model selection; penalized least-squares

Citations: Zbl 1037.62001

References:
[1] Abramovich, F.; Benjamini, Y.; Donoho, D. L.; Johnstone, I. M., Adapting to unknown sparsity by controlling the false discovery rate, Ann. Statist., 34 (2006) · Zbl 1092.62005
[2] Akaike, H., Statistical predictor identification, Ann. Inst. Statist. Math., 22, 203-217 (1969) · Zbl 0259.62076 · doi:10.1007/BF02506337
[3] Akaike, H., Information theory and an extension of the maximum likelihood principle, in: Petrov, B. N.; Csaki, F. (eds.), Proceedings 2nd International Symposium on Information Theory, 267-281 (1973), Budapest: Akademia Kiado · Zbl 0283.62006
[4] Akaike, H., A new look at the statistical model identification, IEEE Trans. Autom. Control, 19, 716-723 (1974) · Zbl 0314.62039 · doi:10.1109/TAC.1974.1100705
[5] Akaike, H., A Bayesian analysis of the minimum AIC procedure, Ann. Inst. Statist. Math., 30, Part A, 9-14 (1978) · Zbl 0441.62007
[6] Amemiya, T., Advanced Econometrics (1985), Oxford: Basil Blackwell
[7] Barron, A. R.; Birgé, L.; Massart, P., Risk bounds for model selection via penalization, Probab. Theory Relat. Fields, 113, 301-415 (1999) · Zbl 0946.62036 · doi:10.1007/s004400050210
[8] Barron, A. R.; Cover, T. M., Minimum complexity density estimation, IEEE Trans. Inf. Theory, 37, 1034-1054 (1991) · Zbl 0743.62003 · doi:10.1109/18.86996
[9] Birgé, L., An alternative point of view on Lepski’s method, in: de Gunst, M. C. M.; Klaassen, C. A. J.; van der Vaart, A. W. (eds.), State of the Art in Probability and Statistics, Festschrift for Willem R. van Zwet, Institute of Mathematical Statistics, Lecture Notes-Monograph Series, Vol. 36, 113-133 (2001) · Zbl 1373.62142
[10] Birgé, L.; Massart, P., Minimum contrast estimators on sieves: exponential bounds and rates of convergence, Bernoulli, 4, 329-375 (1998) · Zbl 0954.62033 · doi:10.2307/3318720
[11] Birgé, L.; Massart, P., Gaussian model selection, J. Eur. Math. Soc., 3, 203-268 (2001) · Zbl 1037.62001 · doi:10.1007/s100970100031
[12] Birgé, L.; Massart, P., A generalized C_p criterion for Gaussian model selection, Technical Report No. 647, Laboratoire de Probabilités, Université Paris VI (2001), http://www.proba.jussieu.fr/mathdoc/preprints/index.html#2001 · Zbl 1037.62001
[13] Daniel, C.; Wood, F. S., Fitting Equations to Data (1971), New York: Wiley · Zbl 0264.65011
[14] Draper, N. R.; Smith, H., Applied Regression Analysis (1981), New York: Wiley · Zbl 0548.62046
[15] Efron, B.; Hastie, T.; Johnstone, I. M.; Tibshirani, R., Least angle regression, Ann. Statist., 32, 407-499 (2004) · Zbl 1091.62054 · doi:10.1214/009053604000000067
[16] Feller, W., An Introduction to Probability Theory and its Applications, Vol. I (1968), New York: Wiley · Zbl 0155.23101
[17] George, E. I.; Foster, D. P., Calibration and empirical Bayes variable selection, Biometrika, 87, 731-747 (2000) · Zbl 1029.62008 · doi:10.1093/biomet/87.4.731
[18] Gey, S.; Nédélec, E., Model selection for CART regression trees, IEEE Trans. Inf. Theory, 51, 658-670 (2005) · Zbl 1301.62064 · doi:10.1109/TIT.2004.840903
[19] Guyon, X.; Yao, J. F., On the underfitting and overfitting sets of models chosen by order selection criteria, J. Multivar. Anal., 70, 221-249 (1999) · Zbl 1070.62516 · doi:10.1006/jmva.1999.1828
[20] Hannan, E. J.; Quinn, B. G., The determination of the order of an autoregression, J.R.S.S., B, 41, 190-195 (1979) · Zbl 0408.62076
[21] Hoeffding, W., Probability inequalities for sums of bounded random variables, J.A.S.A., 58, 13-30 (1963) · Zbl 0127.10602
[22] Hurvich, C. M.; Tsai, C.-L., Regression and time series model selection in small samples, Biometrika, 76, 297-307 (1989) · Zbl 0669.62085 · doi:10.1093/biomet/76.2.297
[23] Johnstone, I., Chi-square oracle inequalities, in: de Gunst, M. C. M.; Klaassen, C. A. J.; van der Vaart, A. W. (eds.), State of the Art in Probability and Statistics, Festschrift for Willem R. van Zwet, Institute of Mathematical Statistics, Lecture Notes-Monograph Series, Vol. 36, 399-418 (2001) · Zbl 1373.62062
[24] Kneip, A., Ordered linear smoothers, Ann. Statist., 22, 835-866 (1994) · Zbl 0815.62022
[25] Lavielle, M.; Moulines, E., Least squares estimation of an unknown number of shifts in a time series, J. Time Series Anal., 21, 33-59 (2000) · Zbl 0974.62070 · doi:10.1111/1467-9892.00172
[26] Lebarbier, E., Detecting multiple change-points in the mean of a Gaussian process by model selection, Signal Proces., 85, 717-736 (2005) · Zbl 1148.94403 · doi:10.1016/j.sigpro.2004.11.012
[27] Li, K. C., Asymptotic optimality for C_p, C_L, cross-validation, and generalized cross-validation: Discrete index set, Ann. Statist., 15, 958-975 (1987) · Zbl 0653.62037
[28] Loubes, J.-M.; Massart, P., Discussion of “Least angle regression” by Efron, B., Hastie, T., Johnstone, I., Tibshirani, R., Ann. Statist., 32, 460-465 (2004)
[29] Mallows, C. L., Some comments on C_p, Technometrics, 15, 661-675 (1973) · Zbl 0269.62061 · doi:10.2307/1267380
[30] Massart, P., The tight constant in the D.K.W. inequality, Ann. Probab., 18, 1269-1283 (1990) · Zbl 0713.62021
[31] McQuarrie, A. D. R.; Tsai, C.-L., Regression and Time Series Model Selection (1998), Singapore: World Scientific · Zbl 0907.62095
[32] Mitchell, T. J.; Beauchamp, J. J., Bayesian variable selection in linear regression, J.A.S.A., 83, 1023-1032 (1988) · Zbl 0673.62051
[33] Polyak, B. T.; Tsybakov, A. B., Asymptotic optimality of the C_p-test for the orthogonal series estimation of regression, Theory Probab. Appl., 35, 293-306 (1990) · Zbl 0721.62042 · doi:10.1137/1135037
[34] Rissanen, J., Modeling by shortest data description, Automatica, 14, 465-471 (1978) · Zbl 0418.93079 · doi:10.1016/0005-1098(78)90005-5
[35] Schwarz, G., Estimating the dimension of a model, Ann. Statist., 6, 461-464 (1978) · Zbl 0379.62005
[36] Shen, X.; Ye, J., Adaptive model selection, J.A.S.A., 97, 210-221 (2002) · Zbl 1073.62509
[37] Shibata, R., An optimal selection of regression variables, Biometrika, 68, 45-54 (1981) · Zbl 0464.62054 · doi:10.1093/biomet/68.1.45
[38] Wallace, D. L., Bounds on normal approximations to Student’s and the chi-square distributions, Ann. Math. Stat., 30, 1121-1130 (1959) · Zbl 0101.35902
[39] Whittaker, E. T.; Watson, G. N., A Course of Modern Analysis (1927), London: Cambridge University Press · JFM 53.0180.04
[40] Yang, Y., Can the strengths of AIC and BIC be shared? A conflict between model identification and regression estimation, Biometrika, 92, 937-950 (2005) · Zbl 1151.62301 · doi:10.1093/biomet/92.4.937
[41] Yao, Y. C., Estimating the number of change points via Schwarz criterion, Stat. Probab. Lett., 6, 181-189 (1988) · Zbl 0642.62016 · doi:10.1016/0167-7152(88)90118-6

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented/enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible without claiming completeness or a perfect matching.