
Minimal penalties for Gaussian model selection. (English) Zbl 1112.62082
Summary: This paper is mainly devoted to a precise analysis of what kind of penalties should be used in order to perform model selection via the minimization of a penalized least-squares type criterion within some general Gaussian framework including the classical ones. As compared to our previous paper on this topic [J. Eur. Math. Soc. (JEMS) 3, No. 3, 203–268 (2001; Zbl 1037.62001)], more elaborate forms of the penalties are given, which are shown to be, in some sense, optimal. We indeed provide more precise upper bounds for the risk of the penalized estimators and lower bounds for the penalty terms, showing that the use of smaller penalties may lead to disastrous results. These lower bounds may also be used to design a practical strategy that allows one to estimate the penalty from the data when the amount of noise is unknown. We provide an illustration of the method for the problem of estimating a piecewise constant signal in Gaussian noise when neither the number nor the locations of the change points are known.
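The data-driven calibration strategy alluded to in the summary (estimate the minimal penalty from the data, then double it; often called the slope heuristic or dimension-jump method) can be illustrated with a short simulation. The Python sketch below uses the simple linear penalty shape pen(D) = κD for readability, whereas the paper advocates richer shapes involving a log(n/D) term for change-point problems; all function names, constants, and the grid of κ values are illustration choices, not the authors' exact recipe.

    import numpy as np

    rng = np.random.default_rng(0)

    # Piecewise constant signal with three segments, observed in Gaussian
    # noise; the calibration below never uses the true noise level.
    n = 200
    truth = np.concatenate([np.full(60, 0.0), np.full(80, 2.0), np.full(60, -1.0)])
    y = truth + rng.normal(0.0, 1.0, size=n)

    def min_sse(y, d_max):
        """Smallest residual sum of squares over all piecewise constant
        fits with exactly d segments, d = 1..d_max (dynamic programming)."""
        n = len(y)
        c1 = np.concatenate(([0.0], np.cumsum(y)))
        c2 = np.concatenate(([0.0], np.cumsum(y * y)))
        def seg(i, j):  # SSE of one constant fitted to y[i..j], inclusive
            s, s2, m = c1[j + 1] - c1[i], c2[j + 1] - c2[i], j - i + 1
            return s2 - s * s / m
        dp = np.full((d_max + 1, n), np.inf)
        dp[1] = [seg(0, j) for j in range(n)]
        for d in range(2, d_max + 1):
            for j in range(d - 1, n):
                dp[d, j] = min(dp[d - 1][k - 1] + seg(k, j)
                               for k in range(d - 1, j + 1))
        return dp[1:, -1]  # entry d-1 holds the best SSE with d segments

    d_max = 25
    dims = np.arange(1, d_max + 1)
    sse = min_sse(y, d_max)

    # Dimension jump: as the multiplier kappa grows, the selected dimension
    # D(kappa) = argmin_D [SSE(D) + kappa * D] decreases; the largest drop
    # locates the minimal penalty, and the final penalty doubles it.
    kappas = np.linspace(0.01, 20.0, 2000)
    selected = np.array([dims[np.argmin(sse + k * dims)] for k in kappas])
    jump = int(np.argmax(selected[:-1] - selected[1:]))
    kappa_min = kappas[jump + 1]
    d_hat = dims[np.argmin(sse + 2.0 * kappa_min * dims)]
    print(f"minimal penalty constant: {kappa_min:.2f}, selected segments: {d_hat}")

The doubling step mirrors the paper's matching bounds: penalties below the minimal level may lead to disastrous results, while roughly twice the minimal penalty yields the sharp risk bounds.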

MSC:
62M09 Non-Markovian processes: estimation
62G05 Nonparametric estimation
62G08 Nonparametric regression and quantile regression
62J05 Linear regression; mixed models
46N30 Applications of functional analysis in probability theory and statistics
References:
[1] Abramovich F., Benjamini Y., Donoho D.L., Johnstone I.M. (2006). Adapting to unknown sparsity by controlling the false discovery rate. Ann. Statist. 34:584–653 · Zbl 1092.62005
[2] Akaike H. (1970). Statistical predictor identification. Ann. Inst. Statist. Math. 22:203–217 · Zbl 0259.62076
[3] Akaike H. (1973). Information theory and an extension of the maximum likelihood principle. In: Petrov B.N., Csaki F. (eds) Proceedings of the 2nd International Symposium on Information Theory. Akademiai Kiado, Budapest, pp. 267–281 · Zbl 0283.62006
[4] Akaike H. (1974). A new look at the statistical model identification. IEEE Trans. Autom. Control 19:716–723 · Zbl 0314.62039
[5] Akaike H. (1978). A Bayesian analysis of the minimum AIC procedure. Ann. Inst. Statist. Math. 30, Part A:9–14 · Zbl 0441.62007
[6] Amemiya T. (1985). Advanced Econometrics. Basil Blackwell, Oxford
[7] Barron A.R., Birgé L., Massart P. (1999). Risk bounds for model selection via penalization. Probab. Theory Relat. Fields 113:301–415 · Zbl 0946.62036
[8] Barron A.R., Cover T.M. (1991). Minimum complexity density estimation. IEEE Trans. Inf. Theory 37:1034–1054 · Zbl 0743.62003
[9] Birgé, L.: An alternative point of view on Lepski’s method. In: de Gunst, M.C.M., Klaassen, C.A.J., van der Vaart, A.W. (eds.) State of the Art in Probability and Statistics, Festschrift for Willem R. van Zwet. Institute of Mathematical Statistics, Lecture Notes–Monograph Series, Vol. 36, pp. 113–133 (2001) · Zbl 1373.62142
[10] Birgé L., Massart P. (1998). Minimum contrast estimators on sieves: exponential bounds and rates of convergence. Bernoulli 4:329–375 · Zbl 0954.62033
[11] Birgé L., Massart P. (2001). Gaussian model selection. J. Eur. Math. Soc. 3:203–268 · Zbl 1037.62001
[12] Birgé, L., Massart, P.: A generalized C_p criterion for Gaussian model selection. Technical Report No 647, Laboratoire de Probabilités, Université Paris VI (2001). http://www.proba.jussieu.fr/mathdoc/preprints/index.html#2001 · Zbl 1037.62001
[13] Daniel C., Wood F.S. (1971). Fitting Equations to Data. Wiley, New York · Zbl 0264.65011
[14] Draper N.R., Smith H. (1981). Applied Regression Analysis, 2nd edn. Wiley, New York · Zbl 0548.62046
[15] Efron B., Hastie T., Johnstone I.M., Tibshirani R. (2004). Least angle regression. Ann. Statist. 32:407–499 · Zbl 1091.62054
[16] Feller W. (1968). An Introduction to Probability Theory and its Applications, Vol I (3rd edn). Wiley, New York · Zbl 0155.23101
[17] George E.I., Foster D.P. (2000). Calibration and empirical Bayes variable selection. Biometrika 87:731–747 · Zbl 1029.62008
[18] Gey S., Nédélec E. (2005). Model selection for CART regression trees. IEEE Trans. Inf. Theory 51:658–670 · Zbl 1301.62064
[19] Guyon X., Yao J.F. (1999). On the underfitting and overfitting sets of models chosen by order selection criteria. J. Multivar. Anal. 70:221–249 · Zbl 1070.62516
[20] Hannan E.J., Quinn B.G. (1979). The determination of the order of an autoregression. J.R.S.S. B 41:190–195 · Zbl 0408.62076
[21] Hoeffding W. (1963). Probability inequalities for sums of bounded random variables. J.A.S.A. 58:13–30 · Zbl 0127.10602
[22] Hurvich C.M., Tsai C.-L. (1989). Regression and time series model selection in small samples. Biometrika 76:297–307 · Zbl 0669.62085
[23] Johnstone, I.: Chi-square oracle inequalities. In: de Gunst, M.C.M., Klaassen, C.A.J., van der Vaart, A.W. (eds.) State of the Art in Probability and Statistics, Festschrift for Willem R. van Zwet. Institute of Mathematical Statistics, Lecture Notes–Monograph Series, Vol. 36, pp. 399–418 (2001) · Zbl 1373.62062
[24] Kneip A. (1994). Ordered linear smoothers. Ann. Statist. 22:835–866 · Zbl 0815.62022
[25] Lavielle M., Moulines E. (2000). Least squares estimation of an unknown number of shifts in a time series. J. Time Series Anal. 21:33–59 · Zbl 0974.62070
[26] Lebarbier E. (2005). Detecting multiple change-points in the mean of a Gaussian process by model selection. Signal Process. 85:717–736 · Zbl 1148.94403
[27] Li K.C. (1987). Asymptotic optimality for C_p, C_L, cross-validation, and generalized cross-validation: Discrete index set. Ann. Statist. 15:958–975 · Zbl 0653.62037
[28] Loubes, J.-M., Massart, P.: Discussion of “Least angle regression” by Efron, B., Hastie, T., Johnstone, I., Tibshirani, R. Ann. Statist. 32:460–465 (2004) · Zbl 1091.62054
[29] Mallows C.L. (1973). Some comments on C_p. Technometrics 15:661–675 · Zbl 0269.62061
[30] Massart P. (1990). The tight constant in the Dvoretzky–Kiefer–Wolfowitz inequality. Ann. Probab. 18:1269–1283 · Zbl 0713.62021
[31] McQuarrie A.D.R., Tsai C.-L. (1998). Regression and Time Series Model Selection. World Scientific, Singapore · Zbl 0907.62095
[32] Mitchell T.J., Beauchamp J.J. (1988). Bayesian variable selection in linear regression. J.A.S.A. 83:1023–1032 · Zbl 0673.62051
[33] Polyak B.T., Tsybakov A.B. (1990). Asymptotic optimality of the C_p-test for the orthogonal series estimation of regression. Theory Probab. Appl. 35:293–306 · Zbl 0721.62042
[34] Rissanen J. (1978). Modeling by shortest data description. Automatica 14:465–471 · Zbl 0418.93079
[35] Schwarz G. (1978). Estimating the dimension of a model. Ann. Statist. 6:461–464 · Zbl 0379.62005
[36] Shen X., Ye J. (2002). Adaptive model selection. J.A.S.A. 97:210–221 · Zbl 1073.62509
[37] Shibata R. (1981). An optimal selection of regression variables. Biometrika 68:45–54 · Zbl 0464.62054
[38] Wallace D.L. (1959). Bounds on normal approximations to Student’s and the chi-square distributions. Ann. Math. Stat. 30:1121–1130 · Zbl 0101.35902
[39] Whittaker E.T., Watson G.N. (1927). A Course of Modern Analysis. Cambridge University Press, London · JFM 53.0180.04
[40] Yang Y. (2005). Can the strengths of AIC and BIC be shared? A conflict between model identification and regression estimation. Biometrika 92:937–950 · Zbl 1151.62301
[41] Yao Y.C. (1988). Estimating the number of change points via Schwarz criterion. Stat. Probab. Lett. 6:181–189 · Zbl 0642.62016