zbMATH — the first resource for mathematics

Robust linear least squares regression. (English) Zbl 1231.62126
Summary: We consider the problem of robustly predicting as well as the best linear combination of \(d\) given functions in least squares regression, and variants of this problem including constraints on the parameters of the linear combination. For the ridge estimator and the ordinary least squares estimator, and their variants, we provide new risk bounds of order \(d/n\), where \(n\) is the size of the training data, without the logarithmic factor that appears in some standard results. We also provide a new estimator with better deviations in the presence of heavy-tailed noise. It is based on truncating differences of losses in a min-max framework and satisfies a \(d/n\) risk bound both in expectation and in deviations. The surprising feature common to these results is that they achieve exponential deviations without exponential moment conditions on the output distribution. All risk bounds are obtained through a PAC-Bayesian analysis on truncated differences of losses. Experimental results strongly support our truncated min-max estimator.
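
For orientation, the quantities named in the summary can be written out as follows. The notation is an assumption made here for illustration and is not taken from this record: \(\varphi_1,\dots,\varphi_d\) are the given functions, \(f_\theta = \sum_{j=1}^{d} \theta_j \varphi_j\) is a linear combination, \((X_1,Y_1),\dots,(X_n,Y_n)\) is the training sample, and \(\lambda \ge 0\) is a ridge parameter.

\[
R(f) = \mathbb{E}\big[(Y - f(X))^2\big],
\qquad
\hat\theta_\lambda \in \arg\min_{\theta \in \mathbb{R}^d} \; \frac{1}{n}\sum_{i=1}^{n} \big(Y_i - f_\theta(X_i)\big)^2 + \lambda \|\theta\|^2 .
\]

Taking \(\lambda = 0\) gives the ordinary least squares estimator. In that unpenalized case, a risk bound of order \(d/n\) is a bound of the form \(R(f_{\hat\theta_0}) - \inf_{\theta} R(f_\theta) \le C\, d/n\), holding in expectation or with high probability, where the constant \(C\) depends on distributional assumptions that the summary does not spell out.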

62J05 Linear regression; mixed models
62J07 Ridge regression; shrinkage estimators (Lasso)
62F35 Robustness and adaptive procedures (parametric inference)
62C20 Minimax procedures in statistical decision theory
62F30 Parametric inference under constraints
65C60 Computational problems in statistics (MSC2010)
[1] Audibert, J. Y. and Catoni, O. (2010). Robust linear regression through PAC-Bayesian truncation. Available at .
[2] Audibert, J. Y. and Catoni, O. (2011). Supplement to “Robust linear least squares regression.” . · Zbl 1231.62126
[3] Baraud, Y. (2000). Model selection for regression on a fixed design. Probab. Theory Related Fields 117 467-493. · Zbl 0997.62027
[4] Birgé, L. and Massart, P. (1998). Minimum contrast estimators on sieves: Exponential bounds and rates of convergence. Bernoulli 4 329-375. · Zbl 0954.62033
[5] Catoni, O. (2010). Challenging the empirical mean and empirical variance: A deviation study. Available at . · Zbl 1282.62070
[6] Györfi, L., Kohler, M., Krzyżak, A. and Walk, H. (2004). A Distribution-Free Theory of Nonparametric Regression. Springer, New York. · Zbl 1021.62024
[7] Langford, J. and Shawe-Taylor, J. (2002). PAC-Bayes and margins. In Advances in Neural Information Processing Systems (S. Becker, S. Thrun and K. Obermayer, eds.) 15 423-430. MIT Press, Cambridge, MA.
[8] Nemirovski, A. (2000). Topics in non-parametric statistics. In Lectures on Probability Theory and Statistics (Saint-Flour, 1998). Lecture Notes in Math. 1738 85-277. Springer, Berlin. · Zbl 0998.62033
[9] Rousseeuw, P. and Yohai, V. (1984). Robust regression by means of S-estimators. In Robust and Nonlinear Time Series Analysis (Heidelberg, 1983). Lecture Notes in Statist. 26 256-272. Springer, New York. · Zbl 0567.62027
[10] Sauvé, M. (2010). Piecewise polynomial estimation of a regression function. IEEE Trans. Inform. Theory 56 597-613. · Zbl 1366.62086
[11] Tsybakov, A. B. (2003). Optimal rates of aggregation. In Computational Learning Theory and Kernel Machines (B. Schölkopf and M. Warmuth, eds.). Lecture Notes in Artificial Intelligence 2777 303-313. Springer, Berlin. · Zbl 1208.62073
[12] Yang, Y. (2004). Aggregating regression procedures to improve performance. Bernoulli 10 25-47. · Zbl 1040.62030
[13] Yohai, V. J. (1987). High breakdown-point and high efficiency robust estimates for regression. Ann. Statist. 15 642-656. · Zbl 0624.62037
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.