On robust regression with high-dimensional predictors. (English) Zbl 1359.62184

Summary: We study regression M-estimates in the setting where \(p\), the number of covariates, and \(n\), the number of observations, are both large, but \(p\leq n\). We find an exact stochastic representation for the distribution of \(\hat\beta = \operatorname{argmin}_{\beta \in \mathbb{R}^p} \sum_{i=1}^n \rho (Y_i - X_i' \beta)\) at fixed \(p\) and \(n\) under various assumptions on the objective function \(\rho\) and our statistical model. A scalar random variable whose deterministic limit \(r_\rho(\kappa)\) can be studied when \(p/n\to \kappa > 0\) plays a central role in this representation. We discover a nonlinear system of two deterministic equations that characterizes \(r_\rho(\kappa)\). Interestingly, the system shows that \(r_\rho(\kappa)\) depends on \(\rho\) through proximal mappings of \(\rho\) as well as various aspects of the statistical model underlying our study. Several surprising results emerge. In particular, we show that, when \(p/n\) is large enough, least squares becomes preferable to least absolute deviations for double-exponential errors.
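The proximal mappings mentioned in the summary can be illustrated with a minimal sketch. It assumes the standard Moreau convention \(\operatorname{prox}_c(\rho)(x) = \operatorname{argmin}_y \{ c\rho(y) + (x-y)^2/2 \}\); the summary itself does not state which convention the paper uses. The function names below are hypothetical, chosen only for this illustration. The two closed forms cover the objectives compared in the summary: least squares, \(\rho(y) = y^2/2\), and least absolute deviations, \(\rho(y) = |y|\). A brute-force grid search is included as a sanity check.

```python
import math


def prox_square(x, c):
    """Proximal mapping of rho(y) = y**2 / 2 (least squares): x / (1 + c)."""
    return x / (1.0 + c)


def prox_abs(x, c):
    """Proximal mapping of rho(y) = |y| (least absolute deviations).

    This is soft thresholding: shrink x toward zero by c, clipping at zero.
    """
    return math.copysign(max(abs(x) - c, 0.0), x)


def prox_numeric(rho, x, c, half_width=10.0, steps=20001):
    """Grid-search approximation of argmin_y c * rho(y) + (x - y)**2 / 2.

    Illustrative check only: searches an evenly spaced grid on
    [x - half_width, x + half_width]; accuracy is limited by the grid step.
    """
    best_y, best_val = x, math.inf
    for k in range(steps):
        y = x - half_width + 2.0 * half_width * k / (steps - 1)
        val = c * rho(y) + (x - y) ** 2 / 2.0
        if val < best_val:
            best_y, best_val = y, val
    return best_y


if __name__ == "__main__":
    for x in (-3.0, 0.5, 3.0):
        print(x, prox_square(x, 1.0), prox_abs(x, 1.0))
```

The contrast between the two mappings is the point of the sketch: the least-squares mapping rescales every input, whereas the least-absolute-deviations mapping sends all inputs with \(|x| \leq c\) to zero.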

MSC:

62H12 Estimation in multivariate analysis
62F35 Robustness and adaptive procedures (parametric inference)
62F12 Asymptotic properties of parametric estimators
62H10 Multivariate distribution of statistics

References:

[1] Huber PJ, Ronchetti EM (2009) Robust Statistics. Wiley Series in Probability and Statistics (Wiley, Hoboken, NJ), 2nd Ed.
[2] Huber PJ (1973) Robust regression: Asymptotics, conjectures and Monte Carlo. Ann Stat 1:799–821. · Zbl 0289.62033 · doi:10.1214/aos/1176342503
[3] Portnoy S (1984) Asymptotic behavior of M-estimators of \(p\) regression parameters when \(p^2/n\) is large. I. Consistency. Ann Stat 12(4):1298–1309. · Zbl 0584.62050 · doi:10.1214/aos/1176346793
[4] Bloomfield P (1974) On the distribution of the residuals from a fitted linear model (Department of Statistics, Princeton Univ, Princeton, NJ), Technical Report 56, Series 2.
[5] El Karoui N, Bean D, Bickel P, Lim C, Yu B (2012) On robust regression with high-dimensional predictors (Department of Statistics, Univ of California, Berkeley, CA), Technical Report 811. · Zbl 1359.62184
[6] Moreau J-J (1965) Proximité et dualité dans un espace hilbertien. Bull Soc Math France 93:273–299. French. · Zbl 0136.12101
[7] Zbl 1274.62365 · doi:10.1214/10-AOS795
[8] Eaton ML (1983) Multivariate Statistics: A Vector Space Approach (Wiley, New York); reprinted (2007) Institute of Mathematical Statistics Lecture Notes–Monograph Series (Institute of Mathematical Statistics, Beachwood, OH), Vol 53.
[9] Bean D, Bickel PJ, El Karoui N, Yu B (2013) Optimal M-estimation in high-dimensional regression. Proc Natl Acad Sci USA 110:14563–14568. · doi:10.1073/pnas.1307845110
[10] Horn RA, Johnson CR (1994) Topics in Matrix Analysis (Cambridge Univ Press, Cambridge, UK), corrected reprint of the 1991 original.