## Regularization of case-specific parameters for robustness and efficiency. (English) Zbl 1331.62349

Summary: Regularization methods allow one to handle a variety of inferential problems where there are more covariates than cases, which makes it possible to consider a potentially enormous number of covariates for a problem. We exploit the power of these techniques, supersaturating models by augmenting the “natural” covariates in the problem with an additional indicator for each case in the data set. We attach a penalty term to these case-specific indicators that is designed to produce a desired effect. For regression methods with squared error loss, an $$\ell_{1}$$ penalty produces a regression which is robust to outliers and high-leverage cases; for quantile regression methods, an $$\ell_{2}$$ penalty decreases the variance of the fit enough to overcome an increase in bias. The paradigm thus allows us to robustify procedures which lack robustness and to increase the efficiency of procedures which are robust.
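To illustrate the squared error case, the following worked calculation sketches how an $$\ell_{1}$$ penalty on a single case-specific parameter changes the effective loss. The symbols $$r$$ (residual of one case), $$\gamma$$ (its case-specific parameter), $$\lambda$$ (penalty weight) and the factor $$\tfrac{1}{2}$$ are assumptions made here for illustration and need not match the paper's notation. Profiling out $$\gamma$$ gives

$$\min_{\gamma}\left\{\tfrac{1}{2}(r-\gamma)^{2}+\lambda|\gamma|\right\}=\begin{cases}\tfrac{1}{2}r^{2}, & |r|\le\lambda,\\ \lambda|r|-\tfrac{1}{2}\lambda^{2}, & |r|>\lambda,\end{cases}$$

with minimizer $$\hat{\gamma}=\operatorname{sign}(r)\,(|r|-\lambda)_{+}$$. Under these assumptions the effective loss is Huber's loss with threshold $$\lambda$$: quadratic for small residuals and linear for large ones, which bounds the influence of outlying cases.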
We provide a general framework for the inclusion of case-specific parameters in regularization problems, describing the impact on the effective loss for a variety of regression and classification problems. We outline a computational strategy by which existing software can be modified to solve the augmented regularization problem, providing conditions under which such modification will converge to the optimum solution. We illustrate the benefits of including case-specific parameters in the context of mean regression and quantile regression through analysis of NHANES and linguistic data sets.
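The idea of reusing an existing fitting routine can be sketched for mean regression as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it alternates an ordinary least squares fit on adjusted responses with a soft-thresholding update of the case-specific parameters. The function names `soft_threshold` and `fit_case_specific_l1`, the simulated data, and the penalty value `lam=0.3` are all hypothetical choices.

```python
import numpy as np


def soft_threshold(r, lam):
    """Minimizer of 0.5*(r - g)**2 + lam*|g| over g, applied elementwise."""
    return np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)


def fit_case_specific_l1(X, y, lam, n_iter=200, tol=1e-10):
    """Alternating minimization of
        0.5 * ||y - X @ beta - gamma||^2 + lam * ||gamma||_1
    over the regression coefficients beta and one parameter gamma_i per case.
    (Assumed objective scaling; the paper's parametrization may differ.)
    """
    n, p = X.shape
    beta = np.zeros(p)
    gamma = np.zeros(n)
    for _ in range(n_iter):
        # Step 1: ordinary least squares on the adjusted responses y - gamma.
        beta_new, *_ = np.linalg.lstsq(X, y - gamma, rcond=None)
        # Step 2: update each case-specific parameter from its residual.
        gamma = soft_threshold(y - X @ beta_new, lam)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, gamma


# Simulated example (hypothetical data): a line with five gross outliers.
rng = np.random.default_rng(0)
n = 100
x = rng.uniform(-1.0, 1.0, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(n)
y[:5] += 20.0  # contaminate the first five cases

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_rob, gamma = fit_case_specific_l1(X, y, lam=0.3)

print("OLS coefficients:   ", beta_ols)
print("Robust coefficients:", beta_rob)
print("Cases with nonzero gamma:", np.flatnonzero(gamma))
```

In this sketch the nonzero entries of `gamma` flag the cases whose residuals exceed the threshold, so the penalty weight plays the role of the bend point in Huber's loss. The least squares step could be swapped for any existing fitting routine that accepts adjusted responses.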

### MSC:

62J07 Ridge regression; shrinkage estimators (Lasso)

62J02 General nonlinear regression

62H30 Classification and discrimination; cluster analysis (statistical aspects)
