Robustness by reweighting for kernel estimators: an overview. (English) Zbl 07473937

Summary: Practitioners of least squares techniques are well aware of the dangers posed by outliers in the data: in general, outliers may completely spoil an ordinary least squares analysis. To cope with this problem, statistical techniques have been developed that are less easily affected by outliers; such methods are called robust or resistant. This overview paper illustrates that robust solutions can be obtained by solving a reweighted least squares problem, even when the initial solution is not robust. It relates classical results from robustness to the most recent advances in robustness for least squares kernel-based regression, with an emphasis on theoretical results as well as practical examples. Software for iterative reweighting is also made freely available to the user.
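The reweighting idea described in the summary can be sketched as a small iteratively reweighted least squares (IRLS) routine: start from the non-robust ordinary least squares fit and repeatedly solve a weighted problem that downweights large residuals. The sketch below is illustrative only and is not the paper's released software; the Huber weight function, the tuning constant c = 1.345, and the MAD-based residual scale are standard choices assumed here.

```python
import numpy as np

def irls_huber(X, y, c=1.345, n_iter=50, tol=1e-8):
    """Illustrative IRLS with Huber weights (not the authors' software).

    Starts from the (non-robust) OLS solution and iterates weighted
    least squares, downweighting observations with large residuals.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]      # initial OLS fit
    for _ in range(n_iter):
        r = y - X @ beta
        # Robust residual scale via the median absolute deviation (MAD)
        s = max(np.median(np.abs(r - np.median(r))) / 0.6745, 1e-12)
        u = np.abs(r) / s
        w = np.where(u <= c, 1.0, c / u)             # Huber weight function
        sw = np.sqrt(w)
        # Weighted least squares step
        beta_new = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]
        if np.linalg.norm(beta_new - beta) < tol:
            return beta_new
        beta = beta_new
    return beta
```

On data following a line with a few gross outliers, the reweighted fit stays close to the uncontaminated relationship while the plain OLS fit is pulled toward the outliers, which is the behavior the overview formalizes.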


62-XX Statistics
Full Text: DOI


[1] Aftab, K. and Hartley, R. (2014). Convergence of iteratively re-weighted least squares to robust M-estimators. In IEEE Winter Conference on Applications of Computer Vision (WACV) 480-487.
[2] Arce, G. R. (2005). Nonlinear Signal Processing: A Statistical Approach. Wiley Interscience, Hoboken, NJ. · Zbl 1167.94300
[3] Ba, D., Babadi, B., Purdon, P. L. and Brown, E. N. (2014). Convergence and stability of iteratively re-weighted least squares algorithms. IEEE Trans. Signal Process. 62 183-195. · Zbl 1394.94055 · doi:10.1109/TSP.2013.2287685
[4] Bissantz, N., Dümbgen, L., Munk, A. and Stratmann, B. (2008). Convergence analysis of generalized iteratively reweighted least squares algorithms on convex function spaces. SIAM J. Optim. 19 1828-1845. · Zbl 1175.62036 · doi:10.1137/050639132
[5] Christmann, A. and Steinwart, I. (2003/04). On robustness properties of convex risk minimization methods for pattern recognition. J. Mach. Learn. Res. 5 1007-1034. · Zbl 1222.68348
[6] Christmann, A. and Steinwart, I. (2007). Consistency and robustness of kernel-based regression in convex risk minimization. Bernoulli 13 799-819. · Zbl 1129.62031 · doi:10.3150/07-BEJ5102
[7] Christmann, A. and Van Messem, A. (2008). Bouligand derivatives and robustness of support vector machines for regression. J. Mach. Learn. Res. 9 915-936. · Zbl 1225.68164
[8] Croux, C., Rousseeuw, P. J. and Hössjer, O. (1994). Generalized \(S\)-estimators. J. Amer. Statist. Assoc. 89 1271-1281. · Zbl 0812.62073
[9] De Brabanter, K., Suykens, J. A. K. and De Moor, B. (2013). Nonparametric regression via StatLSSVM. J. Stat. Softw. 55 1-23.
[10] De Brabanter, K., Pelckmans, K., De Brabanter, J., Debruyne, M., Suykens, J. A. K., Hubert, M. and De Moor, B. (2009). Robustness of kernel based regression: A comparison of iterative weighting schemes. In Proc. of the 19th International Conference on Artificial Neural Networks (ICANN) 100-110.
[11] De Vito, E., Rosasco, L., Caponnetto, A., Piana, M. and Verri, A. (2003/04). Some properties of regularized kernel methods. J. Mach. Learn. Res. 5 1363-1390. · Zbl 1222.68181
[12] Debruyne, M., Hubert, M. and Suykens, J. A. K. (2008). Model selection in kernel based regression using the influence function. J. Mach. Learn. Res. 9 2377-2400. · Zbl 1225.62051
[13] Debruyne, M., Christmann, A., Hubert, M. and Suykens, J. A. K. (2010). Robustness of reweighted least squares kernel based regression. J. Multivariate Anal. 101 447-463. · Zbl 1178.62035 · doi:10.1016/j.jmva.2009.09.007
[14] Dollinger, M. B. and Staudte, R. G. (1991). Influence functions of iteratively reweighted least squares estimators. J. Amer. Statist. Assoc. 86 709-716. · Zbl 0739.62024
[15] Edgeworth, F. Y. (1887). On observations relating to several quantities. Hermathena 6 279-285.
[16] Evgeniou, T., Pontil, M. and Poggio, T. (2000). Regularization networks and support vector machines. Adv. Comput. Math. 13 1-50. · Zbl 0939.68098 · doi:10.1023/A:1018946025316
[17] Fox, J. (2016). Applied Regression and Generalized Linear Models, 3rd ed. Sage, Los Angeles.
[18] Girosi, F. (1998). An equivalence between sparse approximation and support vector machines. Neural Comput. 10 1455-1480. · doi:10.1162/089976698300017269
[19] Hampel, F. R. (1974). The influence curve and its role in robust estimation. J. Amer. Statist. Assoc. 69 383-393. · Zbl 0305.62031
[20] Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. and Stahel, W. A. (1986). Robust Statistics: The Approach Based on Influence Functions. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. Wiley, New York. · Zbl 0593.62027
[21] Härdle, W., Hall, P. and Marron, J. S. (1988). How far are automatically chosen regression smoothing parameters from their optimum? J. Amer. Statist. Assoc. 83 86-101. · Zbl 0644.62048
[22] Härdle, W., Hall, P. and Marron, J. S. (1992). Regression smoothing parameters that are not far from their optimum. J. Amer. Statist. Assoc. 87 227-233. · Zbl 0850.62352
[23] Hettmansperger, T. P. and McKean, J. W. (1998). Robust Nonparametric Statistical Methods. Kendall’s Library of Statistics 5. Edward Arnold, London. · Zbl 0887.62056
[24] Huber, P. J. (1964). Robust estimation of a location parameter. Ann. Math. Stat. 35 73-101. · Zbl 0136.39805 · doi:10.1214/aoms/1177703732
[25] Huber, P. J. (1965). A robust version of the probability ratio test. Ann. Math. Stat. 36 1753-1758. · Zbl 0137.12702 · doi:10.1214/aoms/1177699803
[26] Huber, P. J. (1968). Robust confidence limits. Z. Wahrsch. Verw. Gebiete 10 269-278. · Zbl 0174.50402 · doi:10.1007/BF00531848
[27] Huber, P. J. (1981). Robust Statistics. Wiley, New York. · Zbl 0536.62025
[28] Huber, P. J. and Strassen, V. (1973). Minimax tests and the Neyman-Pearson lemma for capacities. Ann. Statist. 1 251-263. · Zbl 0259.62008
[29] Huber, P. J. and Strassen, V. (1974). Correction: “Minimax tests and the Neyman-Pearson lemma for capacities” (Ann. Statist. 1 (1973), 251-263). Ann. Statist. 2 223-224. · Zbl 0269.62020
[30] Hubert, M. (2001). Multivariate outlier detection and robust covariance matrix estimation—discussion. Technometrics 43 303-306.
[31] Hubert, M., Rousseeuw, P. J. and Vanden Branden, K. (2005). ROBPCA: A new approach to robust principal component analysis. Technometrics 47 64-79. · doi:10.1198/004017004000000563
[32] Jurečková, J. and Picek, J. (2006). Robust Statistical Methods with R. CRC Press/CRC, Boca Raton, FL. · Zbl 1097.62020
[33] Kutner, M. H., Nachtsheim, C. J., Neter, J. and Li, W. (2005). Applied Linear Statistical Models, 5th ed. McGraw-Hill, New York.
[34] Leung, D. H.-Y. (2005). Cross-validation in nonparametric regression with outliers. Ann. Statist. 33 2291-2310. · Zbl 1086.62055 · doi:10.1214/009053605000000499
[35] Leung, D. H. Y., Marriott, F. H. C. and Wu, E. K. H. (1993). Bandwidth selection in robust smoothing. J. Nonparametr. Stat. 2 333-339. · Zbl 1360.62132 · doi:10.1080/10485259308832562
[36] Liu, D. (2008). Support Vector Machines. Springer, Berlin.
[37] Maronna, R. A., Martin, R. D. and Yohai, V. J. (2006). Robust Statistics: Theory and Methods. Wiley Series in Probability and Statistics. Wiley, Chichester. · Zbl 1094.62040 · doi:10.1002/0470010940
[38] Mukherjee, S., Niyogi, P., Poggio, T. and Rifkin, R. (2006). Learning theory: Stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization. Adv. Comput. Math. 25 161-193. · Zbl 1099.68693 · doi:10.1007/s10444-004-7634-z
[39] Mukhoty, B., Gopakumar, G., Jain, P. and Kar, P. (2019). Globally-convergent iteratively reweighted least squares for robust regression problems. In Proceedings of Machine Learning Research 89 (AISTATS 2019) 313-322.
[40] Osborne, M. R. (1985). Finite Algorithms in Optimization and Data Analysis. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. Wiley, Chichester. · Zbl 0573.65044
[41] Poggio, T. and Smale, S. (2003). The mathematics of learning: Dealing with data. Notices Amer. Math. Soc. 50 537-544. · Zbl 1083.68100
[42] Riazoshams, H., Midi, H. and Ghilagaber, G. (2019). Robust Nonlinear Regression: With Applications Using R. Wiley, Hoboken, NJ. · Zbl 1407.62022
[43] Rousseeuw, P. J. and Leroy, A. M. (2003). Robust Regression and Outlier Detection. Wiley, New York. · doi:10.1002/0471725382
[44] Samorodnitsky, G. and Taqqu, M. S. (1994). Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Stochastic Modeling. CRC Press, New York. · Zbl 0925.60027
[45] Schölkopf, B., Herbrich, R. and Smola, A. J. (2001). A generalized representer theorem. In Computational Learning Theory (Amsterdam, 2001) (D. Helmbold and B. Williamson, eds.). Lecture Notes in Computer Science 2111 416-426. Springer, Berlin. · Zbl 0992.68088 · doi:10.1007/3-540-44581-1_27
[46] Schölkopf, B. and Smola, A. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA.
[47] Sigl, J. (2016). Nonlinear residual minimization by iteratively reweighted least squares. Comput. Optim. Appl. 64 755-792. · Zbl 1373.90153 · doi:10.1007/s10589-016-9829-x
[48] Simpson, D. G., Ruppert, D. and Carroll, R. J. (1992). On one-step GM estimates and stability of inferences in linear regression. J. Amer. Statist. Assoc. 87 439-450. · Zbl 0781.62104
[49] Steinwart, I. (2004). Sparseness of support vector machines. J. Mach. Learn. Res. 4 1071-1105. · Zbl 1094.68082 · doi:10.1162/1532443041827925
[50] Suykens, J. A. K., Van Gestel, T., De Brabanter, J., De Moor, B. and Vandewalle, J. (2002). Least Squares Support Vector Machines. World Scientific, Singapore. · Zbl 1017.93004
[51] Tikhonov, A. N. and Arsenin, V. Y. (1977). Solutions of Ill-Posed Problems. V. H. Winston & Sons, Washington, DC. · Zbl 0354.65028
[52] Tukey, J. W. (1960). A survey of sampling from contaminated distributions. In Contributions to Probability and Statistics 448-485. Stanford Univ. Press, Stanford, CA. · Zbl 0201.52803
[53] Vapnik, V. N. (1995). The Nature of Statistical Learning Theory. Springer, New York. · Zbl 0833.62008 · doi:10.1007/978-1-4757-2440-0
[54] Vapnik, V. N. (1998). Statistical Learning Theory. Adaptive and Learning Systems for Signal Processing, Communications, and Control. Wiley, New York. · Zbl 0935.62007
[55] Wahba, G. (1990). Spline Models for Observational Data. CBMS-NSF Regional Conference Series in Applied Mathematics 59. SIAM, Philadelphia, PA. · Zbl 0813.62001 · doi:10.1137/1.9781611970128
[56] Wahba, G. (1999). Support vector machines, reproducing kernel Hilbert spaces and the randomized GACV. In Advances in Kernel Methods—Support Vector Learning (B. Schölkopf, C. Burges and A. Smola, eds.) 69-88. MIT Press, Cambridge, MA.
[57] Wasserman, L. (2006). All of Nonparametric Statistics. Springer Texts in Statistics. Springer, New York. · Zbl 1099.62029
[58] Wilcox, R. R. (1996). A review of some recent developments in robust regression. Br. J. Math. Stat. Psychol. 49 253-274. · Zbl 0898.62040 · doi:10.1111/j.2044-8317.1996.tb01088.x
[59] Wilcox, R. (2012). Introduction to Robust Estimation and Hypothesis Testing, 3rd ed. Statistical Modeling and Decision Science. Elsevier/Academic Press, Amsterdam. · Zbl 1270.62051 · doi:10.1016/B978-0-12-386983-8.00001-9
[60] Yang, Y. (2007). Consistency of cross validation for comparing regression procedures. Ann. Statist. 35 2450-2473. · Zbl 1129.62039 · doi:10.1214/009053607000000514
[61] Yu, C. and Yao, W. (2017). Robust linear regression: A review and comparison. Comm. Statist. Simulation Comput. 46 6261-6282. · Zbl 1388.62070 · doi:10.1080/03610918.2016.1202271
[62] Zolotarev, V. M. (1986). One-Dimensional Stable Distributions. Translations of Mathematical Monographs 65. Amer. Math. Soc., Providence, RI. · doi:10.1090/mmono/065