A high breakdown, high efficiency and bounded influence modified GM estimator based on support vector regression. (English) Zbl 1516.62247

Summary: Regression analysis aims to estimate the relationship between a response variable and a set of explanatory variables. Classical methods such as ordinary least squares can do this, but they are very sensitive to anomalous points (outliers) in the data set. The main contribution of this article is a new version of the generalized M-estimator (GM-estimator) that resists both vertical outliers and bad leverage points. Its advantage over existing methods is that it does not downweight good leverage points, which increases the efficiency of the estimator. To achieve this, fixed-parameter support vector regression is used to identify outliers and bad leverage points and to reduce their weights. The effectiveness of the proposed estimator is investigated using real and simulated data sets.
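As a rough illustration of the sensitivity of ordinary least squares to vertical outliers, the following sketch compares an OLS fit with a plain Huber M-estimate computed by iteratively reweighted least squares. It is not the estimator proposed in the article: it applies no leverage-based weighting and no support vector regression step, so it is not a GM-estimator. The function names `huber_weights` and `irls_huber`, the tuning constant 1.345, the MAD scale estimate, and the simulated data are all assumptions chosen for the example.

```python
import numpy as np


def huber_weights(u, c=1.345):
    """Huber weights: 1 for |u| <= c, c/|u| otherwise (c is an assumed tuning constant)."""
    a = np.abs(u)
    return np.where(a <= c, 1.0, c / np.maximum(a, 1e-12))


def irls_huber(X, y, n_iter=50, tol=1e-8):
    """Huber M-estimate of the regression coefficients via IRLS.

    X must already contain an intercept column. Illustrative only:
    residuals are downweighted, leverage points are not treated at all.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from the OLS fit
    for _ in range(n_iter):
        r = y - X @ beta
        # Normalized median absolute deviation as a robust residual scale.
        scale = max(1.4826 * np.median(np.abs(r - np.median(r))), 1e-12)
        sw = np.sqrt(huber_weights(r / scale))
        # Weighted least squares step: rescale rows by the square-root weights.
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


# Simulated data (an assumption for this sketch): y = 2 + 3x + noise,
# with 10% of the responses shifted upwards to act as vertical outliers.
rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0, 10, n)
y = 2.0 + 3.0 * x + rng.normal(0, 1, n)
y[:10] += 50.0
X = np.column_stack([np.ones(n), x])

beta_true = np.array([2.0, 3.0])
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_huber = irls_huber(X, y)

print("OLS error:  ", np.linalg.norm(beta_ols - beta_true))
print("Huber error:", np.linalg.norm(beta_huber - beta_true))
```

A monotone M-estimator of this kind protects only against outliers in the response. Handling bad leverage points without penalizing good ones, the issue the summary addresses, requires an additional weighting scheme that this sketch does not attempt.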

MSC:

62-XX Statistics

Software:

AS 132

References:

[1] R.D. Armstrong and M.T. Kung, Algorithm AS 132: Least absolute value estimates for a simple linear regression problem, Appl. Stat. 27 (1978), pp. 363-366. doi: 10.2307/2347181 · Zbl 0437.62065
[2] A.C. Atkinson, M. Riani, and A. Cerioli, Exploring Multivariate Data with the Forward Search, Springer-Verlag, New York, 2004. · Zbl 1049.62057 · doi:10.1007/978-0-387-21840-3
[3] A. Bagheri, H. Midi, M. Ganjali, and S. Eftekhari, A comparison of various influential points diagnostic methods and robust regression approaches: Reanalysis of interstitial lung disease data, Appl. Math. Sci. 4 (2010), pp. 1367-1386. · Zbl 1205.62104
[4] C.W. Coakley and T.P. Hettmansperger, A bounded influence, high breakdown, efficient regression estimator, J. Amer. Statist. Assoc. 88 (1993), pp. 872-880. doi: 10.1080/01621459.1993.10476352 · Zbl 0783.62024
[5] R.D. Cook, Detection of influential observation in linear regression, Technometrics 19 (1977), pp. 15-18. · Zbl 0371.62096
[6] W. Dhhan, S. Rana, and H. Midi, Non-sparse ϵ-insensitive support vector regression for outlier detection, J. Appl. Stat. 42 (2015), pp. 1723-1739. doi: 10.1080/02664763.2015.1005064 · Zbl 1514.62526
[7] D. Gervini and V.J. Yohai, A class of robust and fully efficient regression estimators, Ann. Statist. 30 (2002), pp. 583-616. doi: 10.1214/aos/1021379866 · Zbl 1012.62073
[8] J. Groß, Linear Regression, Springer-Verlag, Berlin Heidelberg, New York, 2003. · doi:10.1007/978-3-642-55864-1
[9] F.R. Hampel, The influence curve and its role in robust estimation, J. Amer. Statist. Assoc. 69 (1974), pp. 383-393. doi: 10.1080/01621459.1974.10482962 · Zbl 0305.62031
[10] D.M. Hawkins, Identification of Outliers, Springer, New York, 1980. · Zbl 0438.62022 · doi:10.1007/978-94-015-3994-4
[11] S. Hekimoğlu and R.C. Erenoglu, A new GM-estimate with high breakdown point, Acta Geodaetica Et Geophysica 48 (2013), pp. 419-437. doi: 10.1007/s40328-013-0029-1
[12] R.W. Hill, Robust regression when there are outliers in the carriers, unpublished Ph.D. diss., Harvard University, 1977.
[13] P.J. Huber, Robust regression: Asymptotics, conjectures and Monte Carlo, Ann. Statist. 1 (1973), pp. 799-821. doi: 10.1214/aos/1176342503 · Zbl 0289.62033
[14] M. Hubert, P.J. Rousseeuw, and S. Van Aelst, High-breakdown robust multivariate methods, Statist. Sci. 23 (2008), pp. 92-119. doi: 10.1214/088342307000000087 · Zbl 1327.62328
[15] R.A. Maronna, R.D. Martin, and V.J. Yohai, Robust Statistics, Wiley, Chichester, 2006. · Zbl 1094.62040 · doi:10.1002/0470010940
[16] A.H.M. Rahmatullah Imon, Identifying multiple influential observations in linear regression, J. Appl. Stat. 32 (2005), pp. 929-946. doi: 10.1080/02664760500163599 · Zbl 1121.62404
[17] J.L. Rojo-Álvarez, M. Martínez-Ramón, A.R. Figueiras-Vidal, A. García-Armada, and A. Artés-Rodríguez, A robust support vector algorithm for nonparametric spectral analysis, IEEE Signal Process. Lett. 10 (2003), pp. 320-323. doi: 10.1109/LSP.2003.818866
[18] P.J. Rousseeuw, Least median of squares regression, J. Amer. Statist. Assoc. 79 (1984), pp. 871-880. doi: 10.1080/01621459.1984.10477105 · Zbl 0547.62046
[19] P.J. Rousseeuw, Multivariate estimation with high breakdown point, Math. Statist. Appl. B (1985), pp. 283-297. doi: 10.1007/978-94-009-5438-0_20 · Zbl 0609.62054
[20] P.J. Rousseeuw and C. Croux, Alternatives to the median absolute deviation, J. Amer. Statist. Assoc. 88 (1993), pp. 1273-1283. doi: 10.1080/01621459.1993.10476408 · Zbl 0792.62025
[21] P.J. Rousseeuw and A.M. Leroy, Robust Regression and Outlier Detection, J. Wiley & Sons, New York, 1987. · Zbl 0711.62030 · doi:10.1002/0471725382
[22] P. Rousseeuw and V. Yohai, Robust regression by means of S-estimators, in Robust and Nonlinear Time Series Analysis, Lecture Notes in Statistics Vol. 26, J. Franke, W. Härdle, and R.D. Martin, eds., Springer-Verlag, Berlin, Heidelberg, New York, 1984, pp. 256-276.
[23] Y. She and A.B. Owen, Outlier detection using non-convex penalized regression, J. Amer. Statist. Assoc. 106 (2011), pp. 626-639. doi: 10.1198/jasa.2011.tm10390 · Zbl 1232.62068
[24] J.R. Simpson, New methods and comparative evaluations for robust and biased-robust regression estimation, unpublished Ph.D. diss., Department of Industrial and Management Systems Engineering, Arizona State University, 1995.
[25] A.J. Stromberg, O. Hössjer, and D.M. Hawkins, The least trimmed differences regression estimator and alternatives, J. Amer. Statist. Assoc. 95 (2000), pp. 853-864. doi: 10.1080/01621459.2000.10474277 · Zbl 0998.62022
[26] S. Van Aelst and P. Rousseeuw, Minimum volume ellipsoid, WIRES Comput. Statist. 1 (2009), pp. 71-82. doi: 10.1002/wics.19
[27] V.J. Yohai, High breakdown-point and high efficiency robust estimates for regression, Ann. Statist. 15 (1987), pp. 642-656. doi: 10.1214/aos/1176350366 · Zbl 0624.62037
[28] V.J. Yohai and R.H. Zamar, High breakdown-point estimates of regression by means of the minimization of an efficient scale, J. Amer. Statist. Assoc. 83 (1988), pp. 406-413. doi: 10.1080/01621459.1988.10478611 · Zbl 0648.62036
[29] C. Yu, K. Chen, and W. Yao, Outlier detection and robust mixture modelling using non-convex penalized likelihood, J. Statist. Plann. Inference 164 (2015), pp. 27-38. doi: 10.1016/j.jspi.2015.03.003 · Zbl 1322.62180