## SVM-Maj: a majorization approach to linear support vector machines with different hinge errors. (English) Zbl 1151.90551

Summary: Support vector machines (SVMs) are becoming increasingly popular for predicting a binary dependent variable, and they perform very well compared with competing techniques. The SVM solution is often obtained by switching to the dual problem. In this paper, we stick to the primal SVM problem, study its effective aspects, and propose several convex loss functions: the absolute hinge error used by the standard SVM, as well as the quadratic hinge and the Huber hinge errors. We present an iterative majorization algorithm that minimizes the SVM loss under each of these errors. In addition, we show that many of the features of an SVM are also obtained by an optimal scaling approach to regression. We illustrate this with an example from the literature and compare the different methods on several empirical data sets.
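The three errors can be written as functions of the signed margin z = y q, where q = c + x'w is the linear predictor and y is in {-1, +1}. The sketch below is illustrative and not the authors' code. It assumes this notation, a Huber-hinge parameterization with a hypothetical width parameter `k`, and the primal loss sum of errors plus lam * w'w. It implements a majorization step only for the quadratic hinge. There, each error is bounded above by a quadratic (q - b)^2 that touches it at the current estimate, so every iteration reduces to a ridge regression on updated targets b. The function names are hypothetical.

```python
import numpy as np


def absolute_hinge(z):
    """Standard SVM error max(0, 1 - z), with z = y * q the signed margin."""
    return np.maximum(0.0, 1.0 - z)


def quadratic_hinge(z):
    """Quadratic hinge error max(0, 1 - z)**2."""
    return np.maximum(0.0, 1.0 - z) ** 2


def huber_hinge(z, k=1.0):
    """Huber hinge (assumed parameterization): quadratic for violations below k + 1, linear above.

    The two pieces meet with equal value and slope at 1 - z = k + 1.
    """
    u = np.maximum(0.0, 1.0 - z)  # size of the margin violation
    return np.where(u < k + 1.0, u ** 2 / (2.0 * (k + 1.0)), u - (k + 1.0) / 2.0)


def svm_maj_quadratic(X, y, lam=1.0, tol=1e-10, max_iter=1000):
    """Majorization sketch for the primal linear SVM with the quadratic hinge.

    Minimizes sum_i max(0, 1 - y_i q_i)**2 + lam * w'w with q_i = c + x_i'w,
    for labels y_i in {-1, +1}. Returns (c, w, final loss).
    """
    n, p = X.shape
    Xt = np.hstack([np.ones((n, 1)), X])  # prepend a column of ones for c
    P = lam * np.eye(p + 1)
    P[0, 0] = 0.0  # the intercept c is not penalized
    A = Xt.T @ Xt + P  # same system matrix in every iteration

    def objective(v):
        q = Xt @ v
        return quadratic_hinge(y * q).sum() + lam * v[1:] @ v[1:]

    v = np.zeros(p + 1)  # v = (c, w)
    loss = objective(v)
    for _ in range(max_iter):
        q = Xt @ v
        # (q_i - b_i)**2 majorizes max(0, 1 - y_i q)**2 and touches it at the current q_i.
        b = y * np.maximum(1.0, y * q)
        # Minimizing the majorizer is a ridge regression of b on Xt.
        v = np.linalg.solve(A, Xt.T @ b)
        new_loss = objective(v)
        converged = loss - new_loss <= tol * max(1.0, loss)
        loss = new_loss
        if converged:
            break
    return v[0], v[1:], loss
```

Consider four one-dimensional points x = -2, -1, 1, 2 with labels -1, -1, +1, +1 and lam = 0.1. On this data the iterates should approach c = 0 and w = 4/4.2 ≈ 0.952, which is the minimizer of 2(1 - w)^2 + 0.1 w^2 and can be checked by hand. Each step cannot increase the loss. The loss is bounded above by the majorizer, which equals it at the current estimate.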

### MSC:

- 90C30 Nonlinear programming
- 62H30 Classification and discrimination; cluster analysis (statistical aspects)
- 68T05 Learning and adaptive systems in artificial intelligence

### Software:

SVMlight; UCI-ml; LIBSVM

### References:

- [1] Borg I, Groenen PJF (2005) Modern multidimensional scaling: theory and applications, 2nd edn. Springer, New York · Zbl 1085.62079
- [2] Burges CJC (1998) A tutorial on support vector machines for pattern recognition. Data Min Knowl Discov 2: 121–167 · Zbl 05470543
- [3] Chang C-C, Lin C-J (2006) LIBSVM: a library for support vector machines (Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm)
- [4] Chu W, Keerthi S, Ong C (2003) Bayesian trigonometric support vector classifier. Neural Comput 15(9): 2227–2254 · Zbl 1085.68620
- [5] Cristianini N, Shawe-Taylor J (2000) An introduction to support vector machines. Cambridge University Press, Cambridge · Zbl 0994.68074
- [6] De Leeuw J (1994) Block relaxation algorithms in statistics. In: Bock H-H, Lenski W, Richter MM (eds) Information systems and data analysis. Springer, Berlin, pp 308–324 · Zbl 0829.65144
- [7] Gifi A (1990) Nonlinear multivariate analysis. Wiley, Chichester · Zbl 0697.62048
- [8] Groenen PJF, Nalbantov G, Bioch JC (2007) Nonlinear support vector machines through iterative majorization. In: Decker R, Lenz H-J (eds) Advances in data analysis. Springer, Berlin, pp 149–162
- [9] Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning. Springer, New York · Zbl 0973.62007
- [10] Heiser WJ (1995) Convergent computation by iterative majorization: theory and applications in multidimensional data analysis. In: Krzanowski WJ (ed) Recent advances in descriptive multivariate analysis. Oxford University Press, Oxford, pp 157–189
- [11] Hsu C-W, Lin C-J (2006) BSVM: bound-constrained support vector machines (Software available at http://www.csie.ntu.edu.tw/~cjlin/bsvm/index.html)
- [12] Huber PJ (1981) Robust statistics. Wiley, New York · Zbl 0536.62025
- [13] Hunter DR, Lange K (2004) A tutorial on MM algorithms. Am Stat 58: 30–37 · Zbl 05680564
- [14] Joachims T (1999) Making large-scale SVM learning practical. In: Schölkopf B, Burges C, Smola A (eds) Advances in kernel methods: support vector learning. MIT Press, Cambridge (http://www-ai.cs.uni-dortmund.de/DOKUMENTE/joachims_99a.pdf)
- [15] Joachims T (2006) Training linear SVMs in linear time. In: Proceedings of the ACM conference on knowledge discovery and data mining (KDD) (http://www.cs.cornell.edu/People/tj/publications/joachims_06a.pdf)
- [16] Kiers HAL (2002) Setting up alternating least squares and iterative majorization algorithms for solving various matrix optimization problems. Comput Stat Data Anal 41: 157–170 · Zbl 1018.65074
- [17] Kruskal JB (1965) The analysis of factorial experiments by estimating monotone transformations of the data. J R Stat Soc Ser B 27: 251–263
- [18] Lange K, Hunter DR, Yang I (2000) Optimization transfer using surrogate objective functions. J Comput Graph Stat 9: 1–20
- [19] Newman D, Hettich S, Blake C, Merz C (1998) UCI repository of machine learning databases. University of California, Irvine, Department of Information and Computer Sciences (http://www.ics.uci.edu/~mlearn/MLRepository.html)
- [20] Rosset S, Zhu J (2007) Piecewise linear regularized solution paths. Ann Stat 35: 1012–1030 · Zbl 1194.62094
- [21] Rousseeuw PJ, Leroy AM (2003) Robust regression and outlier detection. Wiley, New York
- [22] Suykens JAK, Van Gestel T, De Brabanter J, De Moor B, Vandewalle J (2002) Least squares support vector machines. World Scientific, Singapore · Zbl 1017.93004
- [23] Van der Kooij AJ (2007) Prediction accuracy and stability of regression with optimal scaling transformations. Unpublished doctoral dissertation, Leiden University
- [24] Van der Kooij AJ, Meulman JJ, Heiser WJ (2006) Local minima in categorical multiple regression. Comput Stat Data Anal 50: 446–462 · Zbl 1302.62152
- [25] Vapnik VN (2000) The nature of statistical learning theory. Springer, New York · Zbl 0934.62009
- [26] Young FW (1981) Quantitative analysis of qualitative data. Psychometrika 46: 357–388 · Zbl 0479.62003
- [27] Young FW, De Leeuw J, Takane Y (1976) Additive structure in qualitative data: an alternating least squares method with optimal scaling features. Psychometrika 41: 471–503 · Zbl 0351.92032
- [28] Young FW, De Leeuw J, Takane Y (1976) Regression with qualitative and quantitative variables: an alternating least squares method with optimal scaling features. Psychometrika 41: 505–529 · Zbl 0351.92032