zbMATH — the first resource for mathematics

Robust multicategory support vector machines using difference convex algorithm. (English) Zbl 1397.90319
Summary: The support vector machine (SVM) is one of the most popular classification methods in the machine learning literature. Binary SVM methods have been extensively studied, and have achieved many successes in various disciplines. However, generalization to multicategory SVM (MSVM) methods can be very challenging. Many existing methods estimate \(k\) functions for \(k\) classes with an explicit sum-to-zero constraint. It was shown recently that such a formulation can be suboptimal. Moreover, many existing MSVMs are not Fisher consistent, or do not take into account the effect of outliers. In this paper, we focus on classification in the angle-based framework, which is free of the explicit sum-to-zero constraint and hence more efficient, and we propose two robust MSVM methods using truncated hinge loss functions. We show that our new classifiers can enjoy Fisher consistency, and simultaneously alleviate the impact of outliers to achieve more stable classification performance. To implement our proposed classifiers, we employ the difference convex algorithm for efficient computation. Theoretical and numerical results indicate that for problems with potential outliers, our robust angle-based MSVMs can be very competitive among existing methods.
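The truncated hinge loss mentioned in the summary admits an explicit difference-of-convex decomposition, which is what makes the difference convex (DC) algorithm applicable: each DCA iteration linearizes the concave part at the current iterate and solves the resulting convex problem. The following is a minimal illustrative sketch, not the authors' code; it uses the standard parameterization \(T_s(u) = H_1(u) - H_s(u)\) with \(H_s(u) = \max(0, s-u)\) and \(s \le 0\), and the function names and the choice \(s = -1\) are assumptions for illustration.

```python
import numpy as np

def hinge(u, s=1.0):
    # Convex hinge with margin s: H_s(u) = max(0, s - u).
    return np.maximum(0.0, s - u)

def truncated_hinge(u, s=-1.0):
    # DC decomposition T_s(u) = H_1(u) - H_s(u), with s <= 0.
    # The loss is capped at 1 - s, so an outlier with a very negative
    # functional margin u contributes only a bounded loss.
    return hinge(u, 1.0) - hinge(u, s)

def concave_part_subgrad(u, s=-1.0):
    # DCA linearizes the concave part -H_s at the current iterate.
    # A subgradient of H_s(u) = max(0, s - u) is -1 where u < s, else 0.
    return np.where(u < s, -1.0, 0.0)
```

For a well-classified point (`u = 2`) the loss is 0; at `u = 0` it equals the usual hinge; and for a gross outlier (`u = -5`, with `s = -1`) the loss is capped at `1 - s = 2` instead of growing linearly, which is the source of the robustness to outliers described above.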

90C26 Nonconvex programming, global optimization
62H30 Classification and discrimination; cluster analysis (statistical aspects)
gss; Orange; UCI-ml
Full Text: DOI
[1] Arora, S., Bhattacharjee, D., Nasipuri, M., Malik, L., Kundu, M., Basu, D.K.: Performance Comparison of SVM and ANN for Handwritten Devnagari Character Recognition. arXiv preprint arXiv:1006.5902 (2010)
[2] Bache, K., Lichman, M.: UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences. http://archive.ics.uci.edu/ml (2013) · Zbl 1126.68070
[3] Bartlett, PL; Mendelson, S, Rademacher and Gaussian complexities: risk bounds and structural results, J. Mach. Learn. Res., 3, 463-482, (2002) · Zbl 1084.68549
[4] Bartlett, PL; Bousquet, O; Mendelson, S, Local Rademacher complexities, Ann. Stat., 33, 1497-1537, (2005) · Zbl 1083.62034
[5] Bartlett, PL; Jordan, MI; McAuliffe, JD, Convexity, classification, and risk bounds, J. Am. Stat. Assoc., 101, 138-156, (2006) · Zbl 1118.62330
[6] Boser, BE; Guyon, IM; Vapnik, VN; Haussler, D (ed.), A training algorithm for optimal margin classifiers, 144-152, (1992), New York
[7] Caruana, R., Karampatziakis, N., Yessenalina, A.: An empirical evaluation of supervised learning in high dimensions. In: Proceedings of the 25th International Conference on Machine Learning, pp. 96-103. ACM (2008) · Zbl 1359.90106
[8] Cortes, C; Vapnik, VN, Support-vector networks, Mach. Learn., 20, 273-297, (1995) · Zbl 0831.68098
[9] Crammer, K; Singer, Y, On the algorithmic implementation of multiclass kernel-based vector machines, J. Mach. Learn. Res., 2, 265-292, (2001) · Zbl 1037.68110
[10] Cristianini, N., Shawe-Taylor, J.S.: An Introduction to Support Vector Machines, 1st edn. Cambridge University Press, Cambridge (2000) · Zbl 0994.68074
[11] Demšar, J., Curk, T., Erjavec, A., Črt Gorup, Hočevar, T., Milutinovič, M., Možina, M., Polajnar, M., Toplak, M., Starič, A., Štajdohar, M., Umek, L., Žagar, L., Žbontar, J., Žitnik, M., Zupan, B.: Orange: data mining toolbox in python. J. Mach. Learn. Res. 14:2349-2353. http://jmlr.org/papers/v14/demsar13a.html (2013) · Zbl 1319.62146
[12] Freund, Y; Schapire, RE, A decision-theoretic generalization of on-line learning and an application to boosting, J. Comput. Syst. Sci., 55, 119-139, (1997) · Zbl 0880.68103
[13] Guermeur, Y; Monfrini, E, A quadratic loss multi-class SVM for which a radius-margin bound applies, Informatica, 22, 73-96, (2011) · Zbl 1263.68132
[14] Hastie, T.J., Tibshirani, R.J., Friedman, J.H.: The Elements of Statistical Learning, 2nd edn. Springer, New York (2009) · Zbl 1273.62005
[15] Hsieh, C., Chang, K., Lin, C., Keerthi, S., Sundararajan, S.: A dual coordinate descent method for large-scale linear SVM. In: Proceedings of the 25th International Conference on Machine Learning, Proceeding ICML ’08, pp. 408-415 (2008) · Zbl 0905.90131
[16] Justino, EJR; Bortolozzi, F; Sabourin, R, A comparison of SVM and HMM classifiers in the off-line signature verification, Pattern Recognit. Lett., 26, 1377-1385, (2005)
[17] Kiwiel, K; Rosa, C; Ruszczynski, A, Proximal decomposition via alternating linearization, SIAM J. Optim., 9, 668-689, (1999) · Zbl 0958.65068
[18] Koltchinskii, V, Local Rademacher complexities and oracle inequalities in risk minimization, Ann. Stat., 34, 2593-2656, (2006) · Zbl 1118.62065
[19] Koltchinskii, V; Panchenko, D, Empirical margin distributions and bounding the generalization error of combined classifiers, Ann. Stat., 30, 1-50, (2002) · Zbl 1012.62004
[20] Le Thi, HA; Pham Dinh, T, Solving a class of linearly constrained indefinite quadratic problems by DC algorithms, J. Glob. Optim., 11, 253-285, (1997) · Zbl 0905.90131
[21] Le Thi, HA; Pham Dinh, T, The DC (difference of convex functions) programming and DCA revisited with DC models of real world nonconvex optimization problems, Ann. Oper. Res., 133, 23-46, (2005) · Zbl 1116.90122
[22] Le Thi, H.A., Pham Dinh, T.: The State of the Art in DC Programming and DCA. Research Report (60 pages), Lorraine University (2013) · Zbl 1263.68132
[23] Le Thi, HA; Pham Dinh, T, Recent advances in DC programming and DCA, Trans. Comput. Collect. Intell., 8342, 1-37, (2014)
[24] Le Thi, HA; Le, HM; Pham Dinh, T, A DC programming approach for feature selection in support vector machines learning, Adv. Data Anal. Classif., 2, 259-278, (2008) · Zbl 1284.90057
[25] Le Thi, H.A., Huynh, V.N., Pham Dinh, T.: DC programming and DCA for general DC programs. Adv. Intell. Syst. Comput. 15-35. ISBN 978-3-319-06568-7 (2014) · Zbl 1322.90072
[26] Lee, Y; Lin, Y; Wahba, G, Multicategory support vector machines, theory, and application to the classification of microarray data and satellite radiance data, J. Am. Stat. Assoc., 99, 67-81, (2004) · Zbl 1089.62511
[27] Lin, X; Wahba, G; Xiang, D; Gao, F; Klein, R; Klein, B, Smoothing spline ANOVA models for large data sets with Bernoulli observations and the randomized GACV, Ann. Stat., 28, 1570-1600, (2000) · Zbl 1105.62358
[28] Lin, X; Pham, M; Ruszczynski, A, Alternating linearization for structured regularization problem, J. Mach. Learn. Res., 15, 3447-3481, (2014) · Zbl 1319.62146
[29] Lin, Y.: Some Asymptotic Properties of the Support Vector Machine. Technical Report 1044r, Department of Statistics, University of Wisconsin, Madison (1999)
[30] Liu, Y.: Fisher consistency of multicategory support vector machines. In: Eleventh International Conference on Artificial Intelligence and Statistics, pp. 289-296 (2007) · Zbl 1118.62065
[31] Liu, Y; Shen, X, Multicategory \(ψ \)-learning, J. Am. Stat. Assoc., 101, 500-509, (2006) · Zbl 1119.62341
[32] Liu, Y; Yuan, M, Reinforced multicategory support vector machines, J. Comput. Gr. Stat., 20, 901-919, (2011)
[33] Liu, Y; Zhang, HH; Wu, Y, Soft or hard classification? Large margin unified machines, J. Am. Stat. Assoc., 106, 166-177, (2011) · Zbl 1396.62144
[34] McDiarmid, C.: On the method of bounded differences. In: Surveys in Combinatorics, Cambridge University Press, Cambridge, pp. 148-188 (1989) · Zbl 0712.05012
[35] Mohri, M., Rostamizadeh, A., Talwalkar, A.: Foundations of Machine Learning. MIT Press, Cambridge, MA (2012) · Zbl 1318.68003
[36] Nesterov, Y, Efficiency of coordinate descent methods on huge-scale optimization problems, SIAM J. Optim., 22, 341-362, (2012) · Zbl 1257.90073
[37] Pang, JS; Razaviyayn, M; Alvarado, A, Computing B-stationary points of nonsmooth DC programs, Math. Oper. Res., 42, 95-118, (2016) · Zbl 1359.90106
[38] Platt, J; Schölkopf, B (ed.); Burges, JC (ed.); Smola, AJ (ed.), Fast training of support vector machines using sequential minimal optimization, 185-208, (1999), USA
[39] Shawe-Taylor, J.S., Cristianini, N.: Kernel Methods for Pattern Analysis, 1st edn. Cambridge University Press, Cambridge (2004) · Zbl 0994.68074
[40] Steinwart, I; Scovel, C, Fast rates for support vector machines using Gaussian kernels, Ann. Stat., 35, 575-607, (2007) · Zbl 1127.68091
[41] Tseng, P, A coordinate gradient descent method for linearly constrained smooth optimization and support vector machines training, J. Comput. Optim. Appl., 47, 179-206, (2010) · Zbl 1226.90062
[42] van der Vaart, A.W., Wellner, J.A.: Weak Convergence and Empirical Processes: With Applications to Statistics, 1st edn. Springer, New York (2000) · Zbl 0862.60002
[43] Vapnik, V.N.: Statistical Learning Theory. Wiley, New York (1998) · Zbl 0935.62007
[44] Wahba, G; Schölkopf, B (ed.); Burges, JC (ed.); Smola, AJ (ed.), Support vector machines, reproducing kernel Hilbert spaces and the randomized GACV, 69-88, (1999), USA
[45] Wang, L; Shen, X, On \(L_1\)-norm multi-class support vector machines: methodology and theory, J. Am. Stat. Assoc., 102, 595-602, (2007) · Zbl 1172.62322
[46] Wang, L; Zhu, J; Zou, H, The doubly regularized support vector machine, Stat. Sin., 16, 589-615, (2006) · Zbl 1126.68070
[47] Wu, Y., Liu, Y.: On multicategory truncated-hinge-loss support vector machines. In: Prediction and Discovery: AMS-IMS-SIAM Joint Summer Research Conference, Machine and Statistical Learning: Prediction and Discovery, June 25-29, 2006, Snowbird, Utah, American Mathematical Society, vol. 443, pp. 49-58 (2006)
[48] Wu, Y; Liu, Y, Robust truncated hinge loss support vector machines, J. Am. Stat. Assoc., 102, 974-983, (2007) · Zbl 05564425
[49] Zhang, C; Liu, Y, Multicategory angle-based large-margin classification, Biometrika, 101, 625-640, (2014) · Zbl 1335.62110
[50] Zhang, C; Liu, Y; Wang, J; Zhu, H, Reinforced angle-based multicategory support vector machines, J. Comput. Gr. Stat., 25, 806-825, (2016)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.