Soft margin support vector classification as buffered probability minimization.

*(English)*Zbl 1434.68440Summary: In this paper, we show that the popular C-SVM, soft-margin support vector classifier is equivalent to minimization of Buffered Probability of Exceedance (bPOE), a recently introduced characterization of uncertainty. To show this, we introduce a new SVM formulation, called the EC-SVM, which is derived from a simple bPOE minimization problem that is easy to interpret with a meaningful free parameter, optimal objective value, and probabilistic derivation. Over the range of its free parameter, the EC-SVM has both a convex and non-convex case which we connect to existing SVM formulations. We first show that the C-SVM, formulated with any regularization norm, is equivalent to the convex EC-SVM. Similarly, we show that the E\(\nu\)-SVM is equivalent to the EC-SVM over its entire parameter range, which includes both the convex and non-convex case. These equivalences, coupled with the interpretability of the EC-SVM, allow us to gain surprising new insights into the C-SVM and fully connect soft margin support
vector classification with superquantile and bPOE concepts. We also show that the EC-SVM can easily be cast as a robust optimization problem, where bPOE is minimized with data lying in a fixed uncertainty set. This reformulation allows us to clearly differentiate between the convex and non-convex case, with convexity associated with pessimistic views of uncertainty and non-convexity associated with optimistic views of uncertainty. Finally, we address some practical considerations. First, we show that these new insights can assist in making parameter selection more efficient. Second, we discuss optimization approaches for solving the EC-SVM. Third, we address the issue of generalization, providing generalization bounds for both bPOE and misclassification rate.

##### MSC:

68T05 | Learning and adaptive systems in artificial intelligence |

62H30 | Classification and discrimination; cluster analysis (statistical aspects) |

##### Keywords:

support vector machines; buffered probability of exceedance; conditional value-at-risk; binary classification; robust optimization##### Software:

Pegasos
PDF
BibTeX
XML
Cite

\textit{M. Norton} et al., J. Mach. Learn. Res. 18, Paper No. 68, 43 p. (2017; Zbl 1434.68440)

Full Text:
Link

##### References:

[1] | Jinbo Bi and Tong Zhang. Support vector classification with input data uncertainty. Advances in neural information processing systems, 17:161, 2005. |

[2] | Olivier Bousquet and Andr´e Elisseeff. Stability and generalization. Journal of machine learning research, 2(Mar):499–526, 2002. · Zbl 1007.68083 |

[3] | Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and trends in machine learning, 3(1):1–122, 2011.R · Zbl 1229.90122 |

[4] | Chih-Chung Chang and Chih-Jen Lin. Training v-support vector classifiers: theory and algorithms. Neural computation, 13(9):2119–2147, 2001. · Zbl 0993.68080 |

[5] | Olivier Chapelle. Training a support vector machine in the primal. Neural computation, 19 (5):1155–1178, 2007. · Zbl 1123.68101 |

[6] | Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine learning, 20(3): 273–297, 1995. · Zbl 0831.68098 |

[7] | Justin Davis and Stan Uryasev. Analysis of hurricane damage using buffered probability of exceedance. Research report 2014-4, ise dept., university of florida, 2014. |

[8] | Tao Pham Dinh and Hoai An Le Thi. Recent advances in dc programming and dca. In Transactions on computational intelligence XIII, pages 1–37. Springer, 2014. |

[9] | Alexander Mafusalov and Stan Uryasev. Buffered probability of exceedance: Mathematical properties and optimization algorithms. Research report 2014-1, ise dept., university of florida, 2015. · Zbl 1395.90191 |

[10] | Matthew Norton and Stan Uryasev. Maximization of auc and buffered auc in binary classification. Research report 2014-2, ise dept., university of florida, 2014. |

[11] | Matthew Norton, Alexander Mafusalov, and Stan Uryasev. Cardinality of upper average and its application to network optimization. Research report 2015-1, ise dept., university of florida, 2015. · Zbl 1391.90407 |

[12] | Fernando P´erez-Cruz, Jason Weston, DJL Herrmann, and Bernhard Sch¨olkopf. Extension of the nu-svm range for classification. Nato science series sub series III computer and system sciences, 190:179–196, 2003. |

[13] | Ralph Tyrell Rockafellar. Safeguarding strategies in risky optimization. Presentation at the international workshop on engineering risk control and optimization, Gainesville, FL,, 2009. |

[14] | Ralph Tyrell Rockafellar. Convex analysis. Princeton university press, 2015. |

[15] | Ralph Tyrell Rockafellar and Johannes Royset. On buffered failure probability in design and optimization of structures. Reliability engineering & system safety, Vol. 95, 499-510, 2010. 42 |

[16] | Ralph Tyrell Rockafellar and Stan Uryasev. Conditional value-at-risk for general loss distributions. Journal of banking & finance, 26(7):1443–1471, 2002. |

[17] | Bernhard Sch¨olkopf, Alex J Smola, Robert C Williamson, and Peter L Bartlett.New support vector algorithms. Neural computation, 12(5):1207–1245, 2000. |

[18] | Bernhard Sch¨olkopf, Ralf Herbrich, and Alex J Smola. A generalized representer theorem. In International conference on computational learning theory, pages 416–426. Springer, 2001. · Zbl 0992.68088 |

[19] | Shai Shalev-Shwartz, Yoram Singer, Nathan Srebro, and Andrew Cotter. Pegasos: Primal estimated sub-gradient solver for svm. Mathematical programming, 127(1):3–30, 2011. · Zbl 1211.90239 |

[20] | Akiko Takeda and Masashi Sugiyama. ν-support vector machine as conditional value-at-risk minimization. In Proceedings of the 25th international conference on machine learning, pages 1056–1063. ACM, 2008. |

[21] | Pham Dinh Tao and Le Thi Hoai An. Convex analysis approach to dc programming: theory, algorithms and applications. Acta mathematica vietnamica, 22(1):289–355, 1997. · Zbl 0895.90152 |

[22] | Stan Uryasev. Buffered probability of exceedance and buffered service level: Definitions and properties. Research report 2014-3, ise dept., university of florida, 2014. |

[23] | Ximing Wang, Neng Fan, and Panos M Pardalos. Stochastic subgradient descent method for large-scale robust chance-constrained support vector machines. Optimization letters, pages 1–12, 2016. · Zbl 1373.90090 |

[24] | Huan Xu, Constantine Caramanis, and Shie Mannor. Robustness and regularization of support vector machines. Journal of machine learning research, 10(Jul):1485–1510, 2009. · Zbl 1235.68209 |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.