
Arbitrary norm support vector machines. (English) Zbl 1178.68419

Summary: Support vector machines (SVMs) are state-of-the-art classifiers. Typically, the L\(_{2}\)-norm or the L\(_{1}\)-norm is adopted as the regularization term in an SVM, while SVMs based on other norms, for example, the L\(_{0}\)-norm or even the L\(_{\infty }\)-norm, are rarely seen in the literature. The major reason is that the L\(_{0}\)-norm is a discontinuous and nonconvex term, leading to a combinatorially NP-hard optimization problem. In this letter, motivated by Bayesian learning, we propose a novel framework that can implement arbitrary norm-based SVMs in polynomial time. One significant feature of this framework is that only a sequence of sequential minimal optimization (SMO) problems needs to be solved, making it practical in many real applications.
The proposed framework is important in the sense that Bayesian priors can be efficiently plugged into most learning methods without knowing their explicit form; hence, it builds a connection between Bayesian learning and kernel machines. We derive the theoretical framework, demonstrate how our approach works on the L\(_{0}\)-norm SVM as a typical example, and perform a series of experiments to validate its advantages. Experimental results on nine benchmark data sets are very encouraging. The implemented L\(_{0}\)-norm SVM is competitive with or even better than the standard L\(_{2}\)-norm SVM in terms of accuracy, but with a reduced number of support vectors (9.46% of that number on average). Compared with another sparse model, the relevance vector machine, our proposed algorithm also demonstrates better sparsity, with a training speed over seven times faster.
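The general idea described above, replacing an intractable L\(_{0}\)-style penalty with a sequence of tractable subproblems whose quadratic regularizer is reweighted between rounds, can be sketched in a few lines. This is a minimal illustrative sketch only: it uses a linear model trained by subgradient descent on the hinge loss rather than the kernelized SMO solver of the paper, and the function name, hyperparameters, and reweighting rule \(r_j \leftarrow 1/\max(w_j^2, \varepsilon)\) are assumptions for illustration, not the authors' algorithm.

```python
# Illustrative sketch (not the reviewed paper's algorithm): approximating an
# L0-style sparsity penalty on a linear hinge-loss classifier by solving a
# sequence of per-coordinate-weighted L2-regularized problems.
import numpy as np

def train_reweighted_svm(X, y, n_outer=5, n_inner=200, lr=0.1, lam=0.1, eps=1e-2):
    """Hinge-loss linear classifier whose per-coordinate L2 weights r are
    re-estimated each outer round so that small coordinates are penalized
    more heavily, pushing them toward zero (an L0-like effect).
    eps floors the weight update to keep the reweighting numerically stable."""
    n, d = X.shape
    w = np.zeros(d)
    r = np.ones(d)                      # per-coordinate regularization weights
    for _ in range(n_outer):
        for _ in range(n_inner):        # subgradient descent on the hinge loss
            margins = y * (X @ w)
            active = margins < 1        # examples violating the margin
            grad = -(y[active, None] * X[active]).sum(axis=0) / n + lam * r * w
            w -= lr * grad
        r = 1.0 / np.maximum(w**2, eps)  # reweight: small w_j -> large penalty
    return w

# Toy data: only the first two of five features are informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sign(X[:, 0] + X[:, 1])
w = train_reweighted_svm(X, y)
```

After a few outer rounds the uninformative coordinates of `w` are driven close to zero while the informative ones remain large, which mirrors (in spirit only) the sparsity effect the paper reports for its L\(_{0}\)-norm SVM.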

MSC:

68T05 Learning and adaptive systems in artificial intelligence
