zbMATH — the first resource for mathematics

Multicategory proximal support vector machine classifiers. (English) Zbl 1101.68758
Summary: Given a dataset, each element of which is labeled by one of \(k\) labels, we construct, by a very fast algorithm, a \(k\)-category proximal support vector machine (PSVM) classifier. Proximal support vector machines and related approaches can be interpreted as ridge regression applied to classification problems. Extensive computational results have shown the effectiveness of PSVM for two-class classification problems, where the separating plane is constructed in time that can be as much as two orders of magnitude shorter than that of conventional support vector machines. When PSVM is applied to problems with more than two classes, the well-known one-from-the-rest approach is a natural choice that takes advantage of its fast performance. However, this approach has a drawback: the resulting two-class problems are often very unbalanced, leading in some cases to poor performance. We propose balancing the \(k\) classes and a novel Newton refinement modification to PSVM in order to deal with this problem. Computational results indicate that these two modifications preserve the speed of PSVM while often leading to significant test-set improvement over a plain PSVM one-from-the-rest application. The modified approach is considerably faster than other one-from-the-rest methods that use conventional SVM formulations, while still giving comparable test-set correctness.
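The ridge-regression view of PSVM mentioned above can be made concrete. The sketch below, based on the linear PSVM of Fung and Mangasarian [13], solves the regularized least-squares system in closed form and applies it one-from-the-rest; the per-point weighting is only an illustrative way to balance an unbalanced two-class split, not necessarily the exact scheme of the paper, and the Newton refinement step is omitted. All function names are hypothetical.

```python
import numpy as np

def psvm_train(A, d, nu=1.0, weights=None):
    """Linear proximal SVM: minimize
        (nu/2) * ||D(Aw - e*gamma) - e||^2 + (1/2) * (||w||^2 + gamma^2),
    where D = diag(d) with d in {-1, +1}. Since D^2 = I this reduces to
    a ridge-regression-style linear system. `weights` (optional) is an
    illustrative per-point weighting for class balancing (an assumption,
    not necessarily the paper's exact scheme)."""
    m, n = A.shape
    E = np.hstack([A, -np.ones((m, 1))])              # E = [A  -e]
    W = np.ones(m) if weights is None else np.asarray(weights)
    # Normal equations: (E' W E + I/nu) z = E' W d, with z = [w; gamma]
    EtW = E.T * W                                     # scales columns by W
    z = np.linalg.solve(EtW @ E + np.eye(n + 1) / nu, EtW @ d)
    return z[:n], z[n]

def one_vs_rest_train(A, labels, nu=1.0, balance=False):
    """One-from-the-rest multicategory PSVM: one separating plane per class."""
    planes = {}
    for c in np.unique(labels):
        d = np.where(labels == c, 1.0, -1.0)
        w = None
        if balance:
            # weight each side inversely to its size so both sides
            # contribute equally to the objective
            pos = (d > 0).mean()
            w = np.where(d > 0, 0.5 / pos, 0.5 / (1.0 - pos))
        planes[c] = psvm_train(A, d, nu, weights=w)
    return planes

def one_vs_rest_predict(A, planes):
    """Assign each point to the class whose plane gives the largest score."""
    classes = list(planes)
    scores = np.column_stack([A @ w - g for (w, g) in planes.values()])
    return np.asarray(classes)[np.argmax(scores, axis=1)]
```

The speed claim in the summary comes from the fact that the only work per class is assembling and solving one \((n+1)\times(n+1)\) linear system, independent of the number of data points beyond a single pass to form \(E^{\top}E\).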

68T05 Learning and adaptive systems in artificial intelligence
Full Text: DOI
[1] Anderson, E., Bai, Z., Bischof, C., Blackford, S., Demmel, J., Dongarra, J., Du Croz, J., Greenbaum, A., Hammarling, S., McKenney, A., & Sorensen, D. (1999). LAPACK User’s Guide, third edition. Philadelphia, Pennsylvania: SIAM. http://www.netlib.org/lapack/. · Zbl 0934.65030
[2] Bennett, K. P. & Mangasarian, O. L. (1993). Multicategory separation via linear programming. Optimization Methods and Software, 3, 27–39. · doi:10.1080/10556789408805554
[3] Bottou, L., Cortes, C., Denker, J., Drucker, H., Guyon, I., Jackel, L., LeCun, Y., Muller, U., Sackinger, E., Simard, P., & Vapnik, V. (1994). Comparison of classifier methods: A case study in handwritten digit recognition. International Conference on Pattern Recognition (pp. 77–87). IEEE Computer Society Press.
[4] Bradley, P. S. & Mangasarian, O. L. (2000). Massive data discrimination via linear support vector machines. Optimization Methods and Software, 13, 1–10. ftp://ftp.cs.wisc.edu/math-prog/tech-reports/98-03.ps. · Zbl 0986.90085 · doi:10.1080/10556780008805771
[5] Bredensteiner, E. J. & Bennett, K. P. (1999). Multicategory classification by support vector machines. Computational Optimization and Applications, 12, 53–79. · Zbl 1040.90574 · doi:10.1023/A:1008663629662
[6] Burges, C. J. C. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2:2, 121–167. · Zbl 05470543 · doi:10.1023/A:1009715923555
[7] Chen, C. & Mangasarian, O. L. (1996). Hybrid misclassification minimization. Advances in Computational Mathematics, 5:2, 127–136. ftp://ftp.cs.wisc.edu/math-prog/tech-reports/95-05.ps. · Zbl 0857.68088 · doi:10.1007/BF02124738
[8] Cherkassky, V. & Mulier, F. (1998). Learning from Data: Concepts, Theory, and Methods. New York: John Wiley & Sons. · Zbl 0960.62002
[9] CPLEX Optimization Inc., Incline Village, Nevada. (1992). Using the CPLEX(TM) Linear Optimizer and CPLEX(TM) Mixed Integer Optimizer (Version 2.0).
[10] Dasgupta, S. (2000). Experiments with random projection. Uncertainty in Artificial Intelligence: Proceedings of the Sixteenth Conference (UAI-2000) (pp. 143–151). San Francisco, CA: Morgan Kaufmann Publishers.
[11] Evgeniou, T., Pontil, M., & Poggio, T. (2000). Regularization networks and support vector machines. In A. Smola, P. Bartlett, B. Schölkopf, & D. Schuurmans (Eds.), Advances in large margin classifiers (pp. 171–203). Cambridge, MA: MIT Press. · Zbl 0939.68098
[12] Facchinei, F. (1995). Minimization of SC1 functions and the Maratos effect. Operations Research Letters, 17, 131–137. · Zbl 0843.90108 · doi:10.1016/0167-6377(94)00059-F
[13] Fung, G. & Mangasarian, O. L. (2001). Proximal support vector machine classifiers. In F. Provost & R. Srikant (Eds.), Proceedings KDD-2001: Knowledge discovery and data mining (pp. 77–86). San Francisco, CA; New York: Association for Computing Machinery. ftp://ftp.cs.wisc.edu/pub/dmi/tech-reports/01-02.ps. · Zbl 1101.68758
[14] Furey, T. S., Duffy, N., Cristianini, N., Bednarski, D., Schummer, M., & Haussler, D. (2000). Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics, 16:10, 906–914. · doi:10.1093/bioinformatics/16.10.906
[15] Van Gestel, T., Suykens, J., Lanckriet, G., Lambrechts, A., De Moor, B., & Vandewalle, J. (2002). Multiclass LS-SVMs: Moderated outputs and coding-decoding schemes. Neural Processing Letters, 15:1, 45–48. · Zbl 1008.68739 · doi:10.1023/A:1013815310229
[16] Hiriart-Urruty, J.-B., Strodiot, J. J., & Nguyen, V. H. (1984). Generalized hessian matrix and second-order optimality conditions for problems with CL1 data. Applied Mathematics and Optimization, 11, 43–56. · Zbl 0542.49011 · doi:10.1007/BF01442169
[17] Hoerl, A. E. & Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12, 55–67. · Zbl 0202.17205 · doi:10.2307/1267351
[18] Hsu, C.-W. & Lin, C.-J. (2001). A comparison of methods for multi-class support vector machines. http://www.csie.ntu.edu.tw/cjlin/papers.html.
[19] Kanzow, C., Qi, H., & Qi, L. (2001). On the minimum norm solution of linear programs. Preprint, University of Hamburg, Hamburg. Journal of Optimization Theory and Applications, to appear. http://www.math.uni-hamburg.de/home/kanzow/paper.html. · Zbl 1043.90046
[20] Lee, Y.-J. & Mangasarian, O. L. (2001). RSVM: Reduced support vector machines. Proceedings of the First SIAM International Conference on Data Mining, Chicago, April 5-7, CD-ROM. ftp://ftp.cs.wisc.edu/ pub/dmi/tech-reports/00-07.ps.
[21] Lee, Y.-J. & Mangasarian, O. L. (2001). SSVM: A smooth support vector machine. Computational Optimization and Applications, 20, 5–22. Data Mining Institute, University of Wisconsin, Technical Report 99-03. ftp://ftp.cs.wisc.edu/pub/dmi/tech-reports/99-03.ps. · Zbl 1017.90105 · doi:10.1023/A:1011215321374
[22] Mangasarian, O. L. (1994). Nonlinear Programming. Philadelphia, PA: SIAM. · Zbl 0833.90108
[23] Mangasarian, O. L. (2000). Generalized support vector machines. In A. Smola, P. Bartlett, B. Schölkopf, & D. Schuurmans (Eds.), Advances in large margin classifiers (pp. 135–146). Cambridge, MA: MIT Press. ftp://ftp.cs.wisc.edu/math-prog/tech-reports/98-14.ps.
[24] MATLAB. (1994–2001). User’s Guide. The MathWorks, Inc., Natick, MA 01760. http://www.mathworks.com.
[25] Murphy, P. M. & Aha, D. W. (1992). UCI machine learning repository. http://www.ics.uci.edu/~mlearn/MLRepository.html.
[26] Polyak, B. T. (1987). Introduction to Optimization. New York: Optimization Software, Inc., Publications Division. · Zbl 0708.90083
[27] Rockafellar, R. T. (1970). Convex Analysis. Princeton, New Jersey: Princeton University Press. · Zbl 0193.18401
[28] Roth, V. & Steinhage, V. (1999). Nonlinear discriminant analysis using kernel functions. In S. A. Solla, T. K. Leen, & K.-R. Mueller (Eds.), Advances in neural information processing systems–NIPS*99 (pp. 568–574).
[29] Schölkopf, B., Mika, S., Burges, C. J. C., Knirsch, P., Müller, K.-R., Rätsch, G., & Smola, A. J. (1999). Input space versus feature space in kernel-based methods. IEEE Transactions on Neural Networks, 10, 1000–1017. · doi:10.1109/72.788641
[30] Smola, A. J. & Schölkopf, B. (2000). Sparse greedy matrix approximation for machine learning. Proc. 17th International Conf. on Machine Learning (pp. 911–918). San Francisco, CA: Morgan Kaufmann.
[31] Suykens, J. A. K., Van Gestel, T., De Brabanter, J., De Moor, B., & Vandewalle, J. (2002). Least Squares Support Vector Machines. Singapore: World Scientific Publishing Co. · Zbl 1017.93004
[32] Suykens, J. A. K., Lukas, L., Van Dooren, P., De Moor, B., & Vandewalle, J. (1999). Least squares support vector machine classifiers: A large scale algorithm. European Conference on Circuit Theory and Design, ECCTD’99 (pp. 839–842). Stresa, Italy.
[33] Suykens, J. A. K. & Vandewalle, J. (1999). Least squares support vector machine classifiers. Neural Processing Letters, 9:3, 293–300. · Zbl 05467879 · doi:10.1023/A:1018628609742
[34] Suykens, J. A. K. & Vandewalle, J. (1999). Multiclass least squares support vector machines. Proceedings of IJCNN’99, CD-ROM. Washington, DC. · Zbl 0958.93042
[35] Tikhonov, A. N. & Arsenin, V. Y. (1977). Solutions of Ill-Posed Problems. New York: John Wiley & Sons. · Zbl 0354.65028
[36] Vapnik, V. N. (2000). The Nature of Statistical Learning Theory. New York: Springer. · Zbl 0934.62009
[37] Weston, J. & Watkins, C. (1998). Multi-class support vector machines. Technical Report CSD-TR-98-04, Royal Holloway, University of London, Surrey, England.
[38] Williams, C. K. I. & Seeger, M. (2000). Using the Nyström method to speed up kernel machines. Advances in Neural Information Processing Systems (NIPS2000). http://www.kernel-machines.org.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.