
A majorization-minimization scheme for \(L_2\) support vector regression. (English) Zbl 07497095

Summary: In a support vector regression (SVR) model, using the squared \(\epsilon\)-insensitive loss function makes the optimization problem strictly convex and yields a more concise solution. However, the standard formulation of \(L_2\)-SVR leads to a quadratic program that is expensive to solve. This paper reformulates the \(L_2\)-SVR optimization problem by absorbing the constraints into the objective function, and solves the resulting unconstrained problem with a majorization-minimization (MM) approach: in each iteration, an upper bound of the objective function is constructed that is easier to minimize. The proposed approach is easy to implement and requires no computing package beyond basic linear algebra operations. Numerical studies on real-world datasets show that, compared with the alternatives, the proposed approach achieves similar prediction accuracy with substantially higher training-time efficiency.
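As a rough illustration of the idea (not the authors' exact algorithm), the following Python sketch runs an MM iteration for kernel \(L_2\)-SVR without an intercept. It assumes the unconstrained form \(\min_\alpha \tfrac12\alpha^\top K\alpha + C\sum_i \ell(y_i - (K\alpha)_i)\) with \(\ell(r)=\max(|r|-\epsilon,0)^2\); since \(\ell'\) is 2-Lipschitz, the quadratic \(\ell(r_0)+\ell'(r_0)(r-r_0)+(r-r_0)^2\) majorizes \(\ell\), so each MM step reduces to a linear system. The function name mm_l2_svr and all parameter values are illustrative assumptions.

```python
import numpy as np

def sq_eps_grad(r, eps):
    """Derivative of the squared eps-insensitive loss
    l(r) = max(|r| - eps, 0)**2, i.e. 2*sign(r)*max(|r| - eps, 0)."""
    return 2.0 * np.sign(r) * np.maximum(np.abs(r) - eps, 0.0)

def mm_l2_svr(K, y, C=1.0, eps=0.1, n_iter=200, tol=1e-8):
    """Illustrative MM scheme for kernel L2-SVR (no intercept).

    Minimizes 0.5*a'Ka + C*sum_i l(y_i - (Ka)_i).  Because l' is
    2-Lipschitz, l(r) <= l(r0) + l'(r0)*(r - r0) + (r - r0)^2 for all r,
    so the surrogate is a ridge-type problem solved by one linear
    system per iteration -- only basic linear algebra is needed."""
    n = K.shape[0]
    alpha = np.zeros(n)
    # For a strictly positive definite kernel matrix, the surrogate's
    # stationarity condition K a + 2C K (K a - z) = 0 simplifies to
    # (I + 2C K) a = 2C z.
    A = np.eye(n) + 2.0 * C * K
    for _ in range(n_iter):
        f = K @ alpha                      # current fitted values
        r = y - f                          # residuals
        z = f + 0.5 * sq_eps_grad(r, eps)  # working response of the majorizer
        alpha_new = np.linalg.solve(A, 2.0 * C * z)
        if np.max(np.abs(alpha_new - alpha)) < tol:
            alpha = alpha_new
            break
        alpha = alpha_new
    return alpha

# Usage sketch: K is an n x n (e.g. Gaussian) kernel matrix on the
# training inputs; predictions at the training points are K @ alpha.
```

Because the surrogate is minimized exactly at each step, the objective value is non-increasing across iterations, which is the monotone descent property that characterizes MM schemes.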

MSC:

62J02 General nonlinear regression
62J12 Generalized linear models (logistic models)
62-XX Statistics

Software:

SVMlight; LIBSVM

References:

[1] Smola, AJ; Schölkopf, B., A tutorial on support vector regression, Stat Comput, 14, 3, 199-222 (2004)
[2] Vapnik, V., Statistical learning theory (1998), New York: John Wiley, New York · Zbl 0935.62007
[3] Mangasarian, OL; Musicant, DR., Lagrangian support vector machines, J Mach Learn Res, 1, 161-177 (2001) · Zbl 0997.68108
[4] Balasundaram, S.; Kapil, On Lagrangian support vector regression, Expert Syst Appl, 37, 12, 8784-8792 (2010)
[5] Lee, YJ; Hsieh, WF; Huang, CM., ϵ-SSVR: a smooth support vector machine for ϵ-insensitive regression, IEEE Trans Knowl Data Eng, 17, 5, 678-685 (2005)
[6] Musicant, DR; Feinberg, A., Active set support vector regression, IEEE Trans Neural Netw, 15, 2, 268-275 (2004)
[7] Chang, CC; Lin, CJ., LIBSVM: a library for support vector machines, ACM Trans Intell Syst Technol, 2, 27, 1-27 (2011)
[8] Gunn, SR. Support vector machines for classification and regression. Technical Report, Image Speech and Intelligent Systems Research Group, University of Southampton; 1997.
[9] Joachims, T. Making large-scale SVM learning practical. In: Schölkopf B, Burges C, Smola A, editors. Advances in kernel methods – support vector learning. Cambridge, MA: MIT Press; 1999.
[10] Osuna, E, Freund, R, Girosi, F. An improved training algorithm for support vector machines. In: Proceedings of the IEEE Workshop on Neural Networks for Signal Processing, Amelia Island, FL; 1997a. p. 276-285.
[11] Osuna, E, Freund, R, Girosi, F. Training support vector machines: an application to face detection. Proceedings of IEEE CVPR. San Juan, Puerto Rico; 1997b.
[12] Platt, J. Fast training of support vector machines using sequential minimal optimization. In: Schölkopf B, Burges C, Smola A, editors. Advances in kernel methods – support vector learning. Cambridge, MA: MIT Press; 1998.
[13] Hunter, DR; Lange, K., A tutorial on MM algorithms, Am Stat, 58, 30-37 (2004)
[14] Koenker, R.; Bassett, G., Regression quantiles, Econometrica, 46, 1, 33-50 (1978) · Zbl 0373.62038
[15] Efron, B., Regression percentiles using asymmetric squared error loss, Stat Sin, 1, 93-125 (1991) · Zbl 0822.62054
[16] Newey, WK.; Powell, JL., Asymmetric least squares estimation and testing, Econometrica, 55, 4, 819-847 (1987) · Zbl 0625.62047
[17] Huang, X.; Shi, L.; Suykens, J., Asymmetric least squares support vector machine classifiers, Comput Stat Data Anal, 70, 395-405 (2014) · Zbl 1471.62092
[18] Sun, Y.; Babu, P.; Palomar, DP., Majorization-minimization algorithms in signal processing, communications, and machine learning, IEEE Trans Signal Process, 65, 3, 794-816 (2017) · Zbl 1414.94595
[19] Hunter, DR; Lange, K., Quantile regression via an MM algorithm, J Comput Graph Stat, 9, 1, 60-77 (2000)
[20] Yang, Y.; Zhang, T.; Zou, H., Flexible expectile regression in reproducing kernel Hilbert space, Technometrics, 60, 1, 26-35 (2018)
[21] Nguyen, HD., An introduction to MM algorithms for machine learning and statistical estimation, WIREs Data Min Knowl Discov, 7, 2, e1198 (2017)
[22] Kimeldorf, GS; Wahba, G., A correspondence between Bayesian estimation on stochastic processes and smoothing by splines, Ann Math Stat, 41, 2, 495-502 (1970) · Zbl 0193.45201
[23] Schölkopf, B.; Smola, AJ., Learning with kernels: support vector machines, regularization, optimization, and beyond (2002), Cambridge, MA: MIT Press
[24] Han, X.; Clemmensen, L., On weighted support vector regression, Qual Reliab Eng Int, 30, 6, 891-903 (2014)
[25] De Brabanter, K, Pelckmans, K, De Brabanter, J, et al. Robustness of kernel based regression: a comparison of iterative weighting schemes. Proceedings of the 19th International Conference on Artificial Neural Networks (ICANN), Limassol, Cyprus; 2009. p. 100-110.
[26] Suykens, JAK; De Brabanter, J.; Lukas, L., Weighted least squares support vector machines: robustness and sparse approximation, Neurocomputing, 48, 1-4, 85-105 (2002) · Zbl 1006.68799
[27] Ho, CH; Lin, CJ., Large-scale linear support vector regression, J Mach Learn Res, 13, 3323-3348 (2012) · Zbl 1433.68349
[28] Mangasarian, OL; Musicant, DR., Large scale kernel regression via linear programming, Mach Learn, 46, 1-3, 255-269 (2002) · Zbl 0998.68106
[29] Boyd, S.; Parikh, N.; Chu, E., Distributed optimization and statistical learning via the alternating direction method of multipliers, Found Trends Mach Learn, 3, 1, 1-122 (2011)
[30] Pietrosanu, M.; Gao, J.; Kong, L., Advanced algorithms for penalized quantile and composite quantile regression, Comput Stat (2020) · Zbl 1505.62320 · doi:10.1007/s00180-020-01010-1