
Employing different loss functions for the classification of images via supervised learning. (English) Zbl 1305.49044

In this paper, the support vector machine approach is applied to an image classification task. To this end, the authors reformulate the general Tikhonov regularization problem arising from the supervised learning task as a convex (not necessarily differentiable) optimization problem. They construct its conjugate dual, prove strong duality under suitable qualification conditions, and express the optimal solutions of the primal problem in terms of those of the dual. As a consequence, the decision function to be learned can be written by means of the optimal solutions of the dual problem. Hence, for the specific learning task, one only has to solve the dual problem numerically, which, unlike the primal one, can in most cases be formulated as a convex differentiable optimization problem.
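
To make the duality scheme concrete, the following is a minimal sketch in the standard reproducing kernel setting (cf. [1] for reproducing kernels and [14] for Fenchel duality in this context); the loss \(v\), the trade-off parameter \(C>0\) and the Gram matrix \(\mathbb{K}=(K(x_i,x_j))_{i,j}\) are illustrative assumptions and need not match the exact formulation of the paper. Given training data \((x_1,y_1),\dots,(x_n,y_n)\), the Tikhonov regularization problem reads
\[
\inf_{f\in\mathcal{H}_K}\; C\sum_{i=1}^{n} v\bigl(f(x_i),y_i\bigr)+\frac{1}{2}\,\lVert f\rVert_{\mathcal{H}_K}^{2},
\]
and a Fenchel-type conjugate dual in the variable \(\alpha\in\mathbb{R}^{n}\) is
\[
\sup_{\alpha\in\mathbb{R}^{n}}\; -C\sum_{i=1}^{n} v^{*}\Bigl(-\frac{\alpha_i}{C},\,y_i\Bigr)-\frac{1}{2}\,\alpha^{\top}\mathbb{K}\,\alpha,
\]
where \(v^{*}\) denotes the convex conjugate of \(v\) with respect to its first argument. Under a suitable qualification condition strong duality holds, and an optimal dual solution \(\bar{\alpha}\) yields the decision function
\[
f(x)=\sum_{i=1}^{n}\bar{\alpha}_i\,K(x,x_i).
\]
For the hinge loss \(v(t,y)=\max\{0,1-ty\}\), for instance, the conjugate term reduces to a linear function over box constraints \(0\le \alpha_i y_i\le C\), so the dual becomes the classical quadratic (hence differentiable) SVM dual, which illustrates why the dual is numerically more tractable than the nondifferentiable primal.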

MSC:

49N15 Duality theory (optimization)
47A52 Linear operators and ill-posed problems, regularization
90C25 Convex programming
62H30 Classification and discrimination; cluster analysis (statistical aspects)
93E35 Stochastic learning and adaptive control
68T05 Learning and adaptive systems in artificial intelligence
68U10 Computing methodologies for image processing

Software:

Matlab; D-ADMM

References:

[1] Aronszajn N., Theory of reproducing kernels, Trans. Amer. Math. Soc., 1950, 68, 337-404 http://dx.doi.org/10.1090/S0002-9947-1950-0051437-7; · Zbl 0037.20701
[2] Boţ R.I., Conjugate Duality in Convex Optimization, Lecture Notes in Econom. and Math. Systems, 637, Springer, Berlin-Heidelberg, 2010; · Zbl 1190.90002
[3] Boţ R.I., Csetnek E.R., Heinrich A., On the convergence rate improvement of a primal-dual splitting algorithm for solving monotone inclusion problems, SIAM J. Optim., 2013, 23(4), 2011-2036 http://dx.doi.org/10.1137/12088255X; · Zbl 1314.47102
[4] Boţ R.I., Lorenz N., Optimization problems in statistical learning: Duality and optimality conditions, European J. Oper. Res., 2011, 213(2), 395-404 http://dx.doi.org/10.1016/j.ejor.2011.03.021; · Zbl 1222.90079
[5] Boyd S., Parikh N., Chu E., Peleato B., Eckstein J., Distributed optimization and statistical learning via the alternating direction method of multipliers, Foundations and Trends in Machine Learning, 2010, 3(1), 1-122 http://dx.doi.org/10.1561/2200000016; · Zbl 1229.90122
[6] Chapelle O., Haffner P., Vapnik V.N., Support vector machines for histogram-based image classification, IEEE Trans. Neural Networks, 1999, 10(5), 1055-1064 http://dx.doi.org/10.1109/72.788646;
[7] Geiger C., Kanzow C., Theorie und Numerik Restringierter Optimierungsaufgaben [Theory and Numerics of Constrained Optimization Problems], Springer-Lehrbuch Masterclass, Springer, Berlin-New York, 2002 http://dx.doi.org/10.1007/978-3-642-56004-0; · Zbl 1003.90044
[8] Joachims T., Learning to Classify Text Using Support Vector Machines, Kluwer Internat. Ser. Engrg. Comput. Sci., 668, Kluwer, Boston, 2002 http://dx.doi.org/10.1007/978-1-4615-0907-3;
[9] Kim K., Financial time series forecasting using support vector machines, Neurocomputing, 2003, 55(1-2), 307-319 http://dx.doi.org/10.1016/S0925-2312(03)00372-2;
[10] Lal T.N., Chapelle O., Schölkopf B., Combining a filter method with SVMs, In: Feature Extraction, Studies in Fuzziness and Soft Computing, 207, Springer, Berlin, 2006, 439-445 http://dx.doi.org/10.1007/978-3-540-35488-8_21;
[11] Mota J.F.C., Xavier J.M.F., Aguiar P.M.Q., Püschel M., D-ADMM: A communication-efficient distributed algorithm for separable optimization, preprint available at http://arxiv.org/abs/1202.2805; · Zbl 1393.94059
[12] Noble W.S., Support vector machine applications in computational biology, In: Kernel Methods in Computational Biology, MIT Press, 2004, 71-92;
[13] Pedroso J.P., Murata N., Support vector machines with different norms: motivation, formulations and results, Pattern Recognition Lett., 2001, 22(12), 1263-1272 http://dx.doi.org/10.1016/S0167-8655(01)00071-X; · Zbl 0990.68117
[14] Rifkin R.M., Lippert R.A., Value regularization and Fenchel duality, J. Mach. Learn. Res., 2007, 8(3), 441-479; · Zbl 1222.49052
[15] Rockafellar R.T., Convex Analysis, Princeton Math. Ser., 28, Princeton University Press, Princeton, 1970; · Zbl 0193.18401
[16] Ruschhaupt M., Huber W., Poustka A., Mansmann U., A compendium to ensure computational reproducibility in high-dimensional classification tasks, Stat. Appl. Genet. Mol. Biol., 2004, 3, #37; · Zbl 1082.92025
[17] Shawe-Taylor J., Cristianini N., Kernel Methods for Pattern Analysis, Cambridge University Press, Cambridge, 2004 http://dx.doi.org/10.1017/CBO9780511809682; · Zbl 0994.68074
[18] Sra S., Nowozin S., Wright S.J., Optimization for Machine Learning, MIT Press, Cambridge-London, 2011;
[19] Steinwart I., How to compare different loss functions and their risks, Constr. Approx., 2007, 26(2), 225-287 http://dx.doi.org/10.1007/s00365-006-0662-3; · Zbl 1127.68089
[20] Stoppiglia H., Dreyfus G., Dubois R., Oussar Y., Ranking a random feature for variable and feature selection, J. Mach. Learn. Res., 2003, 3, 1399-1414; · Zbl 1102.68598
[21] Suykens J.A.K., Vandewalle J., Least squares support vector machine classifiers, Neural Process. Lett., 1999, 9(3), 293-300 http://dx.doi.org/10.1023/A:1018628609742;
[22] Tibshirani R., Regression shrinkage and selection via the lasso, J. Roy. Statist. Soc. Ser. B, 1996, 58(1), 267-288; · Zbl 0850.62538
[23] Tibshirani R., Saunders M., Rosset S., Zhu J., Knight K., Sparsity and smoothness via the fused lasso, J. R. Stat. Soc. Ser. B Stat. Methodol., 2005, 67(1), 91-108 http://dx.doi.org/10.1111/j.1467-9868.2005.00490.x; · Zbl 1060.62049
[24] Tikhonov A.N., Arsenin V.Ya., Solutions of Ill-posed Problems, Scripta Series in Mathematics, Winston & Sons, Washington DC, 1977; · Zbl 0354.65028
[25] Van Gestel T., Baesens B., Garcia J., Van Dijcke P., A support vector machine approach to credit scoring, Bank- en Financiewezen, 2003, 2, 73-82;
[26] Vapnik V.N., The Nature of Statistical Learning Theory, Springer, New York, 1995 http://dx.doi.org/10.1007/978-1-4757-2440-0; · Zbl 0833.62008
[27] Vapnik V.N., Statistical Learning Theory, Adapt. Learn. Syst. Signal Process. Commun. Control, John Wiley & Sons, New York, 1998; · Zbl 0935.62007
[28] Varma S., Simon R., Bias in error estimation when using cross-validation for model selection, BMC Bioinformatics, 2006, 7, #91;
[29] Wahba G., Spline Models for Observational Data, CBMS-NSF Regional Conf. Ser. in Appl. Math., 59, SIAM, Philadelphia, 1990 http://dx.doi.org/10.1137/1.9781611970128; · Zbl 0813.62001
[30] Xiang D.-H., Zhou D.-X., Classification with Gaussians and convex loss, J. Mach. Learn. Res., 2009, 10, 1447-1468; · Zbl 1235.68207
[31] Ying Y., Huang K., Campbell C., Sparse metric learning via smooth optimization, In: Advances in Neural Information Processing Systems, 22, NIPS Foundation, 2009, 2205-2213;
[32] MATLAB 8.0, The MathWorks, Natick, Massachusetts, 2012;