×

zbMATH — the first resource for mathematics

An efficient method for clustered multi-metric learning. (English) Zbl 1441.68212
Summary: Distance metric learning, which aims at finding a distance metric that separates examples of one class from examples of the other classes, is the key to the success of many machine learning tasks. Although there has been an increasing interest in this field, learning a global distance metric is insufficient to obtain satisfactory results when dealing with heterogeneously distributed data. A simple solution to tackle this kind of data is based on kernel embedding methods. However, it quickly becomes computationally intractable as the number of examples increases. In this paper, we propose an efficient method that learns multiple local distance metrics instead of a single global one. More specifically, the training examples are divided into several disjoint clusters, in each of which a distance metric is trained to separate the data locally. Additionally, a global regularization is introduced to preserve some common properties of different clusters in the learned metric space. By learning multiple distance metrics jointly within a single unified optimization framework, our method consistently outperforms single distance metric learning methods, while being more efficient than other state-of-the-art multi-metric learning methods.
MSC:
68T05 Learning and adaptive systems in artificial intelligence
90C25 Convex programming
90C90 Applications of mathematical programming
Software:
KEEL; LMNN; Pegasos
PDF BibTeX XML Cite
Full Text: DOI
References:
[1] Bohné, J.; Ying, Y.; Gentric, S.; Pontil, M., Large margin local metric learning, Proceedings of the European Conference on Computer Vision, 679-694 (2014)
[2] Bottou, L., Stochastic gradient learning in neural networks, Proceedings of Neuro-Nîmes 91 (1991)
[3] Boyd, S.; Vandenberghe, L., Convex Optimization (2004), Cambridge University Press · Zbl 1058.90049
[4] Cole, R.; Fanty, M., Spoken letter recognition, Proceedings of the 3rd DARPA Speech and Natural Language Workshop, 385-390 (1990)
[5] Davis, J. V.; Kulis, B.; Jain, P.; Sra, S.; Dhillon, I. S., Information-theoretic metric learning, Proceedings of the 24th International Conference on Machine Learning, 209-216 (2007)
[6] Demšar, J., Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res., 7, 1-30 (2006) · Zbl 1222.68184
[7] Domeniconi, C.; Peng, J.; Gunopulos, D., An adaptive metric machine for pattern classification, Advances in Neural Information Processing Systems 13, 458-464 (2001)
[8] Elkan, C., Using the triangle inequality to accelerate k-means, Proceedings of the 20th International Conference on Machine Learning, 147-153 (2003)
[9] Evgeniou, T.; Pontil, M., Regularized multi-task learning, Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 109-117 (2004)
[10] Frome, A.; Singer, Y.; Malik, J., Image retrieval and classification using local distance functions, Advances in Neural Information Processing Systems 19, 417-424 (2007)
[11] Frome, A.; Singer, Y.; Sha, F.; Malik, J., Learning globally-consistent local distance functions for shape-based image retrieval and classification, Proceedings of the IEEE 11th International Conference on Computer Vision, 1-8 (2007)
[12] Golub, G. H.; Van Loan, C. F., Matrix Computations (1996), Johns Hopkins University Press · Zbl 0865.65009
[13] Gu, Q.; Han, J., Clustered support vector machines, Proceedings of the 16th International Conference on Artificial Intelligence and Statistics, 307-315 (2013)
[14] Hao, S.; Zhao, P.; Liu, Y.; Hoi, S. C.H.; Miao, C., Online multitask relative similarity learning, Proceedings of the 26th International Joint Conference on Artificial Intelligence, 1823-1829 (2017)
[15] Hartigan, J. A.; Wong, M. A., Algorithm AS 136: a \(k\)-means clustering algorithm, J. R. Stat. Soc. Ser. C (Appl. Stat.), 28, 1, 100-108 (1979) · Zbl 0447.62062
[16] Hastie, T.; Tibshirani, R., Discriminant adaptive nearest neighbor classification, IEEE Trans. Pattern Anal. Mach. Intell., 18, 6, 607-616 (1996)
[17] Higham, N. J., Computing a nearest symmetric positive semidefinite matrix, Linear Algebra Appl., 103, 103-118 (1988) · Zbl 0649.65026
[18] Hiriart-Urruty, J.-B.; Lemaréchal, C., Fundamentals of Convex Analysis (2012), Springer Science & Business Media
[19] Hu, J.; Lu, J.; Tan, Y. P., Sharable and individual multi-view metric learning, IEEE Trans. Pattern Anal. Mach. Intell., Inpress (2018)
[20] Hull, J. J., A database for handwritten text recognition research, IEEE Trans. Pattern Anal. Mach. Intell., 16, 5, 550-554 (1994)
[21] Jain, P.; Kulis, B.; Davis, J. V.; Dhillon, I. S., Metric and kernel learning using a linear transformation, J. Mach. Learn. Res., 13, 519-547 (2012) · Zbl 1283.68290
[22] Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P., Gradient-based learning applied to document recognition, Proc. IEEE, 86, 11, 2278-2324 (1998)
[23] Li, M.; Wang, Q.; Zhang, D.; Li, P.; Zuo, W., Joint distance and similarity measure learning based on triplet-based constraints, Inf. Sci. (Ny), 406, 119-132 (2017) · Zbl 1429.68229
[24] Liang, J.; Hu, Q.; Zhu, P.; Wang, W., Efficient multi-modal geometric mean metric learning, Pattern Recognit., 75, 188-198 (2018)
[25] Lim, D.; Lanckriet, G., Efficient learning of Mahalanobis metrics for ranking, Proceedings of the 21st International Conference on Machine Learning, 1980-1988 (2014)
[26] Mu, Y.; Ding, W.; Tao, D., Local discriminative distance metrics ensemble learning, Pattern Recognit, 46, 2337-2349 (2013) · Zbl 1316.68118
[27] Nguyen, B.; Morell, C.; De Baets, B., Large-scale distance metric learning for \(k\)-nearest neighbors regression, Neurocomputing, 214, 805-814 (2016)
[28] Nguyen, B.; Morell, C.; De Baets, B., Supervised distance metric learning through maximization of the jeffrey divergence, Pattern Recognit., 64, 215-225 (2017) · Zbl 1429.68235
[29] Ortega, J. M.; Rheinboldt, W. C., Iterative solution of nonlinear equations in several variables (1979), Academic Press · Zbl 0949.65053
[30] Parameswaran, S.; Weinberger, K. Q., Large margin multi-task metric learning, Advances in Neural Information Processing Systems 23, 1867-1875 (2010)
[31] Ramanan, D.; Baker, S., Local distance functions: a taxonomy, new algorithms, and an evaluation, IEEE Trans. Pattern Anal. Mach. Intell., 33, 4, 794-806 (2011)
[32] Recht, B.; Fazel, M.; Parrilo, P. A., Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization, SIAM Rev., 52, 3, 471-501 (2010) · Zbl 1198.90321
[33] Sargent, R. W.H.; Sebastian, D. J., On the convergence of sequential minimization algorithms, J. Optim. Theory Appl., 12, 6, 567-575 (1973) · Zbl 0253.65037
[34] Schölkopf, B.; Herbrich, R.; Smola, A. J., A generalized representer theorem, Proceedings of the 14th Annual Conference on Computational Learning Theory, 416-426 (2001) · Zbl 0992.68088
[35] Schölkopf, B.; Smola, A., Learning with kernels: Support vector machines, regularization, optimization, and beyond (2002), MIT Press
[36] Shalev-Shwartz, S.; Singer, Y.; Ng, A. Y., Online and batch learning of pseudo-metrics, Proceedings of the 21st International Conference on Machine Learning, 94-101 (2004)
[37] Shalev-Shwartz, S.; Singer, Y.; Srebro, N.; Cotter, A., Pegasos: primal estimated sub-gradient solver for SVM, Math Program., 127, 1, 3-30 (2011) · Zbl 1211.90239
[38] Shamir, O.; Zhang, T., Stochastic gradient descent for non-smooth optimization: convergence results and optimal averaging schemes, Proceedings of the 30th International Conference on Machine Learning, 71-79 (2013)
[39] Shi, Y.; Bellet, A.; Sha, F., Sparse compositional metric learning, Proceedings of the 28th AAAI Conference on Artificial Intelligence, 2078-2084 (2014)
[40] Torresani, L.; Lee, K.-C., Large margin component analysis, Advances in Neural Information Processing Systems 19, 1385-1392 (2007)
[41] Triguero, I.; González, S.; Moyano, J. M.; García, S.; Alcalá-Fdez, J.; Luengo, J.; Fernández, A.; del Jesús, M. J.; Sánchez, L.; Herrera, F., KEEL 3.0: an open source software for multi-stage analysis in data mining, Int. J. Comput. Intell. Syst., 10, 1238-1249 (2017)
[42] Tseng, P., Convergence of a block coordinate descent method for nondifferentiable minimization, J. Optim. Theory Appl., 109, 3, 475-494 (2001) · Zbl 1006.65062
[43] Wang, J.; Kalousis, A.; Woznica, A., Parametric local metric learning for nearest neighbor classification, Advances in Neural Information Processing Systems 25, 1601-1609 (2012)
[44] Weinberger, K. Q.; Saul, L. K., Distance metric learning for large margin nearest neighbor classification, The Journal of Machine Learning Research, 10, 207-244 (2009) · Zbl 1235.68204
[45] Xing, E. P.; Jordan, M. I.; Russell, S.; Ng, A., Distance metric learning with application to clustering with side-information, Advances in Neural Information Processing Systems 14, 505-512 (2002)
[46] Yang, P.; Huang, K.; Liu, C.-L., Geometry preserving multi-task metric learning, Mach. Learn., 92, 1, 133-175 (2013) · Zbl 1273.68309
[47] Yi, S.; Jiang, N.; Feng, B.; Wang, X.; Liu, W., Online similarity learning for visual tracking, Inf. Sci. (Ny), 364, 33-50 (2016)
[48] Ying, Y.; Li, P., Distance metric learning with eigenvalue optimization, J. Mach. Learn. Res., 13, 1-26 (2012) · Zbl 1283.68309
[49] Zhang, H.; Patel, V. M.; Chellappa, R., Hierarchical multimodal metric learning for multimodal classification, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2925-2933 (2017)
[50] Zheng, Y.; Fan, J.; Zhang, J.; Gao, X., Hierarchical learning of multi-task sparse metrics for large-scale image classification, Pattern Recognit., 67, 97-109 (2017)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.