Learning a Mahalanobis distance metric for data clustering and classification. (English) Zbl 1162.68642

Summary: Distance metric is a key issue in many machine learning algorithms. This paper considers a general problem of learning from pairwise constraints in the form of must-links and cannot-links. As one kind of side information, a must-link indicates the pair of the two data points must be in a same class, while a cannot-link indicates that the two data points must be in two different classes. Given must-link and cannot-link information, our goal is to learn a Mahalanobis distance metric. Under this metric, we hope the distances of point pairs in must-links are as small as possible and those of point pairs in cannot-links are as large as possible. This task is formulated as a constrained optimization problem, in which the global optimum can be obtained effectively and efficiently. Finally, some applications in data clustering, interactive natural image segmentation and face pose estimation are given in this paper. Experimental results illustrate the effectiveness of our algorithm.


68T10 Pattern recognition, speech recognition


Full Text: DOI


[1] Hastie, T.; Tibshirani, R., Discriminant adaptive nearest neighbor classification, IEEE trans. pattern anal. Mach. intell., 18, 6, 607-615, (1996)
[2] Muller, H.; Pun, T.; Squire, D., Learning from user behavior in image retrieval: application of market basket analysis, Int. J. comput. vision, 56, 1-2, 65-77, (2004)
[3] He, X.; King, O.; Ma, W.Y.; Li, M.; Zhang, H.J., Learning a semantic space from users relevance feedback for image retrieval, IEEE trans. circuits systems video technol., 13, 1, 39-48, (2003)
[4] Peltonen, J.; Klami, A.; Kaski, S., Learning more accurate metrics for self-organizing maps, (), 999-1004 · Zbl 1013.68772
[5] C. Domeniconi, D. Gunopulos, Adaptive nearest neighbor classification using support vector machines, in: Advances in Neural Information Processing Systems, vol. 14, 2002.
[6] Peng, J.; Heisterkamp, D.; Dai, H., Adaptive kernel metric nearest neighbor classification, (), 33-36
[7] Yan, R.; Hauptmann, A.; Jin, R., Negative pseudo-relevance feedback in content-based video retrieval, (), 343-346
[8] He, X.; Ma, W.Y.; Zhang, H.J., Learning an image manifold for retrieval, (), 17-23
[9] He, J.R.; Li, M.J.; Zhang, H.J.; Tong, H.H.; Zhang, C.S., Manifold ranking based image retrieval, (), 9-16
[10] Varde, A.S.; Rundensteiner, E.A.; Ruiz, C.; Maniruzzaman, M.; Jr., R., Learning semantics-preserving distance metrics for clustering graphical data, (), 107-112
[11] Wu, G.; Chang, E.Y.; Panda, N., Formulating context-dependent similarity functions, (), 725-734
[12] Korkmaz, E.E.; Ucoluk, G., Choosing a distance metric for automatic word categorization, (), 111-120
[13] Li, F.; Yang, J.; Wang, J., A transductive framework of distance metric learning by spectral dimensionality reduction, (), 513-520
[14] L. Yang, R. Jin, Distance metric learning: a comprehensive survey, Technical Report, Michigan State University \(\langle\)http://www.cse.msu.edu/∼yangliu1/frame_survey_v2.pdf⟩, 2006.
[15] Goldberger, J.; Roweis, S.; Hinton, G.; Salakhutdinov, R., Neighbourhood components analysis, (), 513-520
[16] Weinberger, K.; Blitzer, J.; Saul, L., Distance metric learning for large margin nearest neighbor classification, (), 1473-1480
[17] Torresani, L.; Lee, K.C., Large margin component analysis, (), 505-512
[18] Globerson, A.; Roweis, S., Metric learning by collapsing classes, (), 451-458
[19] Zhang, Z.H.; Kwok, J.T.; Yeung, D.Y., Parametric distance metric learning with label information, (), 1450-1452
[20] G. Lebanon, Flexible metric nearest neighbor classification, Technical Report, Statistics Department, Stanford University, 1994.
[21] C.H. Hoi, W. Liu, M.R. Lyu, W.Y. Ma, Learning distance metrics with contextual constraints for image retrieval, in: Proceedings of Conference on Computer Vision and Pattern Recognition, vol. 2, New York, USA, 2006, pp. 2072-2078.
[22] Xing, E.P.; Ng, A.Y.; Jordan, M.I.; Russell, S., Distance metric learning, with application to clustering with side-information, (), 505-512
[23] Bar-hillel, A.; Hertz, T.; Shental, N.; Weinshall, D., Learning distance functions using equivalence relations, J. Mach. learn. res., 6, 11-18, (2005)
[24] Tsang, I.W.; Kwok, J.T., Distance metric learning with kernels, (), 126-129
[25] Rosales, R.; Fung, G., Learning sparse metrics via linear programming, (), 367-373
[26] Schultz, M.; Joachims, T., Learning a distance metric from relative comparisons, ()
[27] L. Yang, R. Jin, R. Sukthankar, Y. Liu, An efficient algorithm for local distance metric learning, in: AAAI, Boston, USA, 2006.
[28] Mochihashi, D.; Kikui, G.; Kita, K., Learning nonstructural distance metric by minimum cluster distortions, (), 341-348
[29] W. Tang, S. Zhong, Pairwise constraints-guided dimensionality reduction, in: SDM Workshop on Feature Selection for Data Mining, 2006.
[30] Bie, T.D.; Momma, M.; Cristianini, N., Efficiently learning the metric using side-information, (), 175-189 · Zbl 1263.68128
[31] N. Shental, A. Bar-Hillel, T. Hertz, D. Weinshall, Computing Gaussian mixture models with em using side-information, in: ICML Workshop on The Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining, 2003.
[32] Lanckriet, G.R.G.; Cristianini, N.; Bartlett, P.; Ghaoui, L.E.; Jordan, M.I., Learning the kernel matrix with semidefinite programming, J. Mach. learn. res., 5, 1, 27-72, (2004) · Zbl 1222.68241
[33] Lu, Z.; Leen, T., Semi-supervised learning with penalized probabilistic clustering, (), 849-856
[34] Yeung, D.-Y.; Chang, H., Extending the relevant component analysis algorithm for metric learning using both positive and negative equivalence constraints, Pattern recognition, 39, 5, 1007-1010, (2006) · Zbl 1158.68483
[35] Zhang, J.; Yan, R., On the value of pairwise constraints in classification and consistency, (), 1111-1118
[36] Weinberger, K.Q.; Tesauro, G., Metric learning for kernel regression, (), 608-615
[37] Hertz, T.; Bar-Hillel, A.; Weinshall, D., Boosting margin based distance functions for clustering, (), 393-400
[38] Bilenko, M.; Basu, S.; Mooney, R.J., Integrating constraints and metric learning in semi-supervised clustering, (), 81-88
[39] Davis, J.V.; Kulis, B.; Jain, P.; Sra, S.; Dhillon, I.S., Information-theoretic metric learning, (), 209-216
[40] Yang, L.; Jin, R.; Sukthankar, R., Bayesian active distance metric learning, ()
[41] Kumar, N.; Kummamuru, K.; Paranjpe, D., Semi-supervised clustering with metric learning using relative comparisons, (), 693-696
[42] Rosales, R.; Fung, G., Learning sparse metrics via linear programming, (), 367-373
[43] Tsang, I.W.; Cheung, P.M.; Kwok, J.T., Kernel relevant component analysis for distance metric learning, (), 954-959
[44] Kwok, J.; Tsang, I., Learning with idealized kernels, (), 400-407
[45] Zhang, Z., Learning metrics via discriminant kernels and multidimensional scaling: toward expected Euclidean representation, (), 872-879
[46] Chen, J.; Zhao, Z.; Ye, J.; Liu, H., Nonlinear adaptive distance metric learning for clustering, (), 123-132
[47] Shalev-Shwartz, S.; Singer, Y.; Ng, A.Y., Online and batch learning of pseudo-metrics, (), 94-101
[48] Bach, F.R.; Jordan, M.I., Learning spectral clustering, With application to speech separation, J. Mach. learn. res., 7, 1963-2001, (2006) · Zbl 1222.68138
[49] E.-J. Ong, R. Bowden, Learning distances for arbitrary visual features, in: Proceedings of British Machine Vision Conference, vol. 2, Edinburgh, England, 2006, pp. 749-758.
[50] Chopra, S.; Hadsell, R.; LeCun, Y., Learning a similarity metric discriminatively, with application to face verification, (), 539-546
[51] L. Yang, R. Jin, R. Sukthankar, B. Zheng, L. Mummert, M. Satyanarayanan, M. Chen, D. Jukic, Learning distance metrics for interactive search-assisted diagnosis of mammograms, in: SPIE Symposium on Medical Imaging: Computer-Aided Diagnosis, vol. 6514, 2007.
[52] Yan, R.; Zhang, J.; Yang, J.; Hauptmann, A., A discriminative learning framework with pairwise constraints for video object classification, (), 284-291
[53] Langmead, C.J., A randomized algorithm for learning Mahalanobis metrics: application to classification and regression of biological data, (), 217-226
[54] Chang, E.; Li, B., On learning perceptual distance function for image retrieval, (), 4092-4095
[55] Belhumeur, P.N.; Hespanha, J.P.; Kriegman, D.J., Eigenfaces vs. fisherfaces: recognition using class specific linear projection, IEEE trans. pattern anal. Mach. intelligence., 19, 7, 711-720, (1997)
[56] Chen, L.; Liao, H.; Ko, M.; Lin, J.; Yu, G., A new lda based face recognition system which can solve the small sample size problem, Pattern recognition, 33, 10, 1713-1726, (2000)
[57] Yu, H.; Yang, J., A direct lda algorithm for high-dimensional data—with application to face recognition, Pattern recognition, 34, 10, 2067-2070, (2001) · Zbl 0993.68091
[58] Duda, R.O.; Hart, P.E.; Stork, D.G., Pattern classification, (2000), Wiley New York, USA
[59] Golub, G.H.; van Loan, C.F., Matrix computations, (1996), The Johns Hopkins University Press Baltimore, MD, USA · Zbl 0865.65009
[60] Guo, Y.F.; Li, S.J.; Yang, J.Y.; Shu, T.T.; Wu, L.D., A generalized foley – sammon transform based on generalized Fisher discriminant criterion and its application to face recognition, Pattern recognition lett., 24, 1, 147-158, (2003) · Zbl 1055.68092
[61] Samaria, F.S.; Harter, A.C., Parameterisation of a stochastic model for human face identification, (), 138-142
[62] S. Nene, S. Nayar, H. Murase, Columbia object image library (coil-20), Technical Report, Columbia University, 1996.
[63] Boykov, Y.Y.; Jolly, M.P., Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images, (), 105-112
[64] Li, Y.; Sun, J.; Tang, C.K.; Shum, H.Y., Lazy snapping, (), 303-307
[65] S. Yan, X. Tang, Trace quotient problems revisited, in: European Conference on Computer Vision, vol. 2, Graz, Austria, 2006, pp. 232-244.
[66] Wang, H.; Yan, S.; Xu, D.; Tang, X.; Huang, T., Trace ratio vs. ratio trace for dimensionality reduction, ()
[67] Gourier, N.; Hall, D.; Crowley, J., Estimating face orientation from robust detection of salient facial features, ()
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.