
Information-theoretic semi-supervised metric learning via entropy regularization. (English) Zbl 1415.68173

Summary: We propose a general information-theoretic approach to semi-supervised metric learning called SERAPH (SEmi-supervised metRic leArning Paradigm with Hypersparsity) that does not rely on the manifold assumption. Given the probability parameterized by a Mahalanobis distance, we maximize its entropy on labeled data and minimize its entropy on unlabeled data following entropy regularization. For metric learning, entropy regularization improves upon manifold regularization by considering the dissimilarity information of unlabeled data in the unsupervised part, and hence it allows the supervised and unsupervised parts to be integrated in a natural and meaningful way. Moreover, we regularize SERAPH by trace-norm regularization to encourage low-dimensional projections associated with the distance metric. The nonconvex optimization problem of SERAPH can be solved efficiently and stably by either a gradient projection algorithm or an EM-like iterative algorithm whose M-step is convex. Experiments demonstrate that SERAPH compares favorably with many well-known metric learning methods, and the learned Mahalanobis distance possesses high discriminability even in noisy environments.
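
The following NumPy sketch illustrates the kind of objective and gradient projection step described in the summary. It is assembled only from the summary itself: a sigmoid pair-similarity probability under a Mahalanobis distance, an entropy penalty on unlabeled pairs, a trace penalty, and projection onto the positive semidefinite cone. The labeled part is written as a pair log-likelihood (rather than the maximum-entropy form mentioned above), and the threshold eta, the weights gamma_ and lambda_, the step size, and the plain projected-ascent loop are illustrative assumptions, not the authors' exact SERAPH formulation or algorithm.

```python
# A minimal sketch of a SERAPH-style objective and gradient projection solver.
# All parameter names and the toy data below are illustrative assumptions.
import numpy as np

def pair_terms(A, eta, xi, xj):
    """For one pair, return p = sigma(eta - d_A), the logit z, and D = (xi-xj)(xi-xj)^T."""
    d = xi - xj
    D = np.outer(d, d)
    z = float(np.clip(eta - d @ A @ d, -30.0, 30.0))  # clip for numerical stability
    p = 1.0 / (1.0 + np.exp(-z))
    return p, z, D

def objective_and_grad(A, eta, labeled, unlabeled, gamma_, lambda_):
    """Labeled pair log-likelihood - gamma_ * entropy on unlabeled pairs - lambda_ * tr(A)."""
    eps = 1e-12
    obj, grad = 0.0, np.zeros_like(A)
    for xi, xj, y in labeled:                      # y = 1 (similar) or 0 (dissimilar)
        p, z, D = pair_terms(A, eta, xi, xj)
        obj += y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)
        grad += (p - y) * D                        # d/dA of the pair log-likelihood
    for xi, xj in unlabeled:
        p, z, D = pair_terms(A, eta, xi, xj)
        obj -= gamma_ * (-p * np.log(p + eps) - (1 - p) * np.log(1 - p + eps))
        grad -= gamma_ * z * p * (1 - p) * D       # d/dA of the entropy of p
    obj -= lambda_ * np.trace(A)
    grad -= lambda_ * np.eye(A.shape[0])
    return obj, grad

def project_psd(A):
    """Project a symmetric matrix onto the PSD cone by clipping negative eigenvalues."""
    w, V = np.linalg.eigh((A + A.T) / 2.0)
    return (V * np.maximum(w, 0.0)) @ V.T

def seraph_sketch(labeled, unlabeled, dim, gamma_=1.0, lambda_=0.1,
                  eta=1.0, step=1e-2, n_iter=200):
    """Projected gradient ascent on the objective, starting from the identity metric."""
    A = np.eye(dim)
    for _ in range(n_iter):
        _, grad = objective_and_grad(A, eta, labeled, unlabeled, gamma_, lambda_)
        A = project_psd(A + step * grad)
    return A

# Toy usage: pairwise constraints from a few labeled points plus many unlabeled pairs.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
cls = (X[:, 0] + X[:, 1] > 0).astype(int)
labeled = [(X[i], X[j], int(cls[i] == cls[j]))
           for i in range(10) for j in range(i + 1, 10)]
unlabeled = [(X[i], X[j]) for i in range(10, 39) for j in range(i + 1, 40, 7)]
A_hat = seraph_sketch(labeled, unlabeled, dim=3)
print(np.round(A_hat, 3))  # learned 3x3 metric
```

The trace penalty combined with the PSD projection pushes small eigenvalues of the metric to zero, which is one simple way to realize the low-dimensional projections that the summary attributes to trace-norm regularization.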

MSC:

68T05 Learning and adaptive systems in artificial intelligence

Software:

GSML

References:

[1] Argyriou, A., Evgeniou, T., & Pontil, M. (2007). Multi-task feature learning. In B. Schölkopf, J. Platt, & T. Hoffman (Eds.), Advances in neural information processing systems, 19. Cambridge, MA: MIT Press. · Zbl 1470.68073
[2] Baghshah, M., & Shouraki, S. (2009). Semi-supervised metric learning using pairwise constraints. In Proceedings of the 21st International Joint Conference on Artificial Intelligence. Menlo Park, CA: AAAI Press. · Zbl 1207.68261
[3] Belkin, M., & Niyogi, P. (2002). Laplacian eigenmaps and spectral techniques for embedding and clustering. In T. C. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in neural information processing systems, 14. Cambridge, MA: MIT Press. · Zbl 1085.68119
[4] Bellare, K., Druck, G., & McCallum, A. (2009). Alternating projections for learning with expectation constraints. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence. N.p.: AUAI Press.
[5] Bellet, A., Habrard, A., & Sebban, M. (2013). A survey on metric learning for feature vectors and structured data. http://arxiv.org/abs/1306.6709 · Zbl 1308.68005
[6] Berger, A., Della Pietra, S., & Della Pietra, V. (1996). A maximum entropy approach to natural language processing. Computational Linguistics, 22, 39-71.
[7] Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge: Cambridge University Press. · Zbl 1058.90049
[8] Chiaromonte, F., & Cook, R. (2002). Sufficient dimension reduction and graphics in regression. Annals of the Institute of Statistical Mathematics, 54, 768-795. · Zbl 1047.62066
[9] Davis, J., Kulis, B., Jain, P., Sra, S., & Dhillon, I. (2007). Information-theoretic metric learning. In Proceedings of the 24th International Conference on Machine Learning. Madison, WI: Omni Press.
[10] Dudík, M., & Schapire, R. E. (2006). Maximum entropy distribution estimation with generalized regularization. In Proceedings of the 19th Annual Conference on Learning Theory. Berlin: Springer-Verlag. · Zbl 1143.68531
[11] Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2), 179-188.
[12] Fukumizu, K., Bach, F., & Jordan, M. I. (2009). Kernel dimension reduction in regression. Annals of Statistics, 37(4), 1871-1905. · Zbl 1168.62049
[13] Gillenwater, J., Ganchev, K., Graça, J., Pereira, F., & Taskar, B. (2011). Posterior sparsity in unsupervised dependency parsing. Journal of Machine Learning Research, 12, 455-490. · Zbl 1280.68269
[14] Globerson, A., & Roweis, S. (2006). Metric learning by collapsing classes. In Y. Weiss, B. Schölkopf, & J. Platt (Eds.), Advances in neural information processing systems, 18. Cambridge, MA: MIT Press.
[15] Goldberger, J., Roweis, S., Hinton, G., & Salakhutdinov, R. (2005). Neighbourhood components analysis. In L. K. Saul, Y. Weiss, & J. Platt (Eds.), Advances in neural information processing systems, 17. Cambridge, MA: MIT Press.
[16] Gomes, R., Krause, A., & Perona, P. (2010). Discriminative clustering by regularized information maximization. In J. Lafferty (Ed.), Advances in neural information processing systems, 23. Red Hook, NY: Curran.
[17] Graça, J., Ganchev, K., & Taskar, B. (2008). Expectation maximization and posterior constraints. In J. C. Platt, D. Koller, Y. Singer, & S. Roweis (Eds.), Advances in neural information processing systems, 20. Cambridge, MA: MIT Press.
[18] Graça, J., Ganchev, K., Taskar, B., & Pereira, F. (2009). Posterior vs. parameter sparsity in latent variable models. In Y. Bengio, D. Schuurmans, J. Lafferty, C. Williams, & A. Culotta (Eds.), Advances in neural information processing systems, 22. · Zbl 1280.68269
[19] Grandvalet, Y., & Bengio, Y. (2005). Semi-supervised learning by entropy minimization. In L. K. Saul, Y. Weiss, & L. Bottou (Eds.), Advances in neural information processing systems, 17. Cambridge, MA: MIT Press.
[20] Grandvalet, Y., & Bengio, Y. (2006). Entropy regularization. In O. Chapelle, B. Schölkopf, & A. Zien (Eds.), Semi-supervised learning (pp. 151-168). Cambridge, MA: MIT Press.
[21] Hoi, S., Liu, W., & Chang, S.-F. (2008). Semi-supervised distance metric learning for collaborative image retrieval. In Proceedings of the 21st IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE Press.
[22] Huang, K., Jin, R., Xu, Z., & Liu, C. (2010). Robust metric learning by smooth optimization. In Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence. San Francisco: Morgan Kaufmann.
[23] Huang, K., Ying, Y., & Campbell, C. (2009). GSML: A unified framework for sparse metric learning. In Proceedings of the 9th IEEE International Conference on Data Mining. Piscataway, NJ: IEEE Press.
[24] Jain, P., Kulis, B., & Dhillon, I. (2010). Inductive regularized learning of kernel functions. In J. Lafferty (Ed.), Advances in neural information processing systems, 23. Red Hook, NY: Curran.
[25] Jaynes, E. T. (1957). Information theory and statistical mechanics. Physical Review, 106(4), 620-630. · Zbl 0084.43701
[26] Liu, W., Ma, S., Tao, D., Liu, J., & Liu, P. (2010). Semi-supervised sparse metric learning using alternating linearization optimization. In Proceedings of the 16th ACM International Conference on Knowledge Discovery and Data Mining. New York: ACM Press.
[27] Niu, G., Dai, B., Yamada, M., & Sugiyama, M. (2012). Information-theoretic semi-supervised metric learning via entropy regularization. In Proceedings of the 29th International Conference on Machine Learning. Madison, WI: Omni Press. · Zbl 1415.68173
[28] Polyak, B. T. (1967). A general method for solving extremal problems (in Russian). Soviet Mathematics Doklady, 174(1), 33-36.
[29] Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323-2326.
[30] Schölkopf, B., & Smola, A. (2001). Learning with kernels. Cambridge, MA: MIT Press. · Zbl 1019.68094
[31] Sugiyama, M. (2007). Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis. Journal of Machine Learning Research, 8, 1027-1061. · Zbl 1222.68312
[32] Sugiyama, M., Idé, T., Nakajima, S., & Sese, J. (2010). Semi-supervised local Fisher discriminant analysis for dimensionality reduction. Machine Learning, 78(1-2), 35-61. · Zbl 1470.68180
[33] Tenenbaum, J., de Silva, V., & Langford, J. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290, 2319-2323.
[34] Torresani, L., & Lee, K. (2007). Large margin component analysis. In B. Schölkopf, J. Platt, & T. Hoffman (Eds.), Advances in neural information processing systems, 19. Cambridge, MA: MIT Press.
[35] Wang, J., Do, H., Woznica, A., & Kalousis, A. (2011). Metric learning with multiple kernels. In J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. C. N. Pereira, & K. Q. Weinberger (Eds.), Advances in neural information processing systems, 24. Red Hook, NY: Curran.
[36] Weinberger, K., Blitzer, J., & Saul, L. (2006). Distance metric learning for large margin nearest neighbor classification. In Y. Weiss, B. Schölkopf, & J. Platt (Eds.), Advances in neural information processing systems, 18. Cambridge, MA: MIT Press.
[37] Xing, E., Ng, A., Jordan, M. I., & Russell, S. (2003). Distance metric learning with application to clustering with side-information. In S. Becker, S. Thrun, & K. Obermayer (Eds.), Advances in neural information processing systems, 15. Cambridge, MA: MIT Press.
[38] Yang, L., Jin, R., Sukthankar, R., & Liu, Y. (2006). An efficient algorithm for local distance metric learning. In Proceedings of the 21st National Conference on Artificial Intelligence. Menlo Park, CA: AAAI Press.
[39] Ying, Y., Huang, K., & Campbell, C. (2009). Sparse metric learning via smooth optimization. In Y. Bengio, D. Schuurmans, J. Lafferty, C. Williams, & A. Culotta (Eds.), Advances in neural information processing systems, 22.
[40] Yuan, M., & Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, Series B, 68(1), 49-67. · Zbl 1141.62030
[41] Zha, Z., Mei, T., Wang, M., Wang, Z., & Hua, X. (2009). Robust distance metric learning with auxiliary knowledge. In Proceedings of the 21st International Joint Conference on Artificial Intelligence. Menlo Park, CA: AAAI Press.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.