
Good edit similarity learning by loss minimization. (English) Zbl 1260.68187

Summary: Similarity functions are a fundamental component of many learning algorithms. When dealing with string or tree-structured data, measures based on the edit distance are widely used, and a few methods exist for learning them from data. However, these methods offer no theoretical guarantee as to the generalization ability and discriminative power of the learned similarities. In this paper, we propose an approach to edit similarity learning based on loss minimization, called GESL. It is driven by the notion of \((\epsilon,\gamma,\tau)\)-goodness, a theory that bridges the gap between the properties of a similarity function and its performance in classification. Using the notion of uniform stability, we derive generalization guarantees that hold for a large class of loss functions. We also provide experimental results on two real-world datasets, which show that edit similarities learned with GESL induce more accurate and sparser classifiers than other (standard or learned) edit similarities.
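To illustrate the setting (not the paper's GESL algorithm itself, which learns the edit costs), the following minimal Python sketch computes the standard Levenshtein edit distance, turns it into a similarity in \((0,1]\), and builds the kind of landmark-based representation used in the goodness framework, where a linear classifier is then learned over similarities to a set of reference strings. The `scale` parameter and the example landmark strings are illustrative assumptions.

```python
import math

def levenshtein(s, t):
    """Standard Levenshtein (edit) distance via dynamic programming,
    with unit costs for insertion, deletion, and substitution."""
    m, n = len(s), len(t)
    prev = list(range(n + 1))  # distances from s[:0] to every prefix of t
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution (or match)
        prev = cur
    return prev[n]

def edit_similarity(s, t, scale=1.0):
    """Map an edit distance into a similarity in (0, 1];
    identical strings get similarity 1."""
    return math.exp(-scale * levenshtein(s, t))

# Landmark-based representation: each string is described by its
# similarities to a small set of reference ("landmark") strings,
# over which a (sparse) linear separator can be learned.
landmarks = ["abc", "abd", "xyz"]  # illustrative choice

def phi(s):
    return [edit_similarity(s, l) for l in landmarks]
```

A learned edit similarity, as in GESL, would replace the unit costs above with a cost matrix optimized by loss minimization; the landmark representation stays the same.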

MSC:

68Q32 Computational learning theory

Software:

LMNN
