zbMATH — the first resource for mathematics

Learning state machine-based string edit kernels. (English) Zbl 1191.68490
Summary: During the past few years, several works have been done to derive string kernels from probability distributions. For instance, the Fisher kernel uses a generative model $$M$$ (e.g., a hidden Markov model) and compares two strings according to how they are generated by $$M$$. On the other hand, the marginalized kernels allow the computation of the joint similarity between two instances by summing conditional probabilities. In this paper, we adapt this approach to edit distance-based conditional distributions and we present a way to learn a new string edit kernel. We show that the practical computation of such a kernel between two strings $$x$$ and $$x^{\prime}$$ built from an alphabet $$\varSigma$$ requires (i) to learn edit probabilities in the form of the parameters of a stochastic state machine; and (ii) to calculate an infinite sum over $$\varSigma ^{*}$$ by resorting to the intersection of probabilistic automata as done for rational kernels. We show on a handwritten character recognition task that our new kernel outperforms not only the state of the art string kernels and string edit kernels but also the standard edit distance used by a neighborhood-based classifier.

MSC:
 68T05 Learning and adaptive systems in artificial intelligence 68T10 Pattern recognition, speech recognition
SVMlight
Full Text:
References:
 [1] Scholkopf, B.; Smola, A.J., Learning with kernels: support vector machines, regularization, optimization, and beyond, (2001), MIT Press Cambridge, MA, USA [2] Leslie, C.; Eskin, E.; Noble, W.S., The spectrum kernel: a string kernel for SVM protein classification, (), 564-575 [3] Leslie, C.; Eskin, E.; Cohen, A.; Weston, J.; Noble, W.S., Mismatch string kernels for discriminative protein classification, Bioinformatics, 20, 4, 467-476, (2004) [4] Lodhi, H.; Saunders, C.; Cristianini, N.; Watkins, C.; Scholkopf, B., Text classification using string kernels, Journal of machine learning research, 2, 563-569, (2002) [5] Boisvert, S.; Marchand, M.; Laviolette, F.; Corbeil, J., Hiv-1 coreceptor usage prediction without multiple alignments: an application of string kernels, Retrovirology, 5, 1, 110, (2008) [6] Saigo, H.; Vert, J.-P.; Ueda, N.; Akutsu, T., Protein homology detection using string alignment kernels, Bioinformatics, 20, 11, 1682-1689, (2004) [7] Wagner, R.; Fischer, M., The string-to-string correction problem, Journal of the ACM (JACM), 21, 168-173, (1974) · Zbl 0278.68032 [8] Neuhaus, M.; Bunke, H., Edit distance-based kernel functions for structural pattern classification, Pattern recognition, 39, 10, 1852-1863, (2006) · Zbl 1096.68140 [9] Cortes, C.; Haffner, P.; Mohri, M., Rational kernels: theory and algorithms, Journal of machine learning research, 5, 1035-1062, (2004) · Zbl 1222.68175 [10] Li, H.; Jiang, T., A class of edit kernels for SVMs to predict translation initiation sites in eukaryotic mrnas, (), 262-271 [11] Durbin, R.; Eddy, S.; krogh, A.; Mitchison, G., Biological sequence analysis, (1998), Cambridge University Press Cambridge · Zbl 0929.92010 [12] Ristad, S.; Yianilos, P., Learning string-edit distance, IEEE transactions on pattern analysis and machine intelligence, 20, 5, 522-532, (1998) [13] McCallum, A.; Bellare, K.; Pereira, P., A conditional random field for discriminatively-trained finite-state string edit distance, (), 388-400 [14] Oncina, J.; Sebban, M., Learning stochastic edit distance: application in handwritten character recognition, Journal of pattern recognition, 39, 9, 1575-1587, (2006) · Zbl 1096.68724 [15] M. Bernard, J.-C. Janodet, M. Sebban, A discriminative model of stochastic edit distance in the form of a conditional transducer, in: 8th International Colloquium on Grammatical Inference (ICGI’06), Lecture Notes in Computer Science, vol. 4201, Springer, Berlin, 2006, pp. 240-252. · Zbl 1158.68402 [16] Tsuda, K.; Kin, T.; Asai, K., Marginalized kernels for biological sequences, Bioinformatics, 18, 90001, 268-275, (2002) [17] D. Haussler, Convolution kernels on discrete structures, Technical Report, University of California, Santa Cruz, 1999 [18] M.O. Dayhoff, R.M. Schwartz, B.C. Orcutt, A model of evolutionary change in proteins, in: M.O. Dayhoff, Atlas of Protein Sequence and Structure, vol. 5, National Biomedical Research Foundation, Washington, DC, 1978, pp. 345-358. [19] Henikoff, S.; Henikoff, J.G., Amino acid substitution matrices from protein blocks, National Academy of sciences of USA, 89, 22, 10915-10919, (1992) [20] Kashima, H.; Tsuda, K.; Inokuchi, A., Marginalized kernels between labeled graphs, (), 321-328 [21] Hiroto, S.; Jean-Philippe, V.; Tatsuya, A., Optimizing amino acid substitution matrices with a local alignment kernel, BMC bioinformatics, 7, 246, (2006) [22] Dempster, A.P.; Laird, N.M.; Rubin, D.B., Maximum likelihood from incomplete data via the EM algorithm, Journal of the royal statistical society. series B (methodological), 39, 1, 1-38, (1977) · Zbl 0364.62022 [23] G. Bouchard, B. Triggs, The tradeoff between generative and discriminative classifiers, in: IASC International Symposium on Computational Statistics (COMPSTAT), Prague, 2004, pp. 721-728. [24] A. Habrard, M. Inesta, D. Rizo, M. Sebban, Melody recognition with learned edit distances, in: Structural, Syntactic, and Statistical Pattern Recognition, Joint IAPR International Workshops, SSPR 2008 and SPR 2008, 2008, pp. 86-96. [25] M. Bilenko, R.J. Mooney, Adaptive duplicate detection using learnable string similarity measures, in: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003), 2003, pp. 39-48. [26] J. Eisner, Parameter estimation for probabilistic finite-state transducers, in: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (ACL’02), Morristown, NJ, USA, 2001, pp. 1-8. doi: $$\langle$$http://dx.doi.org/http://dx.doi.org/10.3115/1073083.1073085〉. [27] Boyer, L.; Habrard, A.; Muhlenbach, F.; Sebban, M., Learning string edit similarities using constrained finite state machines, (), 37-52 [28] Joachims, T., Making large-scale SVM learning practical, (), 169-184 [29] L. Boyer, Y. Esposito, A. Habrard, J. Oncina, M. Sebban, Sedil: software for edit distance learning, in: Proceedings of the 19th European Conference on Machine Learning (ECML 2008), Lecture Notes in Computer Science, vol. 5212, Springer, Berlin, 2008, pp. 672-677. [30] Bernard, M.; Boyer, L.; Habrard, A.; Sebban, M., Learning probabilistic models of tree edit distance, Pattern recognition, 41, 8, 2611-2629, (2008) · Zbl 1151.68577 [31] Boyer, L.; Habrard, A.; Sebban, M., Learning metrics between tree structured data: application to image recognition, (), 54-66
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.