×

Recent advances in graph-based pattern recognition with applications in document analysis. (English) Zbl 1209.68443

Summary: Graphs are a powerful and popular representation formalism in pattern recognition. Particularly in the field of document analysis they have found widespread application. From the formal point of view, however, graphs are quite limited in the sense that the majority of mathematical operations needed to build common algorithms, such as classifiers or clustering schemes, are not defined. Consequently, we observe a severe lack of algorithmic procedures that can directly be applied to graphs. There exists recent work, however, aimed at overcoming these limitations. The present paper first provides a review of the use of graph representations in document analysis. Then we discuss a number of novel approaches suitable for making tools from statistical pattern recognition available to graphs. These novel approaches include graph kernels and graph embedding. With several experiments, using different data sets from the field of document analysis, we show that the new methods have great potential to outperform traditional procedures applied to graph representations.

MSC:

68T10 Pattern recognition, speech recognition
68R10 Graph theory (including graph drawing) in computer science

Software:

clusfind
PDF BibTeX XML Cite
Full Text: DOI

References:

[1] Duda, R.; Hart, P.; Stork, D., Pattern Classification (2000), Wiley-Interscience
[3] Conte, D.; Foggia, P.; Sansone, C.; Vento, M., Thirty years of graph matching in pattern recognition, International Journal of Pattern Recognition and Artificial Intelligence, 18, 3, 265-298 (2004)
[4] Nagy, G., Twenty years of document image analysis in PAMI, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 1, 38-62 (2000)
[5] Cheriet, M.; Kharma, N.; Liu, C.; Suen, C., Character Recognition Systems: A Guide for Students and Practitioners (2007), Wiley-Interscience · Zbl 1133.68070
[8] Lee, S.; Kim, J.; Groen, A., Translation-, rotation-, and scale-invariant recognition of hand-drawn symbols in schematic diagrams, International Journal of Pattern Recognition and Artificial Intelligence, 4, 1, 1-25 (1990)
[9] Lladós, J.; Martí, E.; Villanueva, J., Symbol recognition by error-tolerant subgraph matching between region adjacency graphs, IEEE Transactions on Pattern Analysis and Machine Intelligence, 23, 10, 1137-1143 (2001)
[10] Messmer, B.; Bunke, H., Clustering and error-corrrecting matching of graphs for learning and recognition of symbols in engineering drawings, (Hull, J.; Taylor, S., Document Analysis Systems II (1998), World Scientific), 102-117
[11] Cordella, L.; Foggia, P.; Sansone, C.; Vento, M., Fast graph matching for detecting CAD image components, (Proceedings of the 15th International Conference on Pattern Recognition, vol. 2 (2000)), 1038-1041
[12] Lladós, J.; Sánchez, G., Graph matching versus graph parsing in graphics recognition, International Journal of Pattern Recognition and Artificial Intelligence, 18, 3, 455-475 (2004)
[13] Bunke, H., On the generative power of sequential and parallel programmed graph grammars, Computing, 29, 89-112 (1982) · Zbl 0483.68065
[14] Bunke, H., Attributed programmed graph grammars and their application to schematic diagram interpretation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 4, 6, 582-674 (1982) · Zbl 0491.68086
[15] Lu, S.; Ren, Y.; Suen, C., Hierarchical attributed graph representation and recognition of handwritten Chinese characters, Pattern Recognition, 24, 7, 617-632 (1991)
[16] Huang, X.; Gu, J.; Wu, Y., A constrained approach to multifont Chinese character recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, 15, 8, 838-843 (1993)
[17] Suganthan, P.; Yan, H., Recognition of handprinted Chinese characters by constrained graph matching, Image and Vision Computing, 16, 3, 191-201 (1998)
[18] Rocha, J.; Pavlidis, T., A shape analysis model with applications to a character recognition system, IEEE Transactions on Pattern Analysis and Machine Intelligence, 16, 4, 393-404 (1994)
[19] Rocha, J.; Pavlidis, T., Character recognition without segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 17, 9, 903-909 (1995)
[20] Chakravarthy, V.; Kompella, B., The shape of handwritten characters, Pattern Recognition Letters, 24, 12, 1901-1913 (2003)
[21] Rahgozar, M., Document table recognition by graph rewriting, (Nagl, M.; Schürr, A.; Münch, M., Applications of Graph Transformations with Industrial Relevance, Lecture Notes in Computer Science, vol. 1779 (2000)), 185-197
[22] Lopresti, D.; Wilfong, G., A fast technique for comparing graph representations with applications to performance evaluation, International Journal on Document Analysis and Recognition, 6, 4, 219-229 (2003)
[23] Amano, A.; Naoki, A., Graph grammar based analysis system of complex table form document, (Proceedings of the Seventh International Conference on Document Analysis and Recognition, vol. 2 (2003)), 916-921
[24] Liang, J.; Doermann, D., Logical labeling of document images using layout graph matching with adaptive learning, (Lopresti, D.; Hu, J.; Kashi, R., Proceedings of the Fifth International Workshop on Document Analysis Systems, Lecture Notes in Computer Science, vol. 2423 (2002), Springer), 224-235 · Zbl 1021.68707
[25] Watanabe, Q. L.T.; Sugie, N., Layout recognition of multi-kinds of table-form documents, IEEE Transactions on Pattern Analysis and Machine Intelligence, 17, 4, 432-445 (1995)
[26] Baldi, S.; Marinai, S.; Soda, G., Using tree-grammars for training set expansion in page classification, (Proceedings of the Seventh International Conference on Document Analysis and Recognition (2003)), 829-833
[27] Selkow, S., The tree-to-tree editing problem, Information Processing Letters, 6, 6, 184-186 (1977) · Zbl 0391.68022
[29] Schenker, A.; Bunke, H.; Last, M.; Kandel, A., Graph-Theoretic Techniques for Web Content Mining (2005), World Scientific · Zbl 1077.68007
[30] Schenker, A.; Last, M.; Bunke, H.; Kandel, A., Classification of web documents using graph matching, International Journal of Pattern Recognition and Artificial Intelligence, 18, 3, 475-496 (2004)
[32] Neuhaus, M.; Bunke, H., Bridging the Gap Between Graph Edit Distance and Kernel Machines (2007), World Scientific · Zbl 1140.68473
[33] Vapnik, V., Statistical Learning Theory (1998), John Wiley · Zbl 0935.62007
[34] Shawe-Taylor, J.; Cristianini, N., Kernel Methods for Pattern Analysis (2004), Cambridge University Press
[35] Schö, B.; Smola, A., Learning with Kernels (2002), MIT Press
[36] Gärtner, T., Kernels for Structured Data (2008), World Scientific · Zbl 1168.68039
[37] Gärtner, T., A survey of kernels for structured data, SIGKDD Explorations, 5, 1, 49-58 (2003)
[38] Kondor, R.; Lafferty, J., Diffusion kernels on graphs and other discrete input spaces, (Proceedings of the 19th International Conference on Machine Learning (2002)), 315-322
[39] Kandola, J.; Shawe-Taylor, J.; Cristianini, N., Learning semantic similarity, Neural Information Processing Systems, 15, 657-664 (2003)
[40] Smola, A.; Kondor, R., Kernels and regularization on graphs, (Proceedings of the 16th International Conference on Computational Learning Theory (2003)), 144-158 · Zbl 1274.68351
[41] Lafferty, J.; Lebanon, G., Information diffusion kernels, (Advances in Neural Information Processing Systems, vol. 15 (2003), MIT Press), 375-382
[42] Lafferty, J.; Lebanon, G., Diffusion kernels on statistical manifolds, Journal of Machine Learning Research, 6, 129-163 (2005) · Zbl 1222.68240
[44] Watkins, C., Dynamic alignment kernels, (Smola, A.; Bartlett, P.; Schökopf, B.; Schuurmans, D., Advances in Large Margin Classifiers (2000), MIT Press), 39-50
[46] Borgwardt, K.; Ong, C.; Schö, S.; Vishwanathan, S.; Smola, A.; Kriegel, H.-P., Protein function prediction via graph kernels, Bioinformatics, 21, 1, 47-56 (2005)
[47] Borgwardt, K.; Kriegel, H.-P., Shortest-path kernels on graphs, (Proceedings of the Fifth International Conference on Data Mining (2005)), 74-81
[48] Gärtner, T.; Flach, P.; Wrobel, S., On graph kernels: hardness results and efficient alternatives, (Schölkopf, B.; Warmuth, M., Proceedings of the 16th Annual Conference on Learning Theory (2003)), 129-143 · Zbl 1274.68312
[49] Kashima, H.; Inokuchi, A., Kernels for graph classification, (Proceedings of the ICDM Workshop on Active Mining (2002)), 31-36
[50] Kashima, H.; Tsuda, K.; Inokuchi, A., Marginalized kernels between labeled graphs, (Proceedings of the 20th International Conference on Machine Learning (2003)), 321-328
[51] Ralaivola, L.; Swamidass, S.; Saigo, H.; Baldi, P., Graph kernels for chemical informatics, Neural Networks, 18, 8, 1093-1110 (2005)
[52] Mahé, P.; Ueda, N.; Akutsu, T., Graph kernels for molecular structures—activity relationship analysis with support vector machines, Journal of Chemical Information and Modeling, 45, 4, 939-951 (2005)
[53] Chung-Graham, F., Spectral Graph Theory (1997), AMS
[54] Luo, B.; Wilson, R.; Hancock, E., Spectral embedding of graphs, Pattern Recognition, 36, 10, 2213-2223 (2003) · Zbl 1054.68105
[55] Caelli, T.; Kosinov, S., Inexact graph matching using eigen-subspace projection clustering, International Journal of Pattern Recognition and Artificial Intelligence, 18, 3, 329-355 (2004)
[56] Wilson, R.; Hancock, E.; Luo, B., Pattern vectors from algebraic graph theory, IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 7, 1112-1124 (2005)
[57] Robles-Kelly, A.; Hancock, E., A Riemannian approach to graph embedding, Pattern Recognition, 40, 1024-1056 (2007) · Zbl 1119.68210
[58] Luo, B.; Hancock, E., Structural graph matching using the EM algorithm and singular value decomposition, IEEE Transactions on Pattern Analysis and Machine Intelligence, 23, 10, 1120-1136 (2001)
[59] Riesen, K.; Bunke, H., Graph classification based on vector space embedding, International Journal of Pattern Recognition and Artificial Intelligence, 23, 6, 1053-1081 (2009)
[60] Pekalska, E.; Duin, R., The Dissimilarity Representation for Pattern Recognition: Foundations and Applications (2005), World Scientific · Zbl 1095.68105
[61] Spillmann, B.; Neuhaus, M.; Bunke, H.; Pekalska, E.; Duin, R., Transforming strings to vector spaces using prototype selection, (Yeung, D.-Y.; Kwok, J.; Fred, A.; Roli, F.; de Ridder, D., Proceedings of the 11th International Workshop on Structural and Syntactic Pattern Recognition, Lecture Notes in Computer Science, vol. 4109 (2006), Springer), 287-296
[62] Bunke, H.; Allermann, G., Inexact graph matching for structural pattern recognition, Pattern Recognition Letters, 1, 245-253 (1983) · Zbl 0509.68100
[63] Riesen, K.; Bunke, H., Approximate graph edit distance computation by means of bipartite graph matching, Image and Vision Computing, 27, 4, 950-959 (2009)
[64] Sorlin, S.; Solnon, C., Reactive tabu search for measuring graph similarity, (Brun, L.; Vento, M., Proceedings of the Fifth International Workshop on Graph-based Representations in Pattern Recognition, Lecture Notes in Computer Science, vol. 3434 (2005), Springer), 172-182 · Zbl 1119.68420
[65] Neuhaus, M.; Riesen, K.; Bunke, H., Fast suboptimal algorithms for the computation of graph edit distance, (Yeung, D.-Y.; Kwok, J.; Fred, A.; Roli, F.; de Ridder, D., Proceedings of the 11th International Workshop on Structural and Syntactic Pattern Recognition, Lecture Notes in Computer Science, vol. 4109 (2006), Springer), 163-172
[67] Haasdonk, B., Feature space interpretation of SVMs with indefinite kernels, IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 4, 482-492 (2005)
[68] Riesen, K.; Bunke, H., Reducing the dimensionality of dissimilarity space embedding graph kernels, Engineering Applications of Artificial Intelligence, 22, 1, 48-56 (2008)
[69] Riesen, K.; Bunke, H., Non-linear transformations of vector space embedded graphs, (Juan-Ciscar, A.; Sanchez-Albaladejo, G., Pattern Recognition in Information Systems (2008)), 173-186
[70] Jiang, X.; Münger, A.; Bunke, H., On median graphs: properties, algorithms, and applications, IEEE Transactions on Pattern Analysis and Machine Intelligence, 23, 10, 1144-1151 (2001)
[71] Kaufman, L.; Rousseeuw, P., Finding Groups in Data: An Introduction to Cluster Analysis (1990), John Wiley & Sons · Zbl 1345.62009
[72] Schölkopf, B.; Smola, A.; Müller, K.-R., Nonlinear component analysis as a kernel eigenvalue problem, Neural Computation, 10, 1299-1319 (1998)
[73] Riesen, K.; Bunke, H., IAM graph database repository for graph based pattern recognition and machine learning, (da Vitoria Lobo, N.; etal., Proceedings of the International Workshops on Structural Syntactic and Statistical Pattern Recognition, Lecture Notes in Computer Science, vol. 5342 (2008)), 287-297
[75] Dosch, P.; Valveny, E., Report on the second symbol recognition contest, (Liu, W.; Lladós, J., Graphics Recognition. Ten Years Review and Future Perspectives. Proceedings of the Sixth International Workshop on Graphics Recognition, Lecture Notes in Computer Science, vol. 3926 (2005), Springer), 381-397
[76] Riesen, K.; Bunke, H., Kernel \(k\)-means clustering applied to vector space embeddings of graphs, (Prevost, L.; Marinai, S.; Schwenker, F., Proceedings of the Third IAPR Workshop Artificial Neural Networks in Pattern Recognition, Lecture Notes in Artificial Intelligence, vol. 5064 (2008), Springer), 24-35
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.