zbMATH — the first resource for mathematics

Language identification for handwritten document images using a shape codebook. (English) Zbl 1187.68521
Summary: Language identification for handwritten document images is an open document analysis problem. In this paper, we propose a novel approach to language identification for documents containing a mixture of handwritten and machine printed text using image descriptors constructed from a codebook of shape features. We encode local text structures using scale and rotation invariant codewords, each representing a segmentation-free shape feature that is generic enough to be detected repeatably. We learn a concise, structurally indexed shape codebook from training data by clustering and partitioning similar feature types through graph cuts. Our approach is easily extensible and does not require skew correction, scale normalization, or segmentation. We quantitatively evaluate our approach using a large real-world document image collection, which is composed of 1512 documents in eight languages (Arabic, Chinese, English, Hindi, Japanese, Korean, Russian, and Thai) and contains a complex mixture of handwritten and machine printed content. Experiments demonstrate the robustness and flexibility of our approach, and show exceptional language identification performance that exceeds the state of the art.
MSC:
68T10 Pattern recognition, speech recognition
Software:
IAM; LIBSVM; Matlab; Octave
References:
[1] L. Vincent, Google Book Search: document understanding on a massive scale, in: Proceedings of the International Conference on Document Analysis and Recognition, 2007, pp. 819-823.
[2] G. Zhu, T.J. Bethea, V. Krishna, Extracting relevant named entities for automated expense reimbursement, in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007, pp. 1004-1012.
[3] Rice, S.; Nagy, G.; Nartker, T., Optical character recognition: an illustrated guide to the frontier, (1999), Kluwer Academic Publishers Dordrecht
[4] Li, Y.; Zheng, Y.; Doermann, D.; Jaeger, S., Script-independent text line segmentation in freestyle handwritten documents, IEEE transactions on pattern analysis and machine intelligence, 30, 8, 1313-1329, (2008)
[5] U. Marti, H. Bunke, The IAM-database: an English sentence database for off-line handwriting recognition, International Journal of Document Analysis and Recognition 5 (2006) 39-46, Available: <http://www.iam.unibe.ch/~fki/iamDB/>. · Zbl 1039.68045
[6] Plamondon, R.; Srihari, S.N., On-line and off-line handwriting recognition: a comprehensive survey, IEEE transactions on pattern analysis and machine intelligence, 22, 1, 63-84, (2000)
[7] Hochberg, J.; Bowers, K.; Cannon, M.; Kelly, P., Script and language identification for handwritten document images, International journal of document analysis and recognition, 2, 2-3, 45-52, (1999)
[8] D.-S. Lee, C.R. Nohl, H.S. Baird, Language identification in complex, unoriented, and degraded document images, in: Proceedings of the IAPR Workshop on Document Analysis Systems, 1996, pp. 17-39.
[9] Spitz, A., Determination of script and language content of document images, IEEE transactions on pattern analysis and machine intelligence, 19, 3, 235-245, (1997)
[10] J. Ding, L. Lam, C.Y. Suen, Classification of oriental and European scripts by using characteristic features, in: Proceedings of the International Conference on Document Analysis and Recognition, 1997, pp. 1023-1027.
[11] C.Y. Suen, S. Bergler, N. Nobile, B. Waked, C. Nadal, A. Bloch, Categorizing document images into script and language classes, in: Proceedings of the International Conference on Advances in Pattern Recognition, 1998, pp. 297-306.
[12] Lu, S.; Tan, C.L., Script and language identification in noisy and degraded document images, IEEE transactions on pattern analysis and machine intelligence, 30, 2, 14-24, (2008)
[13] Tan, T., Rotation invariant texture features and their use in automatic script identification, IEEE transactions on pattern analysis and machine intelligence, 20, 7, 751-756, (1998)
[14] Busch, A.; Boles, W.W.; Sridharan, S., Texture for script identification, IEEE transactions on pattern analysis and machine intelligence, 27, 11, 1720-1732, (2005)
[15] Hochberg, J.; Kelly, P.; Thomas, T.; Kerns, L., Automatic script identification from document images using cluster-based templates, IEEE transactions on pattern analysis and machine intelligence, 19, 2, 176-181, (1997)
[16] H. Ma, D. Doermann, Word level script identification on scanned document images, in: Proceedings of the Document Recognition and Retrieval, 2004, pp. 124-135.
[17] S. Jaeger, H. Ma, D. Doermann, Identifying script on word-level with informational confidence, in: Proceedings of the International Conference on Document Analysis and Recognition, 2005, pp. 416-420.
[18] Ferrari, V.; Fevrier, L.; Jurie, F.; Schmid, C., Groups of adjacent contour segments for object detection, IEEE transactions on pattern analysis and machine intelligence, 30, 1, 36-51, (2008)
[19] N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 886-893.
[20] L. Schomaker, M. Bulacu, K. Franke, Automatic writer identification using fragmented connected-component contours, in: Proceedings of the International Workshop on Frontiers in Handwriting Recognition, 2004, pp. 185-190.
[21] Kohonen, T., Self-organization and associative memory, (1989), Springer Berlin · Zbl 0528.68062
[22] X. Yu, Y. Li, C. Fermuller, D. Doermann, Object detection using a shape codebook, in: Proceedings of the British Machine Vision Conference, 2007, pp. 1-10.
[23] Canny, J., A computational approach to edge detection, IEEE transactions on pattern analysis and machine intelligence, 8, 6, 679-697, (1986)
[24] P.D. Kovesi, MATLAB and Octave functions for computer vision and image processing, 2000. Available: <http://www.csse.uwa.edu.au/~pk/research/matlabfns/>.
[25] G. Zhu, X. Yu, Y. Li, D. Doermann, Learning visual shape lexicon for document image content recognition, in: Proceedings of the European Conference on Computer Vision, vol. 2, 2008, pp. 745-758.
[26] Shi, J.; Malik, J., Normalized cuts and image segmentation, IEEE transactions on pattern analysis and machine intelligence, 22, 8, 888-905, (2000)
[27] S.X. Yu, J. Shi, Multiclass spectral clustering, in: Proceedings of the International Conference on Computer Vision, 2003, pp. 11-17.
[28] Ojala, T.; Pietikainen, M.; Maenpaa, T., Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE transactions on pattern analysis and machine intelligence, 24, 7, 971-987, (2002)
[29] Heikkila, M.; Pietikainen, M., A texture-based method for modeling the background and detecting moving objects, IEEE transactions on pattern analysis and machine intelligence, 28, 4, 657-662, (2006)
[30] Ahonen, T.; Hadid, A.; Pietikainen, M., Face description with local binary patterns: application to face recognition, IEEE transactions on pattern analysis and machine intelligence, 28, 12, 2037-2041, (2006)
[31] C.-C. Chang, C.-J. Lin, LIBSVM: a library for support vector machines, 2001. Available: <http://www.csie.ntu.edu.tw/~cjlin/libsvm>.
[32] G. Zhu, X. Yu, Y. Li, D. Doermann, Unconstrained language identification using a shape codebook, in: Proceedings of the International Conference on Frontiers in Handwriting Recognition, 2008, pp. 13-18.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.