Tan, Chew Lim; Huang, Weihua; Sung, Sam Yuan; Yu, Zhaohui; Xu, Yi
Text retrieval from document images based on word shape analysis. (English) Zbl 1034.68031
Appl. Intell. 18, No. 3, 257-270 (2003).

Summary: We propose a method of text retrieval from document images using a similarity measure based on word shape analysis. We extract image features directly instead of using optical character recognition. Document images are segmented into word units, and features called vertical bar patterns are then extracted from these word units by detecting local extrema points. All vertical bar patterns are used to build document vectors. Lastly, we obtain the pair-wise similarity of document images as the scalar product of the document vectors. Four corpora of news articles were used to test the validity of our method. In the test, the similarity of document images obtained with this method was compared with the similarity obtained from the ASCII versions of those documents using the N-gram algorithm for text documents.

Cited in 1 Document

MSC:
68P20 Information storage and retrieval of data
68U10 Computing methodologies for image processing

Keywords: document image analysis; text retrieval; similarity measure; document vector
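The similarity step described in the summary (document vectors compared by a scalar product, with an N-gram comparison of the ASCII text as the reference) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the string tokens standing in for vertical bar patterns, the L2 normalisation before the scalar product, and the use of character trigrams are all assumptions made for illustration; the summary does not specify any of them.

```python
from collections import Counter
from math import sqrt


def document_vector(tokens):
    """Count how often each feature token occurs in one document.

    For the image side, a token would be an identifier of a vertical bar
    pattern (hypothetical labels such as "p2"); for the text baseline,
    a token is a character n-gram.
    """
    return Counter(tokens)


def similarity(vec_a, vec_b):
    """Scalar product of two count vectors after L2 normalisation.

    The normalisation is an assumption of this sketch; it keeps the
    score in [0, 1] regardless of document length.
    """
    dot = sum(count * vec_b[token] for token, count in vec_a.items())
    norm_a = sqrt(sum(c * c for c in vec_a.values()))
    norm_b = sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def char_ngrams(text, n=3):
    """Overlapping character n-grams of a whitespace-normalised string."""
    cleaned = " ".join(text.lower().split())
    return [cleaned[i:i + n] for i in range(len(cleaned) - n + 1)]


# Image side: hypothetical vertical-bar-pattern labels for two documents.
doc_a = document_vector(["p1", "p2", "p2", "p5"])
doc_b = document_vector(["p2", "p5", "p5", "p9"])
image_score = similarity(doc_a, doc_b)

# Text side: the same measure applied to character trigrams of ASCII text.
text_score = similarity(
    document_vector(char_ngrams("text retrieval")),
    document_vector(char_ngrams("text retrieval from document images")),
)
```

Using one `similarity` function for both feature types mirrors the comparison in the summary: only the way tokens are produced differs between the image-based method and the text baseline.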