
Shallow parsing with PoS taggers and linguistic features. (English) Zbl 1033.68125

Summary: Three data-driven, publicly available part-of-speech taggers are applied to shallow parsing of Swedish texts. The phrase structure is represented by nine types of phrases in a hierarchical structure containing labels for every constituent type the token belongs to in the parse tree. The encoding is based on the concatenation of the phrase tags on the path from lowest to higher nodes. Various linguistic features are used in learning; the taggers are trained on the basis of lexical information only, part-of-speech only, and a combination of both, to predict the phrase structure of the tokens with or without part-of-speech. Special attention is directed to the taggers' sensitivity to different types of linguistic information included in learning, as well as the taggers' sensitivity to the size and the various types of training data sets. The method can be easily transferred to other languages.
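
The encoding described in the summary can be illustrated with a minimal sketch. This is not the paper's implementation: the tree representation, the phrase labels NP and AP, the "_" separator, and the "O" tag for tokens outside any phrase are all assumptions made here for illustration. Each token receives one tag formed by concatenating the labels of the phrases it belongs to, from the lowest node up to the highest, so that an ordinary sequence tagger can be trained to predict it.

```python
# Hypothetical tree format: a node is either a token (str) or a
# (phrase_label, children) pair. Labels, separator, and the "O" tag
# are illustrative assumptions, not taken from the paper.

def encode_tokens(tree, path=()):
    """Return (token, tag) pairs for one tree.

    `path` holds the phrase labels from the highest node down to the
    current one; the tag lists them in reverse, lowest node first.
    """
    if isinstance(tree, str):
        tag = "_".join(reversed(path)) if path else "O"
        return [(tree, tag)]
    label, children = tree
    pairs = []
    for child in children:
        pairs.extend(encode_tokens(child, path + (label,)))
    return pairs


def encode_sentence(trees):
    """Encode a sentence given as a list of top-level trees and tokens."""
    pairs = []
    for tree in trees:
        pairs.extend(encode_tokens(tree))
    return pairs


# "den gamla bilen rostar": an adjective phrase nested in a noun phrase,
# followed by a verb that is outside any phrase in this toy analysis.
sentence = [("NP", ["den", ("AP", ["gamla"]), "bilen"]), "rostar"]
encoded = encode_sentence(sentence)
for token, tag in encoded:
    print(token, tag)
```

Under these assumptions, a token nested inside both an adjective phrase and a noun phrase gets the single composite tag AP_NP, which turns the hierarchical parse into a per-token labeling problem.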

MSC:

68T50 Natural language processing
68T05 Learning and adaptive systems in artificial intelligence

Software:

MBT; Penn Treebank