Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger

Kristina Toutanova
Dept of Computer Science
Gates Bldg 4A, 353 Serra Mall
Stanford, CA 94305-9040, USA
[email protected]

Christopher D. Manning
Depts of Computer Science and Linguistics
Gates Bldg 4A, 353 Serra Mall
Stanford, CA 94305-9040, USA
[email protected]

Abstract

This paper presents results for a maximum-entropy-based part-of-speech tagger, which achieves superior performance principally by enriching the information sources used for tagging. In particular, we get improved results by incorporating these features: (i) more extensive treatment of capitalization for unknown words; (ii) features for the disambiguation of the tense forms of verbs; (iii) features for disambiguating particles from prepositions and adverbs. The best resulting accuracy for the tagger on the Penn Treebank is 96.86% overall, and 86.91% on previously unseen words.

Introduction

There are now numerous systems for automatic assignment of parts of speech ("tagging"), employing many different machine learning methods. Among recent top-performing methods are Hidden Markov Models (Brants 2000), maximum entropy approaches (Ratnaparkhi 1996), and transformation-based learning (Brill 1994). An overview of these and other approaches can be found in Manning and Schütze (1999, ch. 10). However, all these methods use largely the same information sources for tagging, and often almost the same features as well, and as a consequence they also offer very similar levels of performance. This stands in contrast to the (manually built) EngCG tagger, which achieves better performance by using lexical and contextual information sources and generalizations beyond those available to such statistical taggers, as Samuelsson and Voutilainen (1997) demonstrate.

1. We thank Dan Klein and Michael Saunders for useful discussions, and the anonymous reviewers for many helpful comments.
This paper explores the notion that automatically built tagger performance can be further improved by expanding the knowledge sources available to the tagger. We pay special attention to unknown words, because the markedly lower accuracy on unknown word tagging means that this is an area where significant performance gains seem possible. We adopt a maximum entropy approach because it allows the inclusion of diverse sources of information without causing fragmentation and without necessarily assuming independence between the predictors. A maximum entropy approach has been applied to part-of-speech tagging before (Ratnaparkhi 1996), but the approach's ability to incorporate non-local and non-HMM-tagger-type evidence has not been fully explored. This paper describes the models that we developed and the experiments we performed to evaluate them. [...]
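The advantage claimed above is that a maximum entropy model can combine overlapping, mutually dependent feature functions over a (history, tag) pair, scoring each candidate tag as p(t | h) = exp(Σ_i λ_i f_i(h, t)) / Z(h) without any independence assumption between predictors. A minimal sketch of that idea, with made-up feature templates and hand-set weights (the actual tagger's feature set and its weights, trained on the Penn Treebank, are described later in the paper and differ from this illustration):

```python
import math

# Illustrative maxent-style scoring, NOT the authors' feature set:
# each binary feature f_i(h, t) fires on a property of the history h
# (here just the current word) conjoined with a candidate tag t.
def features(word, tag):
    feats = [f"word={word.lower()}&tag={tag}"]
    if word[0].isupper():
        feats.append(f"capitalized&tag={tag}")  # capitalization cue (unknown words)
    if word.endswith("ed"):
        feats.append(f"suffix=ed&tag={tag}")    # morphological cue (verb tense)
    return feats

# Hypothetical lambda weights; a real trainer (e.g. GIS) would estimate
# these from treebank counts.
WEIGHTS = {
    "capitalized&tag=NNP": 2.0,
    "suffix=ed&tag=VBD": 1.5,
    "word=walked&tag=VBD": 1.0,
}

TAGS = ["NN", "NNP", "VBD"]

def tag_probs(word):
    """p(t | h) = exp(sum_i lambda_i * f_i(h, t)) / Z(h)."""
    scores = {t: math.exp(sum(WEIGHTS.get(f, 0.0) for f in features(word, t)))
              for t in TAGS}
    z = sum(scores.values())
    return {t: s / z for t, s in scores.items()}
```

For a capitalized past-tense form like "Walked", the capitalization feature (favoring NNP) and the suffix and lexical features (favoring VBD) all fire on the same decision; the model simply sums their weights and renormalizes, which is exactly the kind of evidence combination that would cause fragmentation in a count-based model.
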
This note was uploaded on 10/18/2011 for the course CS 479 taught by Professor Eric Ringger during the Fall '11 term at BYU.