jurafsky&martin_3rdEd_17 (1).pdf


One way to implement bidirectionality is to switch to a much more powerful model called a Conditional Random Field, or CRF, which we will introduce in Chapter 20. But CRFs are much more expensive computationally than MEMMs and don't work any better for tagging, so they are not generally used for this task. Instead, other ways of adding bidirectionality are generally used. The Stanford tagger uses a bidirectional version of the MEMM called a cyclic dependency network (Toutanova et al., 2003).

Alternatively, any sequence model can be turned into a bidirectional model by using multiple passes. For example, the first pass would use only part-of-speech features from already-disambiguated words on the left. In the second pass, tags for all words, including those on the right, can be used. Another option is to run the tagger twice, once left-to-right and once right-to-left. In greedy decoding, for each word the classifier chooses the higher-scoring of the two tags assigned by the left-to-right and right-to-left classifiers. In Viterbi decoding, the classifier chooses the higher-scoring of the two sequences (left-to-right or right-to-left). Multiple-pass decoding is implemented in publicly available toolkits like the SVMTool system (Giménez and Marquez, 2004), a tagger that applies an SVM classifier instead of a MaxEnt classifier at each position, but similarly uses Viterbi (or greedy) decoding to implement a sequence model.

10.7 Part-of-Speech Tagging for Other Languages

The HMM and MEMM part-of-speech tagging algorithms have been applied to tagging in many languages besides English. For languages similar to English, the methods work well as is; tagger accuracies for German, for example, are close to those for English. Augmentations become necessary when dealing with highly inflected or agglutinative languages with rich morphology like Czech, Hungarian, and Turkish.
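Looking back at the two-direction decoding options from the end of the previous section, the sketch below shows how the per-word (greedy) and per-sequence (Viterbi) combinations differ. It is a minimal illustration under stated assumptions, not the Stanford tagger's or SVMTool's implementation: the `Classifier` interface, the function names, and the toy lexicon are all hypothetical, and comparing raw scores across the two directions assumes both classifiers produce scores on the same scale (for example, probabilities).

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: a local classifier maps (words, position, tags
# already assigned earlier in decoding order) to a dict of tag -> score.
Classifier = Callable[[Sequence[str], int, List[str]], Dict[str, float]]


def greedy_decode(words: Sequence[str], classify: Classifier) -> List[Tuple[str, float]]:
    """Tag words in order, committing to the best-scoring tag at each position."""
    history: List[str] = []
    result: List[Tuple[str, float]] = []
    for i in range(len(words)):
        scores = classify(words, i, history)
        best = max(scores, key=scores.get)
        history.append(best)
        result.append((best, scores[best]))
    return result


def bidirectional_greedy(words: Sequence[str], classify_l2r: Classifier,
                         classify_r2l: Classifier) -> List[str]:
    """Decode in both directions; keep the higher-scoring tag for each word."""
    l2r = greedy_decode(words, classify_l2r)
    # Decode the reversed sentence, then reverse the output so positions line up.
    r2l = greedy_decode(list(words)[::-1], classify_r2l)[::-1]
    # Ties go to the left-to-right tagger (an arbitrary choice).
    return [a_tag if a_score >= b_score else b_tag
            for (a_tag, a_score), (b_tag, b_score) in zip(l2r, r2l)]


def choose_sequence(l2r_best: Tuple[List[str], float],
                    r2l_best: Tuple[List[str], float]) -> List[str]:
    """Viterbi variant: each argument is (best tag sequence, its score)."""
    return max(l2r_best, r2l_best, key=lambda pair: pair[1])[0]


# Toy classifiers (hypothetical scores, for illustration only).
LEXICON = {"the": {"DT": 0.9}, "race": {"NN": 0.6, "VB": 0.4}, "to": {"TO": 0.9}}


def toy_l2r(words: Sequence[str], i: int, history: List[str]) -> Dict[str, float]:
    scores = dict(LEXICON.get(words[i], {"NN": 0.5}))
    # Left context: a verb reading is boosted right after "to".
    if history and history[-1] == "TO" and "VB" in scores:
        scores["VB"] += 0.5
    return scores


def toy_r2l(words: Sequence[str], i: int, history: List[str]) -> Dict[str, float]:
    # Here words are reversed, so history holds tags of words to the right.
    return dict(LEXICON.get(words[i], {"NN": 0.5}))


print(bidirectional_greedy(["to", "race"], toy_l2r, toy_r2l))  # ['TO', 'VB']
```

In this toy run the right-to-left pass alone tags "race" as NN because it never sees "to" before committing, while the left-to-right pass picks VB with a higher score, so the per-word combination keeps VB.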
These productive word-formation processes result in a large vocabulary for these languages: a 250,000 word token corpus of Hungarian has more than twice as many word types as a similarly sized corpus of English (Oravecz and Dienes, 2002), while a 10 million word token corpus of Turkish contains four times as many word types as a similarly sized English corpus (Hakkani-Tür et al., 2002). Large vocabularies mean many unknown words, and these unknown words cause significant performance degradations in a wide variety of languages (including Czech, Slovene, Estonian, and Romanian) (Hajič, 2000).

Highly inflectional languages also have much more information than English coded in word morphology, like case (nominative, accusative, genitive) or gender (masculine, feminine). Because this information is important for tasks like parsing and coreference resolution, part-of-speech taggers for morphologically rich languages need to label words with case and gender information. Tagsets for morphologically rich languages are therefore sequences of morphological tags rather than a single primitive tag. Here's a Turkish example, in which the word izin has three pos-