jurafsky&martin_3rdEd_17 (1).pdf


Unfortunately, the large number of lexical categories available for each word, combined with the promiscuity of CCG’s combinatoric rules, leads to an explosion in the number of (mostly useless) constituents added to the parsing table. The key to managing this explosion of zombie constituents is to accurately assess and exploit the most likely lexical categories possible for each word, a process called supertagging. The following sections describe two approaches to CCG parsing that make use of supertags. Section 13.7.4 presents an approach that structures the parsing process as a heuristic search through the use of the A* algorithm. The following section then briefly describes a more traditional maximum entropy approach that manages the search space complexity through the use of adaptive supertagging, a process that iteratively considers more and more tags until a parse is found.

13.7.3 Supertagging

Chapter 10 introduced the task of part-of-speech tagging, the process of assigning the correct lexical category to each word in a sentence. Supertagging is the corresponding task for highly lexicalized grammar frameworks, where the assigned tags often dictate much of the derivation for a sentence. Indeed, supertagging has been referred to as almost parsing.

CCG supertaggers rely on treebanks such as CCGbank to provide both the overall set of lexical categories and the allowable category assignments for each word in the lexicon. CCGbank includes over 1000 lexical categories; in practice, however, most supertaggers limit their tagsets to those tags that occur at least 10 times in the training corpus. This results in a total of around 425 lexical categories available for use in the lexicon. Note that even this smaller number is large in contrast to the 45 POS types used by the Penn Treebank tagset.

As with traditional part-of-speech tagging, the standard approach to building a CCG supertagger is to use supervised machine learning to build a sequence classifier from labeled training data. A common approach is to use the maximum entropy Markov model (MEMM), as described in Chapter 10, to find the most likely sequence of tags given a sentence. The features in such a model consist of the current word $w_i$, the surrounding words within a window of $l$ words, $w_{i-l}^{i+l}$, and the $k$ previously assigned supertags, $t_{i-k}^{i-1}$. This type of model is summarized in the following equation from Chapter 10.

\[
\hat{T} = \operatorname*{argmax}_{T} P(T \mid W)
= \operatorname*{argmax}_{T} \prod_{i} P\left(t_i \mid w_{i-l}^{i+l}, t_{i-k}^{i-1}\right)
= \operatorname*{argmax}_{T} \prod_{i}
\frac{\exp\left(\sum_{i} w_i f_i\left(t_i, w_{i-l}^{i+l}, t_{i-k}^{i-1}\right)\right)}
{\sum_{t' \in \text{tagset}} \exp\left(\sum_{i} w_i f_i\left(t', w_{i-l}^{i+l}, t_{i-k}^{i-1}\right)\right)}
\]

Inside the exponentials, the sum runs over the model's features, and $w_i$ there denotes the weight of feature $f_i$ rather than a word. Training by maximizing the log-likelihood of the training corpus and decoding via the Viterbi algorithm are the same as described in Chapter 10.
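To make the locally normalized model concrete, the following Python sketch computes the per-word tag distribution from weighted indicator features and then runs a Viterbi search over it. It is only an illustration under stated assumptions: the feature templates in `extract_features`, the toy `weights`, and the three-category `tagset` are all invented for the example, and the decoder is restricted to $k = 1$ previous supertag to keep the search simple. A real supertagger would learn its weights from CCGbank and use a far larger tagset.

```python
import math

def extract_features(words, i, prev_tags, l=1, k=1):
    """Hypothetical indicator features: the words in a +/- l window around
    position i, plus the k previously assigned supertags."""
    feats = []
    for off in range(-l, l + 1):
        j = i + off
        w = words[j] if 0 <= j < len(words) else "<pad>"
        feats.append(f"w[{off}]={w}")
    # Pad the tag history with start symbols so it always has length k.
    history = (["<s>"] * k + list(prev_tags))[-k:] if k > 0 else []
    for d, t in enumerate(reversed(history), start=1):
        feats.append(f"t[-{d}]={t}")
    return feats

def local_distribution(weights, tagset, feats):
    """P(t | context): exponentiated feature score, normalized over the tagset."""
    scores = {t: sum(weights.get((f, t), 0.0) for f in feats) for t in tagset}
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {t: math.exp(s - m) for t, s in scores.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

def viterbi(words, weights, tagset, l=1):
    """Most likely tag sequence under the model, assuming k = 1."""
    # best maps the last tag to (log-probability, best path ending in that tag).
    best = {"<s>": (0.0, [])}
    for i in range(len(words)):
        new_best = {}
        for lp, path in best.values():
            feats = extract_features(words, i, path[-1:], l=l, k=1)
            for t, p in local_distribution(weights, tagset, feats).items():
                cand = lp + math.log(p)
                if t not in new_best or cand > new_best[t][0]:
                    new_best[t] = (cand, path + [t])
        best = new_best
    return max(best.values(), key=lambda x: x[0])[1]

# Toy weights (assumed, not learned): keys are (feature, tag) pairs.
weights = {
    ("w[0]=the", "NP/N"): 2.0,
    ("w[0]=dog", "N"): 2.0,
    ("w[0]=barks", "S\\NP"): 2.0,
    ("t[-1]=NP/N", "N"): 1.0,
}
tagset = ["NP/N", "N", "S\\NP"]

print(viterbi(["the", "dog", "barks"], weights, tagset))
```

Because each factor is normalized over the tagset at its own position, the cost of one step grows with the size of the tagset, which is one reason the frequency cutoff on lexical categories matters in practice.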