jurafsky&martin_3rdEd_17 (1).pdf


13.5 Improving PCFGs by Splitting Non-Terminals

Let’s start with the first of the two problems with PCFGs mentioned above: their inability to model structural dependencies, like the fact that NPs in subject position tend to be pronouns, whereas NPs in object position tend to have full lexical (non-pronominal) form. How could we augment a PCFG to correctly model this fact? One idea would be to split the NP non-terminal into two versions: one for subjects, one for objects. Having two nodes (e.g., NP_subject and NP_object) would allow us to correctly model their different distributional properties, since we would have different probabilities for the rule NP_subject → PRP and the rule NP_object → PRP.

One way to implement this intuition of splits is to do parent annotation (Johnson, 1998), in which we annotate each node with its parent in the parse tree. Thus, an NP node that is the subject of the sentence and hence has parent S would be annotated NP^S, while a direct object NP whose parent is VP would be annotated NP^VP. Figure 13.8 shows an example of a tree produced by a grammar that parent-annotates the phrasal non-terminals (like NP and VP).

a) (S (NP (PRP I)) (VP (VBD need) (NP (DT a) (NN flight))))
b) (S (NP^S (PRP I)) (VP^S (VBD need) (NP^VP (DT a) (NN flight))))

Figure 13.8 A standard PCFG parse tree (a) and one which has parent annotation on the nodes which aren’t pre-terminal (b). All the non-terminal nodes (except the pre-terminal part-of-speech nodes) in parse (b) have been annotated with the identity of their parent.

In addition to splitting these phrasal nodes, we can also improve a PCFG by splitting the pre-terminal part-of-speech nodes (Klein and Manning, 2003b). For example, different kinds of adverbs (RB) tend to occur in different syntactic positions: the most common adverbs with ADVP parents are “also” and “now”, with VP parents “n’t” and “not”, and with NP parents “only” and “just”. Thus, adding tags like RB^ADVP, RB^VP, and RB^NP can be useful in improving PCFG modeling.

Similarly, the Penn Treebank tag IN can mark a wide variety of parts-of-speech, including subordinating conjunctions (while, as, if), complementizers (that, for), and prepositions (of, in, from).
Some of these differences can be captured by parent annotation (subordinating conjunctions occur under S, prepositions under PP), while others require specifically splitting the pre-terminal nodes. Figure 13.9 shows an example from Klein and Manning (2003b) in which even a parent-annotated grammar incorrectly parses “works” as a noun in “to see if advertising works”. Splitting pre-terminals to allow “if” to prefer a sentential complement results in the correct verbal parse.

To deal with cases in which parent annotation is insufficient, we can also hand-write rules that specify a particular node split based on other features of the tree. For example, to distinguish between complementizer IN and subordinating conjunction IN, both of which can have the same parent, we could write rules conditioned on
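
The parent annotation of Figure 13.8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used by Johnson (1998): the nested-tuple tree encoding, the function names (parent_annotate, rules, rule_probabilities, to_brackets), and the one-tree toy treebank are all hypothetical. The sketch annotates phrasal nodes with their parent, leaves the root and the pre-terminals unchanged, and then estimates rule probabilities by relative frequency, to show how the split lets subject and object NPs receive different probabilities.

```python
from collections import Counter

# Assumed encoding: a tree is a nested tuple (label, child, ...);
# a leaf word is a plain string. This is the tree of Figure 13.8(a).
TREE = ("S",
        ("NP", ("PRP", "I")),
        ("VP", ("VBD", "need"),
               ("NP", ("DT", "a"), ("NN", "flight"))))


def is_preterminal(node):
    """A pre-terminal (part-of-speech) node dominates exactly one word."""
    return len(node) == 2 and isinstance(node[1], str)


def parent_annotate(node, parent=None):
    """Return a copy of the tree with phrasal labels rewritten as LABEL^PARENT.

    The root and the pre-terminal nodes keep their labels, as in Figure 13.8(b).
    """
    if isinstance(node, str) or is_preterminal(node):
        return node
    label = node[0]
    new_label = label if parent is None else f"{label}^{parent}"
    # Children are annotated with this node's original (unannotated) label.
    return (new_label,) + tuple(parent_annotate(c, label) for c in node[1:])


def rules(node):
    """Yield (lhs, rhs) for every phrasal rule; lexical rules are skipped."""
    if isinstance(node, str) or is_preterminal(node):
        return
    yield node[0], tuple(child[0] for child in node[1:])
    for child in node[1:]:
        yield from rules(child)


def rule_probabilities(trees):
    """Relative-frequency estimate: count(lhs -> rhs) / count(lhs)."""
    rule_counts = Counter(rule for tree in trees for rule in rules(tree))
    lhs_counts = Counter()
    for (lhs, _), count in rule_counts.items():
        lhs_counts[lhs] += count
    return {rule: count / lhs_counts[rule[0]]
            for rule, count in rule_counts.items()}


def to_brackets(node):
    """Render a tree in bracketed notation."""
    if isinstance(node, str):
        return node
    return "(" + " ".join([node[0]] + [to_brackets(c) for c in node[1:]]) + ")"


if __name__ == "__main__":
    annotated = parent_annotate(TREE)
    print(to_brackets(annotated))
    for (lhs, rhs), p in rule_probabilities([annotated]).items():
        print(f"{lhs} -> {' '.join(rhs)}  {p:.2f}")
```

On this single-tree toy treebank, the unannotated grammar assigns NP → PRP a probability of 0.5 regardless of position, while the annotated grammar assigns NP^S → PRP a probability of 1.0 and gives NP^VP → PRP no probability mass. A real treebank would of course yield less extreme estimates, and the same idea extends to pre-terminals (e.g., RB^VP) by also annotating the nodes that is_preterminal currently skips.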