CMPT-413 Computational Linguistics
Anoop Sarkar
http://www.cs.sfu.ca/anoop
March 28, 2007

Probabilistic CFG (PCFG)

    S  → NP VP       1
    VP → V NP        0.9
    VP → VP PP       0.1
    PP → P NP        1
    NP → NP PP       0.25
    NP → Calvin      0.25
    NP → monsters    0.25
    NP → school      0.25
    V  → imagined    1
    P  → in          1

The probability of an input is the sum over all of its trees:
\[ P(\text{input}) = \sum_{\text{tree}} P(\text{tree}, \text{input}) \]
P(Calvin imagined monsters in school) = ?
Notice that P(VP → V NP) + P(VP → VP PP) = 1.0

Probabilistic CFG (PCFG)

P(Calvin imagined monsters in school) = ?

    (S (NP Calvin) (VP (V imagined) (NP (NP monsters) (PP (P in) (NP school)))))
    (S (NP Calvin) (VP (VP (V imagined) (NP monsters)) (PP (P in) (NP school))))

Probabilistic CFG (PCFG)

    (S (NP Calvin) (VP (V imagined) (NP (NP monsters) (PP (P in) (NP school)))))

\[
\begin{aligned}
P(\text{tree}_1) ={}& P(\text{S} \to \text{NP VP}) \times P(\text{NP} \to \text{Calvin}) \times P(\text{VP} \to \text{V NP}) \times P(\text{V} \to \text{imagined}) \\
& \times P(\text{NP} \to \text{NP PP}) \times P(\text{NP} \to \text{monsters}) \times P(\text{PP} \to \text{P NP}) \times P(\text{P} \to \text{in}) \times P(\text{NP} \to \text{school}) \\
={}& 1 \times 0.25 \times 0.9 \times 1 \times 0.25 \times 0.25 \times 1 \times 1 \times 0.25 \\
={}& 0.003515625
\end{aligned}
\]

Probabilistic CFG (PCFG)

    (S (NP Calvin) (VP (VP (V imagined) (NP monsters)) (PP (P in) (NP school))))

\[
\begin{aligned}
P(\text{tree}_2) ={}& P(\text{S} \to \text{NP VP}) \times P(\text{NP} \to \text{Calvin}) \times P(\text{VP} \to \text{VP PP}) \times P(\text{VP} \to \text{V NP}) \\
& \times P(\text{V} \to \text{imagined}) \times P(\text{NP} \to \text{monsters}) \times P(\text{PP} \to \text{P NP}) \times P(\text{P} \to \text{in}) \times P(\text{NP} \to \text{school}) \\
={}& 1 \times 0.25 \times 0.1 \times 0.9 \times 1 \times 0.25 \times 1 \times 1 \times 0.25 \\
={}& 0.00140625
\end{aligned}
\]

Probabilistic CFG (PCFG)

\[
\begin{aligned}
P(\text{Calvin imagined monsters in school}) &= P(\text{tree}_1) + P(\text{tree}_2) \\
&= 0.003515625 + 0.00140625 \\
&= 0.004921875
\end{aligned}
\]

Most likely tree is tree1 = \( \arg\max_{\text{tree}} P(\text{tree} \mid \text{input}) \):

    (S (NP Calvin) (VP (V imagined) (NP (NP monsters) (PP (P in) (NP school)))))

PCFG

Central condition:
\[ \sum_{\alpha} P(A \to \alpha) = 1 \]
Called a proper PCFG if this condition holds.
Note that this means
\[ P(A \to \alpha) = P(\alpha \mid A) = \frac{f(A, \alpha)}{f(A)} \]
\[ P(T \mid S) = \frac{P(T, S)}{P(S)} \]
\[ P(T, S) = \prod_{i} P(\text{RHS}_i \mid \text{LHS}_i) \]

PCFG

What is the PCFG that can be extracted from this single tree:

    (S (NP (Det the) (NP man))
       (VP (VP (V played) (NP (Det a) (NP game)))
           (PP (P with) (NP (Det the) (NP dog)))))

How many different rhs α exist for A → α, where A can be S, NP, VP, PP, Det, N, V, P?
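The two computations on the slides above can be checked with a short Python sketch: scoring a tree as the product of its rule probabilities, and estimating rule probabilities from a tree by relative frequency, P(A → α) = f(A, α) / f(A). The helper names (`parse_tree`, `rules_of`, `estimate_pcfg`, `tree_probability`) are hypothetical and written only for this illustration; exact fractions are used so the products are not affected by floating-point rounding.

```python
from collections import Counter
from fractions import Fraction


def parse_tree(text):
    """Parse a bracketed tree such as '(S (NP x) (VP y))' into nested lists."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                       # consume "("
            node = [tokens[pos]]           # node label
            pos += 1
            while tokens[pos] != ")":
                node.append(read())        # children: subtrees or words
            pos += 1                       # consume ")"
            return node
        word = tokens[pos]                 # a leaf word
        pos += 1
        return word

    return read()


def rules_of(tree):
    """Yield every (LHS, RHS) rule used in a nested-list tree."""
    rhs = tuple(c[0] if isinstance(c, list) else c for c in tree[1:])
    yield tree[0], rhs
    for child in tree[1:]:
        if isinstance(child, list):
            yield from rules_of(child)


def estimate_pcfg(trees):
    """Relative-frequency estimate: P(A -> alpha) = f(A, alpha) / f(A)."""
    counts = Counter(rule for tree in trees for rule in rules_of(tree))
    lhs_totals = Counter()
    for (lhs, _), c in counts.items():
        lhs_totals[lhs] += c
    return {rule: Fraction(c, lhs_totals[rule[0]]) for rule, c in counts.items()}


def tree_probability(tree, pcfg):
    """P(tree) = product of the probabilities of the rules it uses."""
    prob = Fraction(1)
    for rule in rules_of(tree):
        prob *= pcfg[rule]
    return prob


# The "Calvin" grammar from the first PCFG slide.
CALVIN = {
    ("S", ("NP", "VP")): Fraction(1),
    ("VP", ("V", "NP")): Fraction(9, 10),
    ("VP", ("VP", "PP")): Fraction(1, 10),
    ("PP", ("P", "NP")): Fraction(1),
    ("NP", ("NP", "PP")): Fraction(1, 4),
    ("NP", ("Calvin",)): Fraction(1, 4),
    ("NP", ("monsters",)): Fraction(1, 4),
    ("NP", ("school",)): Fraction(1, 4),
    ("V", ("imagined",)): Fraction(1),
    ("P", ("in",)): Fraction(1),
}
TREE1 = "(S (NP Calvin) (VP (V imagined) (NP (NP monsters) (PP (P in) (NP school)))))"
TREE2 = "(S (NP Calvin) (VP (VP (V imagined) (NP monsters)) (PP (P in) (NP school))))"
p_tree1 = tree_probability(parse_tree(TREE1), CALVIN)
p_tree2 = tree_probability(parse_tree(TREE2), CALVIN)
print(float(p_tree1), float(p_tree2), float(p_tree1 + p_tree2))

# Rule probabilities read off the single "the man played a game" tree.
SINGLE = ("(S (NP (Det the) (NP man)) (VP (VP (V played) (NP (Det a) (NP game)))"
          " (PP (P with) (NP (Det the) (NP dog)))))")
pcfg = estimate_pcfg([parse_tree(SINGLE)])
for (lhs, rhs), p in sorted(pcfg.items()):
    print(f"{lhs} -> {' '.join(rhs)}: {p}")
```

Passing several parsed trees to `estimate_pcfg` pools the counts over all of them, which is the same counting applied to a collection of trees.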
PCFG

    Rule              Count    Probability
    S   → NP VP       c = 1    p = 1/1 = 1.0
    NP  → Det NP      c = 3    p = 3/6 = 0.5
    NP  → man         c = 1    p = 1/6 = 0.1667
    NP  → game        c = 1    p = 1/6 = 0.1667
    NP  → dog         c = 1    p = 1/6 = 0.1667
    VP  → VP PP       c = 1    p = 1/2 = 0.5
    VP  → V NP        c = 1    p = 1/2 = 0.5
    PP  → P NP        c = 1    p = 1/1 = 1.0
    Det → the         c = 2    p = 2/3 = 0.67
    Det → a           c = 1    p = 1/3 = 0.33
    V   → played      c = 1    p = 1/1 = 1.0
    P   → with        c = 1    p = 1/1 = 1.0

We can do this with multiple trees. Simply count occurrences of CFG rules over all the trees.
A repository of such trees labelled by a human is called a TreeBank.

Ambiguity

Part of Speech ambiguity
    saw → noun
    saw → verb
Structural ambiguity: Prepositional Phrases
    I saw (the man) with the telescope
    I saw (the man with the telescope)
Structural ambiguity: Coordination
    a program to promote safety in ((trucks) and (minivans))
    a program to promote ((safety in trucks) and (minivans))
    ((a program to promote safety in trucks) and (minivans))

Ambiguity: attachment choice in alternative parses

[Figure: two alternative parse trees for "a program to promote safety in trucks and minivans", differing in where "and minivans" attaches.]

Parsing as a machine learning problem

S = a sentence
T = a parse tree
A statistical parsing model defines P(T | S)
Find best parse:
\[ \arg\max_{T} P(T \mid S) \]
\[ P(T \mid S) = \frac{P(T, S)}{P(S)} \]
Best parse (P(S) is the same for every T, so it can be dropped from the maximization):
\[ \arg\max_{T} P(T \mid S) = \arg\max_{T} P(T, S) \]
e.g. for PCFGs:
\[ P(T, S) = \prod_{i = 1 \ldots n} P(\text{RHS}_i \mid \text{LHS}_i) \]

Prepositional Phrases

noun attach: I bought the shirt with pockets
verb attach: I washed the shirt with soap
As in the case of other attachment decisions in parsing:
    it depends on the meaning of the entire sentence
    needs world knowledge, etc.
Maybe there is a simpler solution: we can attempt to solve it using heuristics or associations between words

Structure Based Ambiguity Resolution

Right association: a constituent (NP or PP) tends to attach to another constituent immediately to its right (Kimball 1973)
Minimal attachment: a constituent tends to attach to an existing non-terminal using the fewest additional syntactic nodes (Frazier 1978)
These two principles make opposite predictions for prepositional phrase attachment
Consider the grammar:

    VP → V NP PP    (1)
    NP → NP PP      (2)

for input: I [VP saw [NP the man . . . [PP with the telescope ], RA predicts that the PP attaches to the NP, i.e. use rule (2), and MA predicts V attachment, i.e. use rule (1)

Structure Based Ambiguity Resolution

Garden-paths look structural: The emergency crews hate most is domestic violence
Neither MA nor RA accounts for more than 55% of the cases in real text
Psycholinguistic experiments using eyetracking show that humans resolve ambiguities as soon as possible in the left to right sequence, using the words to disambiguate
Garden-paths are caused by a combination of lexical and structural effects: The flowers delivered for the patient arrived

Ambiguity Resolution: Prepositional Phrases in English

Learning Prepositional Phrase Attachment: Annotated Data

    v            n1            p       n2          Attachment
    join         board         as      director    V
    is           chairman      of      N.V.        N
    using        crocidolite   in      filters     V
    bring        attention     to      problem     V
    is           asbestos      in      products    N
    making       paper         for     filters     N
    including    three         with    cancer      N
    ...          ...           ...     ...         ...

Prepositional Phrase Attachment

    Method                                 Accuracy
    Always noun attachment                 59.0
    Most likely for each preposition       72.2
    Average Human (4 head words only)      88.2
    Average Human (whole sentence)         93.2

Back-off Smoothing

Let 1 represent noun attachment.
We want to compute the probability of noun attachment: \( p(1 \mid v, n_1, p, n_2) \).
Probability of verb attachment is \( 1 - p(1 \mid v, n_1, p, n_2) \).

Back-off Smoothing

1. If \( f(v, n_1, p, n_2) > 0 \) and \( \hat{p} \neq 0.5 \)
\[ \hat{p}(1 \mid v, n_1, p, n_2) = \frac{f(1, v, n_1, p, n_2)}{f(v, n_1, p, n_2)} \]
2. Else if \( f(v, n_1, p) + f(v, p, n_2) + f(n_1, p, n_2) > 0 \) and \( \hat{p} \neq 0.5 \)
\[ \hat{p}(1 \mid v, n_1, p, n_2) = \frac{f(1, v, n_1, p) + f(1, v, p, n_2) + f(1, n_1, p, n_2)}{f(v, n_1, p) + f(v, p, n_2) + f(n_1, p, n_2)} \]
3. Else if \( f(v, p) + f(n_1, p) + f(p, n_2) > 0 \)
\[ \hat{p}(1 \mid v, n_1, p, n_2) = \frac{f(1, v, p) + f(1, n_1, p) + f(1, p, n_2)}{f(v, p) + f(n_1, p) + f(p, n_2)} \]
4. Else if \( f(p) > 0 \)
\[ \hat{p}(1 \mid v, n_1, p, n_2) = \frac{f(1, p)}{f(p)} \]
5. Else \( \hat{p}(1 \mid v, n_1, p, n_2) = 1.0 \)

Prepositional Phrase Attachment: (Collins and Brooks 1995)

Results: 84.5% accuracy with the use of some limited word classes for dates, numbers, etc.
Using complex word classes taken from WordNet (which we shall be looking at later in this course) increases accuracy to 88% (Stetina and Nagao 1998)
We can improve on parsing performance with Probabilistic CFGs by using the insights taken from PP attachment.
Modify the PCFG model to be sensitive to words and other context-sensitive features of the input.
The same idea generalizes to other kinds of attachment problems, like coordination or deciding which constituent is an argument of a verb.

Some other studies

Toutanova, Manning, and Ng, 2004: use sophisticated smoothing model for PP attachment
    86.18% with words & stems; with word classes: 87.54%
Merlo, Crocker and Berthouzoz, 1997: test on multiple PPs, generalize disambiguation of 1 PP to 2-3 PPs
    14 structures possible for 3PPs assuming a single verb: all 14 are attested in the Treebank
    same model as CB95, but generalized to dealing with up to 3PPs
    1PP: 84.3%    2PP: 69.6%    3PP: 43.6%
Note that this is still not the real problem faced in parsing natural language

Adding Lexical Information to PCFG

[Figure: lexicalized parse tree, roughly (S .. (VP{indicated} (VB{indicated} indicated) (NP{difference} difference) (PP{in} (P in) (NP ..))))]
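The back-off estimate for PP attachment listed earlier (steps 1 to 5) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the training set is just the seven annotated (v, n1, p, n2) examples shown in the annotated-data table, and an estimate of exactly 0.5 at the first two levels is treated as a tie that triggers further back-off, which is an assumption about how the tie condition is meant to be read.

```python
from collections import Counter


def backoff_levels(v, n1, p, n2):
    """Tuples consulted at each back-off level; every tuple contains p."""
    return [
        [("v,n1,p,n2", v, n1, p, n2)],
        [("v,n1,p", v, n1, p), ("v,p,n2", v, p, n2), ("n1,p,n2", n1, p, n2)],
        [("v,p", v, p), ("n1,p", n1, p), ("p,n2", p, n2)],
        [("p", p)],
    ]


def train(examples):
    """Count f(tuple) and f(1, tuple) for every tuple at every level."""
    total, noun = Counter(), Counter()
    for v, n1, p, n2, attach in examples:      # attach: 1 = noun, 0 = verb
        for keys in backoff_levels(v, n1, p, n2):
            for key in keys:
                total[key] += 1
                noun[key] += attach
    return total, noun


def noun_attach_prob(v, n1, p, n2, total, noun):
    """Estimate p(1 | v, n1, p, n2), backing off to less specific tuples."""
    for depth, keys in enumerate(backoff_levels(v, n1, p, n2), start=1):
        denom = sum(total[k] for k in keys)
        if denom == 0:
            continue                           # nothing seen at this level
        num = sum(noun[k] for k in keys)
        if depth <= 2 and 2 * num == denom:
            continue                           # assumed tie rule: 0.5 backs off
        return num / denom
    return 1.0                                 # step 5: default noun attachment


# The seven annotated examples from the slides (V -> 0, N -> 1).
TRAIN = [
    ("join", "board", "as", "director", 0),
    ("is", "chairman", "of", "N.V.", 1),
    ("using", "crocidolite", "in", "filters", 0),
    ("bring", "attention", "to", "problem", 0),
    ("is", "asbestos", "in", "products", 1),
    ("making", "paper", "for", "filters", 1),
    ("including", "three", "with", "cancer", 1),
]

total, noun = train(TRAIN)
p_noun = noun_attach_prob("is", "chairman", "of", "N.V.", total, noun)
print("noun" if p_noun >= 0.5 else "verb", p_noun)
```

Tagging each key with its pattern name keeps counts for different tuple shapes apart, so a (v, p) pair can never collide with an (n1, p) pair that happens to contain the same words.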
Adding Lexical Information to PCFG (Collins 99, Charniak 00)

[Figure: the children of VP{indicated} are generated one at a time, each step with its own probability.]

    Partial tree                                          Probability
    (VP{indicated} VB{+H:indicated})                      P_h(VB | VP, indicated)
    (VP{indicated} STOP .. VB{+H:indicated})              P_l(STOP | VP, VB, indicated)
    (VP{indicated} VB{+H:indicated} NP{difference})       P_r(NP(difference) | VP, VB, indicated)
    (VP{indicated} VB{+H:indicated} .. PP{in})            P_r(PP(in) | VP, VB, indicated)
    (VP{indicated} VB{+H:indicated} .. STOP)              P_r(STOP | VP, VB, indicated)

Evaluation of Parsing

Consider a candidate parse to be evaluated against the truth (or gold-standard parse):

    candidate: (S (A (P this) (Q is)) (A (R a) (T test)))
    gold:      (S (A (P this)) (B (Q is) (A (R a) (T test))))

In order to evaluate this, we list all the constituents:

    Candidate    Gold
    (0,4,S)      (0,4,S)
    (0,2,A)      (0,1,A)
    (2,4,A)      (1,4,B)
                 (2,4,A)

Skip spans of length 1, which would be equivalent to part of speech tagging accuracy.
Precision is defined as \( \frac{\#\text{correct}}{\#\text{proposed}} = \frac{2}{3} \) and recall as \( \frac{\#\text{correct}}{\#\text{in gold}} = \frac{2}{4} \).
Another measure: crossing brackets

    candidate: [ an [incredibly expensive] coat ] (1 CB)
    gold:      [ an [incredibly [expensive coat]]]

Evaluation of Parsing

Measures: Bracketing recall R, Bracketing precision P, Complete match, Average crossing, No crossing, 2 or less crossing

    Bracketing recall R    = (num of correct constituents) / (num of constituents in the goldfile)
    Bracketing precision P = num of correct c...

