Week 9: Language Processing

Probabilistic language models
- Sometimes we want to know: how likely is this string of words W?
- Example: the language model in automatic speech recognition (ASR)
- N-gram model: Markov assumption
  - bigram: P(W) = P(w1) P(w2|w1) P(w3|w2) P(w4|w3) ...
- A language model is a model of language
- Evaluate a model by computing the probability of a held-out test set
- Better model = higher probability

N-grams vs. CFGs
- N-grams lack structure:
  - The dog
  - The big dog
  - The big red dog
  - The big red smelly dog
- Plain CFGs don't assign probabilities
- Solution: add probabilities to CFG rules

Probabilistic CFGs
- If W is a word sequence and T is a tree:
  P(W) = Σ_T P(W, T) = Σ_T P(W|T) P(T)
- Why multiple trees? There can be multiple parses of a sentence.
- How to calculate P(T)? If T is headed by rule R: H → S1 ... Sn, then
  P(T) = P(R) · Π_i P(S_i)
  where each S_i is the subtree rooted at the i-th child of H.

Rule probabilities
- If R1 ... Rn are the only rules with the same LHS nonterminal, then
  Σ_{i=1..n} P(Ri) = 1
- Example:
  - 0.96 S → NP VP
  - 0.04 S → VP

Example
- Nonterminal rules (give P(T)):
  - 1.0 S → NP VP
  - 0.8 VP → V NP
  - 0.2 VP → V NP PP
  - 0.1 NP → NP PP
  - 0.9 NP → Det N
  - 1.0 PP → P NP
- Terminal (lexical) rules (give P(W|T)):
  - Det → 0.5 the | 0.5 a
  - N → 0.4 man | 0.3 boy | 0.3 binoculars
  - V → 1.0 saw
  - P → 1.0 with
- Example sentence: The man saw the boy with the binoculars

How do you get rule probabilities?
- Use a corpus of text
- It must be specially marked up with parses (a treebank)
- For English: the Penn Treebank

Penn Treebank example

( (S (NP-SBJ (NP (NNP Pierre) (NNP Vinken))
             (, ,)
             (ADJP (NP (CD 61) (NNS years)) (JJ old))
             (, ,))
     (VP (MD will)
         (VP (VB join)
             (NP (DT the) (NN board))
             (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
             (NP-TMP (NNP Nov.) (CD 29))))
     (. .)) )

( (S (NP-SBJ (NNP Mr.) (NNP Vinken))
     (VP (VBZ is)
         (NP-PRD (NP (NN chairman))
                 (PP (IN of)
                     (NP (NP (NNP Elsevier) (NNP N.V.))
                         (, ,)
                         (NP (DT the) (NNP Dutch) (VBG publishing) (NN group))))))
     (. .)) )
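Given a treebank like the one above, rule probabilities are typically estimated by relative frequency: count how often each rule fires, divided by how often its left-hand-side nonterminal appears. A minimal sketch, using a tiny hypothetical treebank (nested Python lists rather than the Penn bracket format, and invented toy trees, not real Treebank data):

```python
# Sketch: estimate PCFG rule probabilities by relative frequency
# from a tiny, made-up treebank. Trees are nested lists:
# [label, child1, child2, ...]; a leaf is a bare word string.
from collections import Counter

def count_rules(tree, counts):
    """Recursively count every LHS -> RHS rule used in a tree."""
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        counts[(label, (children[0],))] += 1   # lexical rule, e.g. Det -> the
        return
    rhs = tuple(child[0] for child in children)
    counts[(label, rhs)] += 1                  # nonterminal rule
    for child in children:
        count_rules(child, counts)

def estimate(treebank):
    """P(LHS -> RHS) = count(LHS -> RHS) / count(LHS)."""
    counts = Counter()
    for tree in treebank:
        count_rules(tree, counts)
    lhs_totals = Counter()
    for (lhs, _rhs), c in counts.items():
        lhs_totals[lhs] += c
    return {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}

# Two toy trees (hypothetical data, for illustration only).
treebank = [
    ["S", ["NP", ["Det", "the"], ["N", "man"]],
          ["VP", ["V", "saw"], ["NP", ["Det", "the"], ["N", "boy"]]]],
    ["S", ["NP", ["Det", "the"], ["N", "boy"]],
          ["VP", ["V", "saw"], ["NP", ["Det", "a"], ["N", "man"]]]],
]
probs = estimate(treebank)
print(probs[("S", ("NP", "VP"))])   # 1.0 (2 of 2 S rules)
print(probs[("Det", ("the",))])     # 0.75 (3 of 4 Det rules)
```

By construction the probabilities of all rules sharing a left-hand side sum to 1, matching the constraint Σ_i P(Ri) = 1 from the notes.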
What if you don't have a treebank?
- Assumption: you still know the rules, just not the probabilities
- Inside-outside algorithm: EM for parsing probabilities
  - Analogous to the forward-backward algorithm for HMMs
- In any EM problem, ask:
  - What are the observed variables?
  - What are the hidden variables?

Inside-outside & EM
- Start with random probabilities for each rule
- E-step: determine a probability for each parse
  - Same as finding P(T) in "the boy with the telescope"
- M-step: given parse probabilities from the entire corpus, update P(rules)
- Alternate E and M steps until convergence

Unknown structures
- What if you don't even know the rules ahead of time?
- Can infer Chomsky Normal Form rules:
  - X → Y Z
  - X → t
- This becomes a structural-EM problem
- Problems:
  - The inside-outside algorithm is slow (O(n^3 t^3)), and structural EM is worse
  - Learned structures are often not linguistically plausible
  - PCFGs are often not good at local dependencies

Local dependencies
- The man sees the boy with the binoculars
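The PP-attachment ambiguity in the "binoculars" sentence can be made concrete with the example grammar from these notes: P(W, T) is just the product of the probabilities of all rules used in the tree, computed once per parse. A sketch (the two rule lists below are hand-enumerated parses, written out flat rather than as trees):

```python
# Compare P(W, T) for the two PP-attachment parses of
# "the man saw the boy with the binoculars", using the rule
# probabilities from the Example slide in these notes.
from functools import reduce

P = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.8,
    ("VP", ("V", "NP", "PP")): 0.2,
    ("NP", ("NP", "PP")): 0.1,
    ("NP", ("Det", "N")): 0.9,
    ("PP", ("P", "NP")): 1.0,
    ("Det", ("the",)): 0.5,
    ("N", ("man",)): 0.4,
    ("N", ("boy",)): 0.3,
    ("N", ("binoculars",)): 0.3,
    ("V", ("saw",)): 1.0,
    ("P", ("with",)): 1.0,
}

def parse_prob(rules):
    """P(W, T) = product over all rules used in the tree."""
    return reduce(lambda p, r: p * P[r], rules, 1.0)

# Parse 1: PP attaches to the VP ("saw [the boy] [with the binoculars]").
vp_attach = [
    ("S", ("NP", "VP")), ("NP", ("Det", "N")),         # the man
    ("VP", ("V", "NP", "PP")), ("NP", ("Det", "N")),   # saw the boy
    ("PP", ("P", "NP")), ("NP", ("Det", "N")),         # with the binoculars
    ("Det", ("the",)), ("N", ("man",)), ("V", ("saw",)),
    ("Det", ("the",)), ("N", ("boy",)), ("P", ("with",)),
    ("Det", ("the",)), ("N", ("binoculars",)),
]

# Parse 2: PP attaches to the object NP ("the boy with the binoculars").
np_attach = [
    ("S", ("NP", "VP")), ("NP", ("Det", "N")),
    ("VP", ("V", "NP")), ("NP", ("NP", "PP")),
    ("NP", ("Det", "N")), ("PP", ("P", "NP")), ("NP", ("Det", "N")),
    ("Det", ("the",)), ("N", ("man",)), ("V", ("saw",)),
    ("Det", ("the",)), ("N", ("boy",)), ("P", ("with",)),
    ("Det", ("the",)), ("N", ("binoculars",)),
]

print(parse_prob(vp_attach))   # VP attachment
print(parse_prob(np_attach))   # NP attachment
```

With this grammar the VP attachment scores 0.2 × 0.9^3 for its nonterminal rules versus 0.8 × 0.1 × 0.9^3 for the NP attachment (the lexical factors are identical), so the VP attachment wins. This also illustrates the local-dependency weakness noted above: the PCFG's preference depends only on the rule probabilities, not on the fact that seeing *with binoculars* is more plausible than a boy *possessing* them.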
This note was uploaded on 04/13/2010 for the course CSE 730, taught by Professor Eric Fosler-Lussier during the Fall '08 term at Ohio State.