Week 9: Language Processing - Probabilistic Language Models

Probabilistic language models

- Sometimes we want to know: how likely is this string of words W?
- Example: the language model in ASR (automatic speech recognition)
- n-gram model: Markov assumption
  - bigram: P(W) = P(w_1) P(w_2|w_1) P(w_3|w_2) P(w_4|w_3) ...
    (a small bigram sketch appears after these notes)
- A language model is a model of language
- We can evaluate a model by computing the probability of a held-out test set
- Better model = higher probability

N-grams vs. CFGs

- N-grams lack structure:
  - The dog
  - The big dog
  - The big red dog
  - The big red smelly dog
- Plain CFGs don't assign probabilities
- Solution: add probabilities to CFG rules

Probabilistic CFGs

- If W is a word sequence and T is a tree:
    P(W) = Σ_T P(W, T) = Σ_T P(W | T) P(T)
- Why multiple trees? There can be multiple parses of a sentence.
- How to calculate P(T)? If T is headed by the rule R: H -> S_1 ... S_n, then
    P(T) = P(R) ∏_i P(S_i)
  i.e. the rule's probability times the probabilities of the subtrees rooted at S_1 ... S_n
  (a sketch of this calculation appears after these notes)

Rule probabilities

- If R_1 ... R_n are the only rules with the same left-hand-side nonterminal, then
    Σ_{i=1}^{n} P(R_i) = 1
  For example:
    0.96  S -> NP VP
    0.04  S -> VP

Example

- Non-terminal rules (P(T)):
    1.0  S  -> NP VP
    0.8  VP -> V NP
    0.2  VP -> V NP PP
    0.1  NP -> NP PP
    0.9  NP -> Det N
    1.0  PP -> P NP
- Terminal (lexical) rules (P(W|T)):
    Det -> 0.5 the | 0.5 a
    N   -> 0.4 man | 0.3 boy | 0.3 binoculars
    V   -> 1.0 saw
    P   -> 1.0 with
- Sentence: "The man saw the boy with the binoculars"

How do you get rule probabilities?

- Use a corpus of text
- It must be specially marked up for parses
- English: the Penn Treebank
  (a rule-counting sketch appears after these notes)

Penn Treebank example

( (S (NP-SBJ (NP (NNP Pierre) (NNP Vinken) )
             (, ,)
             (ADJP (NP (CD 61) (NNS years) ) (JJ old) )
             (, ,) )
     (VP (MD will)
         (VP (VB join)
             (NP (DT the) (NN board) )
             (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director) ))
             (NP-TMP (NNP Nov.) (CD 29) )))
     (. .) ))

( (S (NP-SBJ (NNP Mr.) (NNP Vinken) )
     (VP (VBZ is)
         (NP-PRD (NP (NN chairman) )
                 (PP (IN of)
                     (NP (NP (NNP Elsevier) (NNP N.V.) )
                         (, ,)
                         (NP (DT the) (NNP Dutch) (VBG publishing) (NN group) )))))
     (. .) ))

What if you don't have a treebank?

- Assumption: you still know the rules, just not their probabilities
- Inside-outside algorithm:
  - EM for parsing probabilities
  - Like the forward-backward algorithm in HMMs
- In any EM problem, ask:
  - What are the observed variables?
  - What are the hidden variables?

Inside-outside & EM

- Start with some random probabilities for each rule
- E-step: determine a probability for each parse
  - Same as finding P(T) in "the boy with the telescope"
- M-step: given the parse probabilities from the entire corpus, update P(rules)
- Continue around the E and M steps until convergence
  (an EM sketch over enumerated parses appears after these notes)

Unknown structures

- What if you don't even know the rules ahead of time?
- We can infer Chomsky Normal Form rules:
    X -> Y Z
    X -> t
- This becomes a structural-EM problem
- Problems:
  - The inside-outside algorithm is slow (O(n^3 t^3)), and structural EM is worse
  - The structures learned are often not linguistically plausible
  - PCFGs are often not good at local dependencies

Local dependencies

- The man sees the boy with the binoculars
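A minimal sketch of the bigram model above, assuming a tiny made-up corpus and my own function names: it estimates P(w_i | w_{i-1}) by maximum likelihood from counts and scores a sentence as the product of its bigram probabilities. Real language models add smoothing for unseen bigrams, which is omitted here.

```python
# Minimal bigram language model sketch (toy corpus and names are assumptions,
# not from the lecture). Estimates P(w_i | w_{i-1}) by maximum likelihood
# and scores a sentence as P(W) = P(w1|<s>) * P(w2|w1) * ... * P(</s>|wn).

from collections import defaultdict

def train_bigram(sentences):
    """Count unigrams and bigrams over token lists padded with <s> and </s>."""
    unigram = defaultdict(int)
    bigram = defaultdict(int)
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            unigram[prev] += 1
            bigram[(prev, cur)] += 1
    return unigram, bigram

def sentence_prob(sent, unigram, bigram):
    """P(W) under the bigram (Markov) assumption; 0.0 for any unseen bigram."""
    tokens = ["<s>"] + sent + ["</s>"]
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        if unigram[prev] == 0:
            return 0.0
        prob *= bigram[(prev, cur)] / unigram[prev]
    return prob

corpus = [["the", "dog", "barks"], ["the", "big", "dog", "barks"]]
uni, bi = train_bigram(corpus)
print(sentence_prob(["the", "dog", "barks"], uni, bi))   # 0.5 on this toy corpus
```

Evaluating on a held-out test set, as the slide suggests, just means computing this probability (or its log) over sentences the model was not trained on and preferring the model that assigns the higher value.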
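A sketch of the P(T) calculation for the example grammar above. The rule and lexical probabilities are the ones on the Example slide; the two tuple-encoded trees are my reconstruction of the two PP-attachment parses of "the man saw the boy with the binoculars", so treat them as illustrative.

```python
# Sketch of computing P(T) for the example PCFG in the notes.
# Trees are nested tuples: (label, child, ...); leaves are words.
# Rule and lexical probabilities come from the "Example" slide; the tree
# structures are a reconstruction of the two parses, not from the lecture.

RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.8,
    ("VP", ("V", "NP", "PP")): 0.2,
    ("NP", ("NP", "PP")): 0.1,
    ("NP", ("Det", "N")): 0.9,
    ("PP", ("P", "NP")): 1.0,
}
LEXICON = {
    ("Det", "the"): 0.5, ("Det", "a"): 0.5,
    ("N", "man"): 0.4, ("N", "boy"): 0.3, ("N", "binoculars"): 0.3,
    ("V", "saw"): 1.0,
    ("P", "with"): 1.0,
}

def tree_prob(tree):
    """P(T) = P(R) * product of P(subtree_i), recursing down to lexical rules."""
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        return LEXICON[(label, children[0])]          # lexical rule P(word | tag)
    rule = (label, tuple(child[0] for child in children))
    prob = RULES[rule]
    for child in children:
        prob *= tree_prob(child)
    return prob

the_man = ("NP", ("Det", "the"), ("N", "man"))
the_boy = ("NP", ("Det", "the"), ("N", "boy"))
with_binocs = ("PP", ("P", "with"),
               ("NP", ("Det", "the"), ("N", "binoculars")))

# Parse 1: PP attaches to the VP ("saw ... with the binoculars")
t1 = ("S", the_man, ("VP", ("V", "saw"), the_boy, with_binocs))
# Parse 2: PP attaches to the object NP ("the boy with the binoculars")
t2 = ("S", the_man, ("VP", ("V", "saw"), ("NP", the_boy, with_binocs)))

for t in (t1, t2):
    print(tree_prob(t))
print("P(W) =", tree_prob(t1) + tree_prob(t2))   # sum over the parses of W
```

Summing tree_prob over both parses gives P(W) as on the Probabilistic CFGs slide; under this particular grammar the VP-attachment parse comes out more probable than the NP-attachment one.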
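Rule probabilities from a treebank are relative frequencies: count every rule in the parsed corpus and normalise within each left-hand side. Below is a sketch using the same nested-tuple tree encoding on a two-tree toy "treebank" (the trees and helper names are mine, not Treebank data); reading the real Penn Treebank's bracketed format is assumed to happen elsewhere.

```python
# Sketch: maximum-likelihood rule probabilities from a tiny toy "treebank".
# Trees are nested tuples (label, child, ...); leaves are words.

from collections import defaultdict

def count_rules(tree, counts):
    """Recursively count LHS -> RHS rules, lexical rules included."""
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        counts[(label, (children[0],))] += 1
        return
    counts[(label, tuple(child[0] for child in children))] += 1
    for child in children:
        count_rules(child, counts)

def estimate_probs(treebank):
    """P(LHS -> RHS) = count(LHS -> RHS) / count(LHS), so each LHS sums to 1."""
    counts = defaultdict(int)
    for tree in treebank:
        count_rules(tree, counts)
    lhs_totals = defaultdict(int)
    for (lhs, _), c in counts.items():
        lhs_totals[lhs] += c
    return {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}

treebank = [
    ("S", ("NP", ("Det", "the"), ("N", "dog")),
          ("VP", ("V", "barks"))),
    ("S", ("NP", ("Det", "the"), ("N", "dog")),
          ("VP", ("V", "sees"), ("NP", ("Det", "a"), ("N", "boy")))),
]
for rule, p in sorted(estimate_probs(treebank).items()):
    print(rule, round(p, 3))
```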
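A sketch of the E/M loop above, under a simplifying assumption: each sentence's candidate parses are enumerated explicitly, so expected rule counts can be computed by weighting whole parses. The actual inside-outside algorithm computes the same expectations from inside and outside charts without ever listing parses. The trees and the uniform initialisation are illustrative choices of mine (the slide starts from random probabilities instead).

```python
# Minimal EM sketch for learning PCFG rule probabilities from unparsed text,
# in the spirit of inside-outside, but with the candidate parses of each
# sentence enumerated explicitly. Trees and initialisation are illustrative.

from collections import defaultdict

def rules_of(tree):
    """Yield every rule (LHS, RHS) used in a tree, lexical rules included."""
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        yield (label, (children[0],))
        return
    yield (label, tuple(child[0] for child in children))
    for child in children:
        yield from rules_of(child)

def tree_prob(tree, probs):
    p = 1.0
    for rule in rules_of(tree):
        p *= probs[rule]
    return p

def uniform_init(parse_sets):
    """Start from uniform probabilities over the rules observed in the parses."""
    by_lhs = defaultdict(set)
    for parses in parse_sets:
        for tree in parses:
            for lhs, rhs in rules_of(tree):
                by_lhs[lhs].add(rhs)
    return {(lhs, rhs): 1.0 / len(rhss)
            for lhs, rhss in by_lhs.items() for rhs in rhss}

def em(parse_sets, probs, iterations=20):
    for _ in range(iterations):
        expected = defaultdict(float)
        # E-step: posterior weight of each candidate parse under current probs
        for parses in parse_sets:
            scores = [tree_prob(t, probs) for t in parses]
            total = sum(scores)
            for tree, score in zip(parses, scores):
                for rule in rules_of(tree):
                    expected[rule] += score / total
        # M-step: renormalise expected counts within each left-hand side
        lhs_total = defaultdict(float)
        for (lhs, _), count in expected.items():
            lhs_total[lhs] += count
        probs = {rule: count / lhs_total[rule[0]]
                 for rule, count in expected.items()}
    return probs

# One ambiguous sentence with its two PP-attachment parses.
the_man = ("NP", ("Det", "the"), ("N", "man"))
the_boy = ("NP", ("Det", "the"), ("N", "boy"))
with_binocs = ("PP", ("P", "with"), ("NP", ("Det", "the"), ("N", "binoculars")))
t_vp = ("S", the_man, ("VP", ("V", "saw"), the_boy, with_binocs))
t_np = ("S", the_man, ("VP", ("V", "saw"), ("NP", the_boy, with_binocs)))

parse_sets = [[t_vp, t_np]]
learned = em(parse_sets, uniform_init(parse_sets))
print(learned[("VP", ("V", "NP", "PP"))], learned[("VP", ("V", "NP"))])
```

With a single ambiguous sentence this toy run simply drifts toward one attachment; in practice the expected counts are accumulated over the whole corpus before each M-step, and the E/M steps repeat until convergence as the slide describes.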
These notes are from CSE 730, taught by Prof. Eric Fosler-Lussier at Ohio State in Fall '08 (uploaded 04/13/2010).

