Week 9: Language Processing

Probabilistic language models
- Sometimes we want to know: how likely is this string of words W?
- Example: the language model in automatic speech recognition (ASR)
- n-gram model: Markov assumption
  - bigram: P(W) = P(w1) P(w2|w1) P(w3|w2) P(w4|w3) ...
- A language model is a model of language
- Can evaluate a model by computing the probability of a held-out test set
- Better model = higher probability

N-grams vs. CFGs
- N-grams lack structure:
  - The dog
  - The big dog
  - The big red dog
  - The big red smelly dog
- Plain CFGs don't assign probabilities
- Solution: add probabilities to CFG rules

Probabilistic CFGs
- If W is a word sequence and T is a tree:
  P(W) = Σ_T P(W, T) = Σ_T P(W|T) P(T)
- Why multiple trees? There can be multiple parses of a sentence.
- How to calculate P(T)? If T is headed by rule R: H -> S1 ... Sn, then
  P(T) = P(R) · Π_i P(S_i)
  (the probability of the top rule times the probabilities of the subtrees S1 S2 ... Sn it generates)

Rule probabilities
- If R1 ... Rn are the only rules with the same LHS nonterminal, then
  Σ_{i=1..n} P(R_i) = 1
- For example:
  - 0.96  S -> NP VP
  - 0.04  S -> VP

Example
- Non-terminal rules (give P(T)):
  - 1.0  S -> NP VP
  - 0.8  VP -> V NP
  - 0.2  VP -> V NP PP
  - 0.1  NP -> NP PP
  - 0.9  NP -> Det N
  - 1.0  PP -> P NP
- Terminal (lexical) rules (give P(W|T)):
  - Det -> 0.5 the | 0.5 a
  - N -> 0.4 man | 0.3 boy | 0.3 binoculars
  - V -> 1.0 saw
  - P -> 1.0 with
- Sentence: "The man saw the boy with the binoculars"

How do you get rule probabilities?
- Use a corpus of text
- Must be specially marked up for parses
- English: the Penn Treebank

Penn Treebank example:

( (S (NP-SBJ (NP (NNP Pierre) (NNP Vinken) )
             (, ,)
             (ADJP (NP (CD 61) (NNS years) ) (JJ old) )
             (, ,) )
     (VP (MD will)
         (VP (VB join)
             (NP (DT the) (NN board) )
             (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director) ))
             (NP-TMP (NNP Nov.) (CD 29) )))
     (. .) ))
( (S (NP-SBJ (NNP Mr.) (NNP Vinken) )
     (VP (VBZ is)
         (NP-PRD (NP (NN chairman) )
                 (PP (IN of)
                     (NP (NP (NNP Elsevier) (NNP N.V.) )
                         (, ,)
                         (NP (DT the) (NNP Dutch) (VBG publishing) (NN group) )))))
     (. .) ))
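Getting rule probabilities from a treebank amounts to relative-frequency counting: P(LHS -> RHS) = count(rule) / count(LHS). A minimal sketch in Python; the tuple-based tree encoding and the one-sentence mini-treebank are hypothetical stand-ins for real Penn Treebank data:

```python
from collections import Counter

# A tiny hypothetical "treebank": each node is (label, child, child, ...),
# and a lexical (terminal) node is (POS, "word").
trees = [
    ("S", ("NP", ("Det", "the"), ("N", "man")),
          ("VP", ("V", "saw"),
                 ("NP", ("Det", "the"), ("N", "boy")))),
]

rule_counts = Counter()   # counts of (LHS, RHS) pairs
lhs_counts = Counter()    # counts of each LHS nonterminal

def count_rules(node):
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        rhs = (children[0],)                     # lexical rule: POS -> word
    else:
        rhs = tuple(child[0] for child in children)
        for child in children:                   # recurse into subtrees
            count_rules(child)
    rule_counts[(label, rhs)] += 1
    lhs_counts[label] += 1

for t in trees:
    count_rules(t)

# Maximum-likelihood estimates; probabilities for each LHS sum to 1,
# matching the constraint sum_i P(R_i) = 1 from the notes.
probs = {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}
```

With this single sentence, both NP expansions are Det N, so P(NP -> Det N) = 1.0, while "man" and "boy" each get probability 0.5 under N.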
What if you don't have a treebank?
- Assumption: you still know the rules, just not their probabilities
- Inside-outside algorithm: EM for parsing probabilities
  - Like the forward-backward algorithm in HMMs
- In any EM problem:
  - What are the observed variables?
  - What are the hidden variables?

Inside-outside & EM
- Start with some random probabilities for each rule
- E-step: determine a probability for each parse
  - Same as finding P(T) in "the boy with the telescope"
- M-step: given the parse probabilities from the entire corpus, update P(rules)
- Continue around the E/M steps until convergence

Unknown structures
- What if you don't even know the rules ahead of time?
- Can infer Chomsky Normal Form rules:
  - X -> Y Z
  - X -> t
- This becomes a structural-EM problem
- Problems:
  - The inside-outside algorithm is slow (O(n³t³)), and structural EM is worse
  - The structures learned are often not linguistically plausible
  - PCFGs are often not good at local dependencies

Local dependencies
- The man sees the boy with the binoculars
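The E-step's "probability for each parse" can be made concrete with the example grammar from these notes: compute P(W, T) for each parse of the ambiguous sentence by multiplying rule probabilities, then normalize. The tuple tree encoding below is a hypothetical illustration, not notation from the course:

```python
from math import prod

# Rule probabilities from the example grammar in the notes.
rules = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.8,
    ("VP", ("V", "NP", "PP")): 0.2,
    ("NP", ("NP", "PP")): 0.1,
    ("NP", ("Det", "N")): 0.9,
    ("PP", ("P", "NP")): 1.0,
    ("Det", ("the",)): 0.5,
    ("N", ("man",)): 0.4, ("N", ("boy",)): 0.3, ("N", ("binoculars",)): 0.3,
    ("V", ("saw",)): 1.0,
    ("P", ("with",)): 1.0,
}

def p_tree(node):
    """P(W, T): product of the probabilities of all rules used in tree T."""
    label, *children = node
    if isinstance(children[0], str):             # lexical rule: POS -> word
        return rules[(label, (children[0],))]
    rhs = tuple(child[0] for child in children)
    return rules[(label, rhs)] * prod(p_tree(child) for child in children)

the_man = ("NP", ("Det", "the"), ("N", "man"))
the_boy = ("NP", ("Det", "the"), ("N", "boy"))
pp = ("PP", ("P", "with"), ("NP", ("Det", "the"), ("N", "binoculars")))

# Parse 1: PP attaches to the VP (the seeing is done with binoculars).
t1 = ("S", the_man, ("VP", ("V", "saw"), the_boy, pp))
# Parse 2: PP attaches to the object NP (the boy has the binoculars).
t2 = ("S", the_man, ("VP", ("V", "saw"), ("NP", the_boy, pp)))

p1, p2 = p_tree(t1), p_tree(t2)
p_w = p1 + p2          # P(W) = sum over trees of P(W, T)
post1 = p1 / p_w       # posterior of parse 1, the quantity the E-step needs
```

Under this grammar the VP attachment is favored (post1 = 5/7 ≈ 0.714), because VP -> V NP PP (0.2) beats the combination VP -> V NP (0.8) times NP -> NP PP (0.1).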
This note was uploaded on 04/13/2010 for the course CSE 730 taught by Professor Eric Fosler-Lussier during the Fall '08 term at Ohio State.