CS 124/LINGUIST 180: From Language to Information
Dan Jurafsky
Lecture 3: Intro to Probability, Language Modeling
IP notice: some slides for today from Jim Martin, Sandiway Fong, Dan Klein

Outline
- Probability
  - Basic probability
  - Conditional probability
- Language Modeling (N-grams)
  - N-gram Intro
  - The Chain Rule
  - The Shannon Visualization Method
  - Evaluation: Perplexity
  - Smoothing: Laplace (Add-1), Add-prior

Language Modeling
We want to compute P(w1, w2, w3, w4, w5, ..., wn) = P(W), the probability of a sequence.
Alternatively, we want to compute P(w5 | w1, w2, w3, w4), the probability of a word given some previous words.
The model that computes P(W) or P(wn | w1, w2, ..., wn-1) is called the language model.
A better term for this would be "the grammar", but "language model" or LM is standard.

Computing P(W)
How to compute this joint probability:
P("the", "other", "day", "I", "was", "walking", "along", "and", "saw", "a", "lizard")
Intuition: let's rely on the Chain Rule of Probability.

The Chain Rule
Recall the definition of conditional probability:
P(A | B) = P(A ^ B) / P(B)
Rewriting:
P(A ^ B) = P(A | B) P(B)
More generally:
P(A, B, C, D) = P(A) P(B|A) P(C|A,B) P(D|A,B,C)
In general:
P(x1, x2, x3, ..., xn) = P(x1) P(x2|x1) P(x3|x1,x2) ... P(xn|x1, ..., xn-1)

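As a concrete illustration (not from the slides; the numbers are invented toy probabilities), a minimal Python sketch that multiplies out a chain-rule factorization:

    # Chain-rule factorization: P(A, B, C) = P(A) * P(B|A) * P(C|A,B).
    # All three values below are made-up toy probabilities.
    p_a = 0.5           # P(A)
    p_b_given_a = 0.4   # P(B|A)
    p_c_given_ab = 0.1  # P(C|A,B)

    joint = p_a * p_b_given_a * p_c_given_ab  # P(A, B, C)
    print(joint)  # 0.02
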
The Chain Rule applied to the joint probability of words in a sentence
P("the big red dog was") = P(the) * P(big | the) * P(red | the big) * P(dog | the big red) * P(was | the big red dog)

How to estimate?
P(the | its water is so transparent that)
Very easy estimate:
P(the | its water is so transparent that) = C(its water is so transparent that the) / C(its water is so transparent that)

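A minimal sketch of this relative-frequency estimate in Python (the tiny corpus and the helper name count_ngram are invented for illustration):

    # Estimate P(word | prefix) by counting occurrences in raw text.
    corpus = "its water is so transparent that the fish are visible".split()

    def count_ngram(tokens, ngram):
        """Count occurrences of the token sequence `ngram` in `tokens`."""
        n = len(ngram)
        return sum(1 for i in range(len(tokens) - n + 1)
                   if tokens[i:i + n] == ngram)

    prefix = "its water is so transparent that".split()
    # P(the | prefix) = C(prefix + "the") / C(prefix)
    p = count_ngram(corpus, prefix + ["the"]) / count_ngram(corpus, prefix)
    print(p)  # 1.0 on this toy corpus; real long prefixes rarely recur at all
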
Unfortunately
There are a lot of possible sentences.
We'll never be able to get enough data to compute the statistics for those long prefixes:
P(lizard | the, other, day, I, was, walking, along, and, saw, a)
or
P(the | its water is so transparent that)

Markov Assumption
Make the simplifying assumption:
P(lizard | the, other, day, I, was, walking, along, and, saw, a) = P(lizard | a)
Or maybe:
P(lizard | the, other, day, I, was, walking, along, and, saw, a) = P(lizard | saw, a)

Markov Assumption
So for each component in the product, replace it with the approximation (assuming a prefix of N):
P(wn | w1, ..., wn-1) ≈ P(wn | wn-N+1, ..., wn-1)
Bigram version:
P(wn | w1, ..., wn-1) ≈ P(wn | wn-1)

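The truncation this licenses is easy to state in code; a sketch (the function name markov_context is ours, not the lecture's):

    def markov_context(history, N):
        """Keep only the last N-1 words of the history, per the Markov assumption."""
        return history[-(N - 1):] if N > 1 else []

    history = "the other day I was walking along and saw a".split()
    print(markov_context(history, 2))  # ['a']         (bigram context)
    print(markov_context(history, 3))  # ['saw', 'a']  (trigram context)
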
Estimating bigram probabilities
The Maximum Likelihood Estimate:
P(wi | wi-1) = count(wi-1, wi) / count(wi-1) = c(wi-1, wi) / c(wi-1)

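Putting the pieces together, a minimal sketch of bigram MLE plus chain-rule scoring (the training text, the test sentence, and all names are invented; sentence-boundary markers are ignored here for simplicity):

    from collections import Counter

    # Toy training text; a real LM would be trained on millions of words.
    corpus = "the big red dog was in the big red barn".split()

    # c(w_{i-1}): count each token as a bigram's first word, so skip the last token.
    unigram_counts = Counter(corpus[:-1])
    # c(w_{i-1}, w_i): count adjacent pairs.
    bigram_counts = Counter(zip(corpus, corpus[1:]))

    def bigram_prob(prev, word):
        """MLE estimate: P(word | prev) = c(prev, word) / c(prev)."""
        if unigram_counts[prev] == 0:
            return 0.0
        return bigram_counts[(prev, word)] / unigram_counts[prev]

    # Score a sentence as the product of its bigram probabilities.
    sentence = "the big red dog".split()
    p = 1.0
    for prev, word in zip(sentence, sentence[1:]):
        p *= bigram_prob(prev, word)
    print(p)  # 0.5: P(big|the)=1.0, P(red|big)=1.0, P(dog|red)=0.5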