
Dan Jurafsky

When we have sparse statistics:
•  3 allegations
•  2 reports
•  1 claims
•  1 request
•  7 total

Add-one estimation
•  Also called Laplace smoothing
•  Pretend we saw each word one more time than we did
•  Just add one to all the counts!
•  MLE estimate: $P_{MLE}(w_i \mid w_{i-1}) = \frac{c(w_{i-1}, w_i)}{c(w_{i-1})}$
•  Add-1 estimate: $P_{Add-1}(w_i \mid w_{i-1}) = \frac{c(w_{i-1}, w_i) + 1}{c(w_{i-1}) + V}$

Maximum Likelihood Estimates
•  The maximum likelihood estimate
   •  of some parameter of a model M from a training set T
   •  maximizes the likelihood of the training set T given the model M
•  Suppose the word "bagel" occurs 400 times in a corpus of a million words
•  What is the probability that a random word from some other text will be "bagel"?
•  MLE estimate is 400/1,000,000 = 0.0004
•  This may be a bad estimate for some other corpus
   •  But it is the estimate that makes it most likely that "bagel" will occur 400 times in a million-word corpus.

Berkeley Restaurant Corpus: Laplace-smoothed bigram counts

Laplace-smoothed bigrams

Reconstituted counts

Compare with raw bigram counts

Add-1 estimation is a blunt instrument
•  So add-1 isn't used for N-grams:
   •  We'll see better methods
•  But add-1 is used to smooth other NLP models
   •  For text classification
   •  In domains where the number of zeros isn't so huge.

Language Modeling
Smoothing: Add-one (Laplace) smoothing

Language Modeling
Interpolation, Backoff, and Web-Scale LMs

Backoff and Interpolation
•  Sometimes it helps to use less context
   •  Condition on less context for contexts you haven't learned much about
•  Backoff:
   •  use trigram if you have goo...
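
The MLE and add-1 bigram estimates from the "Add-one estimation" slide can be compared on a tiny example. The sketch below is only an illustration: the three-sentence corpus, the function names, and the choice to count the `<s>` and `</s>` markers as part of the vocabulary V are all assumptions made here, not taken from the slides. The reconstituted count is computed as the add-1 probability multiplied by the context count, which is the usual way to turn a smoothed probability back into a count.

```python
from collections import Counter

def bigram_counts(sentences):
    """Return (context counts, bigram counts) for tokenized sentences.

    context[prev] is the number of bigrams that start with prev, so
    the MLE probabilities for a given prev sum to 1.
    """
    context = Counter()
    bigrams = Counter()
    for tokens in sentences:
        for prev, word in zip(tokens, tokens[1:]):
            context[prev] += 1
            bigrams[(prev, word)] += 1
    return context, bigrams

def p_mle(prev, word, context, bigrams):
    """MLE estimate: c(prev, word) / c(prev); 0.0 for an unseen context."""
    if context[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / context[prev]

def p_add1(prev, word, context, bigrams, vocab_size):
    """Add-1 estimate: (c(prev, word) + 1) / (c(prev) + V)."""
    return (bigrams[(prev, word)] + 1) / (context[prev] + vocab_size)

def reconstituted_count(prev, word, context, bigrams, vocab_size):
    """Smoothed probability converted back into a count for this context."""
    return p_add1(prev, word, context, bigrams, vocab_size) * context[prev]

# Hypothetical toy corpus (not the Berkeley Restaurant Corpus).
sentences = [
    ["<s>", "i", "want", "chinese", "food", "</s>"],
    ["<s>", "i", "want", "to", "eat", "</s>"],
    ["<s>", "i", "eat", "chinese", "food", "</s>"],
]
context, bigrams = bigram_counts(sentences)

# Assumption: V counts every token type, including the sentence markers.
vocab = {tok for sent in sentences for tok in sent}
V = len(vocab)

for word in ("want", "food"):
    print(
        word,
        p_mle("i", word, context, bigrams),
        p_add1("i", word, context, bigrams, V),
        reconstituted_count("i", word, context, bigrams, V),
    )
```

In this toy corpus the unseen bigram "i food" moves from probability zero to a small positive value, while the seen bigram "i want" loses a large share of its mass. That shift is the "blunt instrument" effect the slides describe.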
