Language Modeling

The maximum-likelihood (MLE) bigram estimate and its add-one (Laplace) smoothed version:

P_MLE(w_i | w_{i-1}) = c(w_{i-1}, w_i) / c(w_{i-1})   (MLE estimate)

P_Add-1(w_i | w_{i-1}) = (c(w_{i-1}, w_i) + 1) / (c(w_{i-1}) + V)   (add-one estimate)
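As a concrete illustration (not from the original slides), here is a minimal Python sketch of both estimators, assuming bigram and unigram counts have already been collected into Counter objects; the function and variable names are mine:

```python
# Minimal sketch of the MLE and add-one bigram estimators above.
# Assumes counts are already collected; all names are illustrative.
from collections import Counter

def p_mle(w_prev, w, bigram_counts: Counter, unigram_counts: Counter) -> float:
    """MLE estimate: c(w_prev, w) / c(w_prev)."""
    if unigram_counts[w_prev] == 0:
        return 0.0
    return bigram_counts[(w_prev, w)] / unigram_counts[w_prev]

def p_add1(w_prev, w, bigram_counts: Counter, unigram_counts: Counter,
           vocab_size: int) -> float:
    """Add-one (Laplace) estimate: (c(w_prev, w) + 1) / (c(w_prev) + V)."""
    return (bigram_counts[(w_prev, w)] + 1) / (unigram_counts[w_prev] + vocab_size)
```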


… together Chinese food! food I want to eat Chinese food! </s>

Dan Jurafsky
Approximating Shakespeare

Dan Jurafsky
Shakespeare as corpus
•  N = 884,647 tokens, V = 29,066
•  Shakespeare produced 300,000 bigram types out of V^2 = 844 million possible bigrams.
•  So 99.96% of the possible bigrams were never seen (have zero entries in the table)
•  Quadrigrams are worse: what's coming out looks like Shakespeare because it is Shakespeare

Dan Jurafsky
The Wall Street Journal is not Shakespeare (no offense)

Dan Jurafsky
The perils of overfitting
•  N-grams only work well for word prediction if the test corpus looks like the training corpus
•  In real life, it often doesn't
•  We need to train robust models that generalize!
•  One kind of generalization: zeros!
   •  Things that don't ever occur in the training set
   •  But occur in the test set

Dan Jurafsky
Zeros
•  Training set:
   … denied the allegations
   … denied the reports
   … denied the claims
   … denied the request
•  Test set:
   … denied the offer
   … denied the loan
P("offer" | denied the) = 0

Dan Jurafsky
Zero probability bigrams
•  Bigrams with zero probability
•  mean that we will assign 0 probability to the test set!
•  And hence we cannot compute perplexity (can't divide by 0)!

Language Modeling
Generalization and zeros

Language Modeling
Smoothing: Add-one (Laplace) smoothing

Dan Jurafsky
The intuition of smoothing (from Dan Klein)
•  Steal probability mass to generalize better
[Figure: histograms of P(w | denied the) over words such as allegations, reports, claims, request, attack, man, outcome; after smoothing the mass becomes 2.5 allegations, 1.5 reports, 0.5 claims, 0.5 request, 2 other (7 total)]
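To make the zero-probability problem and the add-one fix concrete, here is a small Python sketch using the toy "denied the …" phrases from the slide; the helper names, the perplexity function, and the exact numbers are illustrative, not from the slides:

```python
# Illustrative sketch: MLE assigns zero probability to unseen test bigrams,
# which makes perplexity undefined; add-one smoothing avoids this.
from collections import Counter
import math

train = ["denied the allegations", "denied the reports",
         "denied the claims", "denied the request"]
test = ["denied the offer", "denied the loan"]

# Collect unigram and bigram counts from the training phrases.
bigrams, unigrams, vocab = Counter(), Counter(), set()
for sent in train:
    words = sent.split()
    vocab.update(words)
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))
V = len(vocab)

def p_mle(prev, w):
    return bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0

def p_add1(prev, w):
    return (bigrams[(prev, w)] + 1) / (unigrams[prev] + V)

# "offer" never follows "the" in training, so MLE gives 0,
# while add-one gives a small nonzero value: 1 / (c("the") + V) = 0.1 here.
print(p_mle("the", "offer"))    # 0.0
print(p_add1("the", "offer"))   # 0.1

def perplexity(sentence, p):
    """Bigram perplexity of one sentence under estimator p (no sentence markers)."""
    words = sentence.split()
    logprob = sum(math.log(p(a, b)) for a, b in zip(words, words[1:]))
    return math.exp(-logprob / (len(words) - 1))

for sent in test:
    print(sent, "perplexity (add-1):", round(perplexity(sent, p_add1), 2))
# perplexity(test[0], p_mle) would raise ValueError: math domain error,
# because log(0) is undefined -- the problem the slide is pointing at.
```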