# Language Modeling



## Perplexity

*Dan Jurafsky*

- The best language model is one that best predicts an unseen test set: it gives the highest P(sentence).
- Perplexity is the inverse probability of the test set, normalized by the number of words:

  $$PP(W) = P(w_1 w_2 \ldots w_N)^{-\frac{1}{N}} = \sqrt[N]{\frac{1}{P(w_1 w_2 \ldots w_N)}}$$

  Expanding with the chain rule:

  $$PP(W) = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i \mid w_1 \cdots w_{i-1})}}$$

  For bigrams:

  $$PP(W) = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i \mid w_{i-1})}}$$

- Minimizing perplexity is the same as maximizing probability.

## The Shannon Game intuition for perplexity

(From Josh Goodman)

- How hard is the task of recognizing the digits '0,1,2,3,4,5,6,7,8,9'? Perplexity 10.
- How hard is recognizing (30,000) names at Microsoft? Perplexity = 30,000.
- If a system has to recognize
  - Operator (1 in 4)
  - Sales (1 in 4)
  - Technical Support (1 in 4)
  - 30,000 names (1 in 120,000 each)

  then perplexity is 54.
- Perplexity is the weighted equivalent branching factor.

## Perplexity as branching factor

- Suppose a sentence consists of random digits.
- What is the perplexity of this sentence according to a model that assigns P = 1/10 to each digit?

  $$PP(W) = P(w_1 w_2 \ldots w_N)^{-\frac{1}{N}} = \left(\left(\frac{1}{10}\right)^{N}\right)^{-\frac{1}{N}} = \left(\frac{1}{10}\right)^{-1} = 10$$

## Lower perplexity = better model

Training on 38 million words, testing on 1.5 million words (WSJ):

| N-gram order | Perplexity |
|---|---|
| Unigram | 962 |
| Bigram | 170 |
| Trigram | 109 |

*End of unit: Language Modeling, Evaluation and Perplexity.*

## Language Modeling: Generalization and Zeros

### The Shannon Visualization Method

- Choose a random bigram (`<s>`, w) according to its probability.
- Now choose a random bigram (w, x) according to its probability.
- And so on until we choose `</s>`.
- Then string the words together.

Example:

```
<s> I
    I want
      want to
         to eat
            eat Chinese
```

(The slide includes a photo of Claude Shannon.)
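The sampling procedure above can be sketched in a few lines of Python. The bigram probabilities below are a hypothetical toy model chosen to echo the slide's example, not counts from any real corpus:

```python
import random

# Toy bigram model (hypothetical probabilities, not from the notes):
# bigrams[w] maps each candidate next word to P(next | w).
bigrams = {
    "<s>":     {"I": 1.0},
    "I":       {"want": 0.7, "eat": 0.3},
    "want":    {"to": 1.0},
    "to":      {"eat": 1.0},
    "eat":     {"Chinese": 0.6, "</s>": 0.4},
    "Chinese": {"food": 1.0},
    "food":    {"</s>": 1.0},
}

def shannon_sentence(model, seed=None):
    """Sample a sentence by repeatedly drawing the next word from
    P(next | current), starting at <s> and stopping at </s>."""
    rng = random.Random(seed)
    word, out = "<s>", []
    while True:
        candidates = list(model[word])
        nxt = rng.choices(candidates, weights=list(model[word].values()))[0]
        if nxt == "</s>":
            return " ".join(out)
        out.append(nxt)
        word = nxt

print(shannon_sentence(bigrams, seed=0))
```

Each draw conditions only on the current word, which is exactly the bigram independence assumption; longer histories would require keying the table on (n-1)-word contexts instead.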
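The perplexity formula earlier in these notes, and the random-digits branching-factor example, can also be checked numerically. A minimal sketch (the helper name `perplexity` is my own; it computes the product in log space for numerical stability):

```python
import math

def perplexity(word_probs):
    """PP(W) = (prod_i 1/p_i)^(1/N), where p_i is the model's
    probability for the i-th test word; computed via logs."""
    n = len(word_probs)
    return math.exp(-sum(math.log(p) for p in word_probs) / n)

# Branching-factor example from the notes: a model that assigns
# P = 1/10 to every digit gives perplexity 10 for any digit string.
digit_probs = [0.1] * 12
print(perplexity(digit_probs))  # 10.0 (up to floating-point error)
```

Note the length of the string does not matter: the 1/N exponent normalizes it away, which is why perplexity is a per-word branching factor.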

This document was uploaded on 02/14/2014.
