# Language Modeling

Dan Jurafsky

## Backoff and Interpolation

- **Backoff:** use trigram if you have good evidence, otherwise bigram, otherwise unigram
- **Interpolation:** mix unigram, bigram, and trigram
- Interpolation works better in practice

## Linear Interpolation

- Simple interpolation: weight each n-gram estimate by a lambda
- Lambdas can be conditioned on context

## How to set the lambdas?

- Use a held-out corpus: split the data into Training Data | Held-Out Data | Test Data
- Choose the λs to maximize the probability of the held-out data:
  - Fix the N-gram probabilities (on the training data)
  - Then search for the λs that give the largest probability to the held-out set:

$$\log P(w_1 \dots w_n \mid M(\lambda_1 \dots \lambda_k)) = \sum_i \log P_{M(\lambda_1 \dots \lambda_k)}(w_i \mid w_{i-1})$$

## Unknown words: open versus closed vocabulary tasks

- If we know all the words in advance:
  - Vocabulary V is fixed
  - Closed vocabulary task
- Often we don't know this:
  - Out Of Vocabulary = OOV words
  - Open vocabulary task
- Instead: create an unknown word token `<UNK>`
  - Training of `<UNK>` probabilities:
    - Create a fixed lexicon L of size V
    - At the text normalization phase, change any training word not in L to `<UNK>`
    - Then train its probabilities like a normal word
  - At decoding time:
    - For text input: use `<UNK>` probabilities for any word not in the training data

## Huge web-scale n-grams

- How to deal with, e.g., the Google N-gram corpus
- Pruning:
  - Only store N-grams with count > threshold
  - Remove singletons of higher-order n-grams
  - Entropy-based pruning
- Efficiency:
  - Efficient data structures like tries
  - Bloom filters: approximate language models
  - Store words as indexes, not strings
  - Use Huffman coding to fit large numbers of words into two bytes
  - Quantize probabilities (4-8 bits instead...
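The "simple interpolation" formula referred to above is usually written (as in Jurafsky & Martin's *Speech and Language Processing*) as a λ-weighted mixture of the trigram, bigram, and unigram estimates:

```latex
\hat{P}(w_n \mid w_{n-2}\, w_{n-1})
  = \lambda_1\, P(w_n \mid w_{n-2}\, w_{n-1})
  + \lambda_2\, P(w_n \mid w_{n-1})
  + \lambda_3\, P(w_n),
\qquad \lambda_1 + \lambda_2 + \lambda_3 = 1
```

Because the λs sum to 1 and each component is a proper distribution, the mixture is itself a proper probability distribution.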
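The interpolation scheme can be sketched in code. This is a minimal sketch, not the slides' implementation: the count structures (`Counter`s keyed by word tuples) and the default λ values are illustrative assumptions.

```python
from collections import Counter

def interp_prob(w, ctx, unigram_counts, bigram_counts, trigram_counts,
                lambdas=(0.5, 0.3, 0.2)):
    """Linearly interpolate trigram, bigram, and unigram MLE estimates.

    ctx is the (w_{i-2}, w_{i-1}) context tuple; lambdas = (l_tri, l_bi,
    l_uni) are assumed to sum to 1 (illustrative values, not tuned).
    """
    l_tri, l_bi, l_uni = lambdas
    total = sum(unigram_counts.values())

    p_uni = unigram_counts[w] / total if total else 0.0
    prev = ctx[1]
    p_bi = (bigram_counts[(prev, w)] / unigram_counts[prev]
            if unigram_counts[prev] else 0.0)
    p_tri = (trigram_counts[ctx + (w,)] / bigram_counts[ctx]
             if bigram_counts[ctx] else 0.0)

    return l_tri * p_tri + l_bi * p_bi + l_uni * p_uni
```

Since each component distribution sums to 1 over the vocabulary, the interpolated probabilities do too.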
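The "search for λs that give the largest probability to the held-out set" step can be sketched as a simple grid search over a bigram/unigram mixture with a single λ. This is a sketch under assumptions: the function names are mine, the probability floor that avoids log(0) is an illustrative hack, and in practice EM or a proper optimizer is used instead of a grid.

```python
import math
from collections import Counter  # used to build counts from training text

def heldout_loglik(lam, heldout, unigram_counts, bigram_counts):
    """log P of held-out data under P = lam*P_bigram + (1-lam)*P_unigram."""
    total = sum(unigram_counts.values())
    ll = 0.0
    for prev, w in zip(heldout, heldout[1:]):
        p_uni = unigram_counts[w] / total
        p_bi = (bigram_counts[(prev, w)] / unigram_counts[prev]
                if unigram_counts[prev] else 0.0)
        p = lam * p_bi + (1 - lam) * p_uni
        ll += math.log(max(p, 1e-12))  # floor avoids log(0) for unseen events
    return ll

def best_lambda(heldout, unigram_counts, bigram_counts, grid=21):
    """Grid-search lam in [0, 1], keeping the N-gram probabilities fixed
    and maximizing the held-out log-likelihood."""
    candidates = [i / (grid - 1) for i in range(grid)]
    return max(candidates,
               key=lambda lam: heldout_loglik(lam, heldout,
                                              unigram_counts, bigram_counts))
```

Note that λ = 1 (pure bigram) is heavily penalized whenever the held-out data contains a bigram unseen in training, which is exactly why held-out tuning pushes some mass onto the lower-order model.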
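The `<UNK>` recipe (fixed lexicon at normalization time, open-vocabulary mapping at decoding time) can be sketched as follows. Choosing L as the most frequent words is my assumption; the slides only say "a fixed lexicon L of size V".

```python
from collections import Counter

def normalize_with_unk(tokens, lexicon_size):
    """Replace every training token outside a fixed lexicon L with <UNK>.

    Here L is taken to be the lexicon_size most frequent training words
    (an illustrative choice); <UNK> is then trained like any other word.
    """
    counts = Counter(tokens)
    lexicon = {w for w, _ in counts.most_common(lexicon_size)}
    return [w if w in lexicon else "<UNK>" for w in tokens]

def decode_token(w, training_vocab):
    """At decoding time, map any word not seen in training to <UNK>."""
    return w if w in training_vocab else "<UNK>"
```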
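The first two pruning strategies for web-scale n-grams can be sketched directly; the threshold and order cutoff below are illustrative, and entropy-based pruning (which weighs each n-gram's contribution to model quality) is not shown.

```python
def prune_by_threshold(ngram_counts, threshold):
    """Only store n-grams whose count exceeds the threshold."""
    return {ng: c for ng, c in ngram_counts.items() if c > threshold}

def remove_higher_order_singletons(ngram_counts, min_order=3):
    """Drop count-1 n-grams of order >= min_order (e.g. higher-order
    n-grams, which are numerous and individually uninformative)."""
    return {ng: c for ng, c in ngram_counts.items()
            if not (c == 1 and len(ng) >= min_order)}
```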