Language Modeling

Add-1 (Laplace) smoothed bigram estimate:

P_{\text{Add-1}}(w_i \mid w_{i-1}) = \frac{c(w_{i-1} w_i) + 1}{c(w_{i-1}) + V}

Dan Jurafsky

•  Backoff: use trigram if you have good evidence, otherwise bigram, otherwise unigram
•  Interpolation:
   •  mix unigram, bigram, trigram
•  Interpolation works better

Linear Interpolation
•  Simple interpolation:
   \hat{P}(w_n \mid w_{n-2} w_{n-1}) = \lambda_1 P(w_n \mid w_{n-2} w_{n-1}) + \lambda_2 P(w_n \mid w_{n-1}) + \lambda_3 P(w_n), \quad \sum_i \lambda_i = 1
•  Lambdas conditional on context:
   \hat{P}(w_n \mid w_{n-2} w_{n-1}) = \lambda_1(w_{n-2}^{n-1}) P(w_n \mid w_{n-2} w_{n-1}) + \lambda_2(w_{n-2}^{n-1}) P(w_n \mid w_{n-1}) + \lambda_3(w_{n-2}^{n-1}) P(w_n)

How to set the lambdas?
•  Use a held-out corpus:
   Training Data | Held-Out Data | Test Data
•  Choose λs to maximize the probability of the held-out data:
   •  Fix the N-gram probabilities (on the training data)
   •  Then search for the λs that give the largest probability to the held-out set (a code sketch of this search follows the slides below):
      \log P(w_1 \ldots w_n \mid M(\lambda_1 \ldots \lambda_k)) = \sum_i \log P_{M(\lambda_1 \ldots \lambda_k)}(w_i \mid w_{i-1})

Unknown words: Open versus closed vocabulary tasks
•  If we know all the words in advance
   •  Vocabulary V is fixed
   •  Closed vocabulary task
•  Often we don't know this
   •  Out Of Vocabulary = OOV words
   •  Open vocabulary task
•  Instead: create an unknown word token <UNK> (a code sketch of this normalization follows the slides below)
   •  Training of <UNK> probabilities
      •  Create a fixed lexicon L of size V
      •  At the text normalization phase, change any training word not in L to <UNK>
      •  Now we train its probabilities like a normal word
   •  At decoding time
      •  If text input: use the <UNK> probabilities for any word not seen in training

Huge web-scale n-grams
•  How to deal with, e.g., the Google N-gram corpus
•  Pruning
   •  Only store N-grams with count > threshold
      •  Remove singletons of higher-order n-grams
   •  Entropy-based pruning
•  Efficiency (a code sketch of pruning and quantization follows the slides below)
   •  Efficient data structures like tries
   •  Bloom filters: approximate language models
   •  Store words as indexes, not strings
      •  Use Huffman coding to fit large numbers of words into two bytes
   •  Quantize probabilities (4-8 bits instead of an 8-byte float)
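The held-out recipe above is easy to prototype. The following is a minimal Python sketch, not taken from the slides: it fixes bigram and unigram estimates on the training counts, then grid-searches interpolation weights that maximize the log probability of a held-out set. The helper names (interp_prob, search_lambdas), the small uniform floor term, and the toy corpora are illustrative assumptions.

```python
import math
from collections import Counter
from itertools import product

def train_counts(tokens):
    """Collect unigram and bigram counts from a token list (training data only)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def interp_prob(w, prev, unigrams, bigrams, total, lambdas):
    """Interpolated P(w | prev) = l1*P_bigram + l2*P_unigram + l3*uniform."""
    l1, l2, l3 = lambdas
    p_bigram = bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0
    p_unigram = unigrams[w] / total
    p_uniform = 1.0 / len(unigrams)   # crude floor so no probability is exactly zero
    return l1 * p_bigram + l2 * p_unigram + l3 * p_uniform

def held_out_logprob(held_out, unigrams, bigrams, total, lambdas):
    """Log probability of the held-out data with the training counts held fixed."""
    return sum(
        math.log(interp_prob(w, prev, unigrams, bigrams, total, lambdas))
        for prev, w in zip(held_out, held_out[1:])
    )

def search_lambdas(held_out, unigrams, bigrams, total, step=0.1):
    """Grid-search lambda weights (summing to 1) that maximize held-out log probability."""
    best, best_lp = None, float("-inf")
    grid = [i * step for i in range(1, int(1 / step))]
    for l1, l2 in product(grid, grid):
        l3 = 1.0 - l1 - l2
        if l3 <= 0:
            continue
        lp = held_out_logprob(held_out, unigrams, bigrams, total, (l1, l2, l3))
        if lp > best_lp:
            best, best_lp = (l1, l2, l3), lp
    return best, best_lp

# Toy data, purely illustrative.
train = "the cat sat on the mat the cat ate".split()
held = "the cat sat on the mat".split()
unigrams, bigrams = train_counts(train)
best_lambdas, lp = search_lambdas(held, unigrams, bigrams, sum(unigrams.values()))
print(best_lambdas, lp)
```

A coarse grid search is enough to show the idea; in practice the weights are usually fit with EM or a finer search, and, as the slide notes, the lambdas can also be conditioned on the context.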
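The <UNK> procedure is essentially a vocabulary filter applied before counting. Below is a minimal sketch, assuming the fixed lexicon L is simply the most frequent training words; build_lexicon, normalize, the lexicon size, and the toy data are illustrative, not part of the slides.

```python
from collections import Counter

UNK = "<UNK>"

def build_lexicon(train_tokens, size):
    """Fixed lexicon L: here, the `size` most frequent training words."""
    return {w for w, _ in Counter(train_tokens).most_common(size)}

def normalize(tokens, lexicon):
    """Text-normalization step: map any word outside L to <UNK>."""
    return [w if w in lexicon else UNK for w in tokens]

# Training: words outside the fixed lexicon become <UNK>,
# and <UNK> is then trained like any other word.
train = "the cat sat on the mat the dog sat".split()
L = build_lexicon(train, size=4)
print(normalize(train, L))

# Decoding: any input word never seen in training is also mapped to <UNK>,
# so the model can score it with the <UNK> probabilities.
test = "the aardvark sat".split()
print(normalize(test, L))
```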
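For the web-scale tricks in the last slide, here is a rough sketch of count-threshold pruning, integer word indexes (plain indexes here rather than the Huffman codes the slide mentions), and coarse quantization onto a small fixed grid. The helper names and toy counts are illustrative assumptions, and real toolkits typically quantize log probabilities rather than raw probabilities.

```python
from collections import Counter

def prune(ngram_counts, threshold=1):
    """Count-threshold pruning: keep only n-grams with count > threshold
    (threshold=1 removes singleton higher-order n-grams)."""
    return {ng: c for ng, c in ngram_counts.items() if c > threshold}

def index_vocab(ngram_counts):
    """Store words as integer indexes instead of strings."""
    words = sorted({w for ng in ngram_counts for w in ng})
    return {w: i for i, w in enumerate(words)}

def quantize(prob, bits=8):
    """Map a probability onto a small integer code (4-8 bits instead of a float)."""
    return round(prob * ((1 << bits) - 1))

def dequantize(code, bits=8):
    """Recover an approximate probability from its integer code."""
    return code / ((1 << bits) - 1)

# Toy trigram counts standing in for a web-scale corpus (hypothetical data).
trigrams = Counter({("the", "cat", "sat"): 2, ("cat", "sat", "on"): 2, ("on", "the", "mat"): 1})
kept = prune(trigrams, threshold=1)        # drops the singleton trigram
word_id = index_vocab(kept)                # words stored once, n-grams refer to integer ids
total = sum(kept.values())
compact = {tuple(word_id[w] for w in ng): quantize(c / total) for ng, c in kept.items()}
print(kept)
print(word_id)
print(compact)                             # indexed n-grams with 8-bit probability codes
```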