10lm2-handout

Massachusetts Institute of Technology
6.345/HST.728 Automatic Speech Recognition, Spring 2010
Lecture Handouts, 4/1/10

Beyond word n-gram language models
  Word-class n-grams
  Phrase-class n-grams
  Stochastic parsers
  Log-linear language models
  Latent semantic analysis
  Pruning n-grams

Homework: Assignment 3: Language Modeling

Beyond Word n-gram Language Models
  Word-class n-grams
  Phrase-class n-grams
  Stochastic parsers
  Log-linear language models
  Latent semantic analysis
  Pruning n-grams

Clustering words
  Many words have similar statistical behavior, e.g., days of the week, months, cities, etc.
  n-gram performance can be improved by clustering words:
    Hard clustering puts a word into a single cluster
    Soft clustering allows a word to belong to multiple clusters
  Clusters can be created manually or automatically:
    Manually created clusters have worked well for small domains
    Automatic clusters have been created bottom-up or top-down

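To make the hard-clustering idea concrete, here is a minimal Python sketch (not from the handout; the toy corpus and the class names DAY and CITY are invented) showing how a hard word-to-class map lets several words pool their bigram statistics:

    # Minimal sketch: a hard word-to-class mapping and the class bigram counts
    # it induces. Corpus, words, and class names are invented for illustration.
    from collections import Counter

    word_class = {
        "monday": "DAY", "tuesday": "DAY", "friday": "DAY",
        "boston": "CITY", "denver": "CITY", "dallas": "CITY",
        "flights": "flights", "to": "to",   # words may also sit in singleton classes
    }

    corpus = "flights to boston flights to denver flights to dallas".split()
    class_seq = [word_class[w] for w in corpus]

    # The counts a class bigram model needs: P(w_i | c_i) and P(c_i | c_{i-1}).
    emission_counts = Counter(zip(class_seq, corpus))             # (class, word)
    class_bigram_counts = Counter(zip(class_seq, class_seq[1:]))  # (prev class, class)

    # All three cities share the single ('to', 'CITY') statistic instead of
    # three separate word bigrams ('to','boston'), ('to','denver'), ('to','dallas').
    print(class_bigram_counts[("to", "CITY")])   # -> 3

Soft clustering would replace the single class per word with a distribution over classes, at the cost of summing over class assignments when scoring a sentence.
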
Word Class n-gram models
  Word class n-grams cluster words into equivalence classes:
    W = \{w_1, \ldots, w_n\} \rightarrow \{c_1, \ldots, c_n\}
  If clusters are non-overlapping, P(W) is approximated by
    P(W) \approx \prod_{i=1}^{n} P(w_i \mid c_i) \, P(c_i \mid c_1, \ldots, c_{i-1})
  Fewer parameters than word n-grams
  Relatively easy to add new words to existing clusters
  Can be linearly combined with word n-grams if desired

Bottom-Up Word Clustering
  Word clusters can be created automatically by forming clusters in a stepwise-optimal or greedy fashion
  Bottom-up clusters are created by considering the impact on a metric of merging words w_a and w_b to form a new cluster w_ab
  Example metrics for a bigram language model:
    Minimum decrease in average mutual information
      I = \sum_{i,j} P(w_i w_j) \log_2 \frac{P(w_j \mid w_i)}{P(w_j)}
    Minimum increase in training-set conditional entropy
      H = -\sum_{i,j} P(w_i w_j) \log_2 P(w_j \mid w_i)

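Below is a rough Python sketch of the greedy bottom-up procedure using the first metric above: it repeatedly merges the pair of clusters whose merge causes the smallest decrease in the average mutual information of the class bigram distribution. The corpus is a toy example and the implementation is deliberately naive (it rescores every candidate pair from scratch); practical systems such as Brown clustering rely on incremental updates to stay tractable.

    # Naive sketch of greedy bottom-up clustering driven by average mutual
    # information. Toy corpus and helper names are invented for illustration.
    from collections import Counter
    from itertools import combinations
    from math import log2

    corpus = ("monday flights tuesday flights boston fares denver fares "
              "monday fares tuesday flights boston flights denver fares").split()

    def avg_mutual_information(cluster_of, tokens):
        """I = sum_{c1,c2} P(c1 c2) log2( P(c1 c2) / (P(c1) P(c2)) ),
        computed over adjacent class pairs in the training text."""
        bigrams = Counter((cluster_of[a], cluster_of[b])
                          for a, b in zip(tokens, tokens[1:]))
        unigrams = Counter(cluster_of[w] for w in tokens)
        n_bi, n_uni = sum(bigrams.values()), sum(unigrams.values())
        total = 0.0
        for (c1, c2), n in bigrams.items():
            p12 = n / n_bi
            p1, p2 = unigrams[c1] / n_uni, unigrams[c2] / n_uni
            total += p12 * log2(p12 / (p1 * p2))
        return total

    # Start with one cluster per word, then greedily merge the pair of clusters
    # whose merge loses the least average mutual information.
    cluster_of = {w: w for w in set(corpus)}
    target_clusters = 3
    while len(set(cluster_of.values())) > target_clusters:
        best = None
        for ca, cb in combinations(sorted(set(cluster_of.values())), 2):
            trial = {w: (ca if c == cb else c) for w, c in cluster_of.items()}
            I = avg_mutual_information(trial, corpus)
            if best is None or I > best[0]:   # largest remaining I = smallest decrease
                best = (I, ca, cb)
        I, ca, cb = best
        cluster_of = {w: (ca if c == cb else c) for w, c in cluster_of.items()}
        print(f"merged {cb} into {ca}, I = {I:.3f}")
    print(cluster_of)

Using the entropy criterion instead amounts to the same search, since for a fixed corpus the average mutual information and the conditional entropy differ only by a constant.
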
Example of Word Clustering
  A A_M AFTERNOON AIRCRAFT AIRPLANE AMERICAN AN ANY ATLANTA AUGUST AVAILABLE
  BALTIMORE BE BOOK BOSTON CHEAPEST CITY CLASS COACH CONTINENTAL COST DALLAS
  DALLAS_FORT_WORTH DAY DELTA DENVER DOLLARS DOWNTOWN EARLIEST EASTERN ECONOMY
  EIGHT EIGHTY EVENING FARE FARES FIFTEEN FIFTY FIND FIRST_CLASS FIVE FLY FORTY
  FOUR FRIDAY GET GIVE GO GROUND HUNDRED INFORMATION IT JULY KIND KNOW LATEST
  LEAST LOWEST MAKE MAY MEAL MEALS MONDAY MORNING MOST NEED NINE NINETY NONSTOP
  NOVEMBER O'CLOCK OAKLAND OH ONE ONE_WAY P_M PHILADELPHIA PITTSBURGH