10lm1-handout

Massachusetts Institute of Technology
Department of Electrical Engineering & Computer Science
6.345/HST.728 Automatic Speech Recognition, Spring 2010
Lecture Handouts, 3/30/10

• Language Modeling
  – n-grams
  – Perplexity
  – Smoothing
• Reading:
  – Jurafsky et al., Speech and Language Processing, Ch. 4.5
  – Chen & Goodman, "An Empirical Study of Smoothing Techniques for Language Modeling," Computer Speech and Language 13(4), 1999.
• Homework: Assignment 3: Language Modeling

Language Modeling for Speech Recognition
• Introduction
• n-gram language models
• Perplexity
• Smoothing

Why Language Modeling?
• To disambiguate acoustically similar utterances using prior knowledge about word sequences
  – P("I swear to tell the truth") = 0.0023
  – P("I swerve to smell de soup") ≈ 0

Finite-State Networks
[Figure: word network over "show / display / give", "me", "all the", "flights / restaurants"]
• Allowable sequences are defined by a word network or graph
• Graph edges or rules can be augmented with probabilities
• Finite coverage can be problematic for speech recognition

Context-Free Grammars (CFGs)
[Figure: parse tree for "display the flights" with D, N, V, NP, VP nodes]
• Can be described by context-free rewrite rules, e.g., A ⇒ B C, A ⇒ a
• More powerful representation than FSNs
• Probabilistic CFGs can have associated probabilities
• CFGs have finite coverage; this can be disastrous for ASR

Word-Pair Grammars
[Figure: list of legal word pairs, e.g., show → me, me → all, the → flights, the → restaurants]
• Language space is defined by lists of legal word pairs
• Can be implemented efficiently within Viterbi search
• Still, finite coverage...
• A better solution: allow all possible sequences, with some probabilities

Language Modeling for Speech Recognition
• The "fundamental equation" of ASR: speech recognizers seek the most likely word sequence W* = w*_1, ..., w*_K given the acoustic observations A:

      W^* = \arg\max_W P(W \mid A) = \arg\max_W P(A \mid W)\, P(W)

• Speech recognition involves acoustic processing, acoustic modeling, language modeling, and search
• Statistical language models (LMs) assign a probability estimate P(W) to each word sequence W = {w_1, ..., w_K}, subject to

      \sum_W P(W) = 1

• Language models guide the search among alternative word hypotheses during recognition

History-Based Statistical LMs
• P(W) can be expanded using the chain rule:

      P(W) = \prod_{i=1}^{K} P(w_i \mid w_1, \ldots, w_{i-1}) = \prod_{i=1}^{K} P(w_i \mid h_i)

  where h_i = {w_1, ..., w_{i-1}} is the word history for w_i
  – Note: Initial and final words are typically taken to be the...
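A small worked example may help make the fundamental equation concrete. The sketch below is not from the handout: the N-best list, the acoustic log-likelihoods, and the language-model probabilities are all invented for illustration. It simply picks the hypothesis maximizing log P(A|W) + log P(W):

```python
# Illustrative only: applying W* = argmax_W P(A|W) P(W) to an N-best list.
# The hypotheses and all scores are made up; a real recognizer obtains
# log P(A|W) from its acoustic models during search.
import math

nbest = [
    # (hypothesis, log P(A|W) from acoustic model, log P(W) from language model)
    ("I swear to tell the truth", -1520.4, math.log(0.0023)),
    ("I swerve to smell de soup", -1518.9, math.log(1e-12)),
]

# Rescore in the log domain: the LM term outweighs the slightly better
# acoustic score of the implausible hypothesis.
best = max(nbest, key=lambda h: h[1] + h[2])
print(best[0])  # -> "I swear to tell the truth"
```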
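Similarly, here is a minimal sketch of the chain-rule decomposition above, assuming the history h_i is truncated to the single previous word (a bigram model, which the handout's n-gram section develops) and using maximum-likelihood estimates. The toy corpus and the sentence_logprob helper are illustrative assumptions, not part of the handout:

```python
# Minimal bigram LM sketch (illustrative): P(W) = prod_i P(w_i | w_{i-1}),
# with MLE estimates P(w_i | w_{i-1}) = c(w_{i-1}, w_i) / c(w_{i-1}).
import math
from collections import Counter

corpus = [
    "show me all the flights",
    "show me the restaurants",
    "display the flights",
]

# Pad each sentence with <s> and </s> so initial/final words get a history.
unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    words = ["<s>"] + sentence.split() + ["</s>"]
    unigrams.update(words[:-1])              # history (context) counts
    bigrams.update(zip(words[:-1], words[1:]))

def sentence_logprob(sentence):
    """log P(W) = sum_i log P(w_i | w_{i-1}) under the MLE bigram model.
    Returns -inf for any bigram never seen in training."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    logp = 0.0
    for prev, word in zip(words[:-1], words[1:]):
        if bigrams[(prev, word)] == 0:
            return float("-inf")
        logp += math.log(bigrams[(prev, word)] / unigrams[prev])
    return logp

print(sentence_logprob("show me the flights"))  # finite: all bigrams observed
print(sentence_logprob("show me the soup"))     # -inf: unseen bigram
```

The -inf score assigned to any sentence containing an unseen bigram is exactly the problem that the smoothing portion of the lecture addresses.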