HMM-handout1

Massachusetts Institute of Technology
Department of Electrical Engineering & Computer Science
6.345/HST.728 Automatic Speech Recognition, Spring 2010
4/6/10 Lecture Handouts

Hidden Markov Models (HMMs)
Reading: Rabiner, "A Tutorial on Hidden Markov Models," 1989.

Hidden Markov Models for Speech Recognition and Acoustic Model Training
Larry Gillick
April 6, 8, and 13, 2010

How to do Speech Recognition

- Let W = a sequence of words, and let Y = a sequence of frames (acoustics).
- Our goal is to compute P(W=w | Y=y) for all w.
- We then compute the recognized transcript as follows:

    w_max = argmax_w P(W=w | Y=y)

The Fundamental Equation of Speech Recognition

- Actually, it's just Bayes' Theorem:

    P(w|y) = P(y|w) P(w) / P(y)

- P(y|w) is supplied by the acoustic model.
- P(w) is supplied by the language model.
- P(y) is a (largely irrelevant) normalizing constant.
- We must obtain a formula for P(y|w). (A small code sketch of this decision rule appears at the end of this handout.)

Comments on our equation

- P(w) can be regarded as a prior on word strings.
- P(w|y) is the posterior on word strings, given that we've seen the acoustics.
- But why don't we model P(w|y) directly? It is much harder: P(w|y) is not local, so we would need to consider all possible word strings at once.
- P(y|w), by contrast, focuses on the frame sequence for the single string w.

Working in the log domain

- Probabilities can get very small when carrying out successive multiplies, so we rewrite our formula as follows:

    log P(w|y) = log P(y|w) + log P(w) - log P(y)

  or, ignoring the term that is independent of w,

    S_Total(w) = S_AM(w) + S_LM(w)
    (TotScore = AcModScore + LangModScore)

- In these lectures, however, we'll avoid the log domain for reasons of clarity. (A sketch of the underflow problem that motivates it appears at the end of this handout.)

The Acoustic Model

- We need a formula for P(y|w), the acoustic model probability.
- This will be provided via the use of a Hidden Markov Model.
- This model makes certain assumptions of dubious validity (independence assumptions); nevertheless, it has been very successful.

Stochastic process

- A stochastic process is a sequence of random variables or vectors, defined on the same sample space, either in discrete time,

    {X_0, X_1, ..., X_t, ...}

  or in continuous time,

    {X_t : t ≥ 0}

Markov Chains

- A Markov Chain is a discrete-state, discrete-time stochastic process satisfying the so-called Markov Assumption: the future depends on the past only through the present. (A small sampling sketch appears at the end of this handout.)
- We'll assume the X's take on a finite set of...

This is the end of the preview.
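A minimal Python sketch of the decision rule from "The Fundamental Equation of Speech Recognition" slide above. The candidate word strings and their scores are invented for illustration and are not from the handout; in practice log P(y|w) and log P(w) would come from real acoustic and language models. The point is only the argmax over candidate word strings, with P(y) dropped because it is constant in w.

```python
# Hypothetical log-domain scores for three candidate word strings:
# log P(y|w) from an acoustic model, log P(w) from a language model.
ACOUSTIC_LOGPROB = {"the cat sat": -12.0, "the cat sad": -11.5, "a cat sat": -13.0}
LM_LOGPROB = {"the cat sat": -3.0, "the cat sad": -6.0, "a cat sat": -4.0}

def recognize(candidates):
    # Bayes decision rule: w_max = argmax_w P(y|w) P(w).
    # log P(y) is the same for every w, so it drops out of the comparison.
    return max(candidates, key=lambda w: ACOUSTIC_LOGPROB[w] + LM_LOGPROB[w])

print(recognize(ACOUSTIC_LOGPROB))  # -> "the cat sat"
```

Note that the sketch already adds scores in the log domain; this is exactly the S_Total(w) = S_AM(w) + S_LM(w) rewriting from the slides.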
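The underflow that motivates the log domain is easy to demonstrate. In the sketch below, the per-frame likelihood of 1e-3 and the 200-frame utterance length are arbitrary assumptions; the point is that the raw product of per-frame probabilities underflows double precision to 0.0, while the sum of logs stays finite.

```python
import math

p_frame = 1e-3    # assumed per-frame likelihood (arbitrary value)
n_frames = 200    # about 2 seconds of 10 ms frames

prob = p_frame ** n_frames               # 1e-600 underflows to 0.0
log_prob = n_frames * math.log(p_frame)  # finite: about -1381.6

print(prob, log_prob)  # 0.0 -1381.55...
```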
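To make the Markov Assumption concrete, here is a small sketch of a finite-state Markov chain. The two states and the transition matrix are invented for illustration. The structural point is that the next state is drawn from a distribution that depends only on the current state, and that the probability of a whole path therefore factors into one-step transition probabilities.

```python
import math
import random

# Transition probabilities P(X_{t+1} = j | X_t = i); each row sums to 1.
TRANS = {
    "A": {"A": 0.9, "B": 0.1},
    "B": {"A": 0.4, "B": 0.6},
}

def sample_chain(start, steps, seed=0):
    """Sample X_0, ..., X_steps; each step looks only at the current state."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        row = TRANS[path[-1]]
        path.append(rng.choices(list(row), weights=list(row.values()))[0])
    return path

def path_logprob(path):
    """log P(X_1, ..., X_T | X_0): under the Markov Assumption the joint
    probability is just the product of one-step transition probabilities."""
    return sum(math.log(TRANS[a][b]) for a, b in zip(path, path[1:]))

path = sample_chain("A", 10)
print(path, path_logprob(path))
```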
