HMM-handout3

Massachusetts Institute of Technology
Department of Electrical Engineering & Computer Science
6.345/HST.728 Automatic Speech Recognition
Spring, 2010

4/13/10 Lecture Handouts
VQ-based HMMs
Discriminative Training
HMMs 79: Addendum on VQ-based systems

- So far, we have modeled the output distribution of a frame y as a mixture distribution
- Another approach is based on vector quantization (codebooks)
  * Replace y by a label, drawn from some finite set
  * Codebooks are generated via clustering
- Now the output distribution of a state is a multinomial distribution over these VQ labels
  * Let y* be the VQ label that replaces the observation y
  * Then the probability of observing VQ label v in state i is P(y* = v | x = i) = f_i(v), where v = 1, ..., C
  * C is the number of VQ labels, and we constrain Σ_{v=1}^{C} f_i(v) = 1

HMMs 80: Advantages and Disadvantages of VQ

- Advantages
  * No parametric assumptions about the shape of the output distribution
  * Computationally fast: obtain a likelihood by one table look-up
- Disadvantages
  * Lose resolution in the data representation: must carve up a 50-dimensional space into a few hundred regions
  * But we can improve matters: multiple codebooks, one for each subset of features
  * This improves resolution, but at the expense of introducing further independence assumptions
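As a concrete illustration of the table look-up and of the multiple-codebook variant, here is a minimal sketch in Python. It assumes the codebooks have already been trained by clustering (e.g. k-means) on the training frames; the names `quantize`, `output_prob`, and the array layouts are illustrative, not from the handout.

```python
import numpy as np

# Minimal sketch, assuming codebooks trained elsewhere (e.g. by k-means).
# f is a (num_states, C) array with f[i, v] = P(y* = v | x = i); each row sums to 1.

def quantize(y, codebook):
    """Replace observation vector y by the label of its nearest codebook centroid."""
    distances = np.linalg.norm(codebook - y, axis=1)
    return int(np.argmin(distances))

def output_prob(y, state_i, codebook, f):
    """P(y* = v | x = i): a single table look-up once y has been quantized."""
    v = quantize(y, codebook)
    return f[state_i, v]

def output_prob_multi(y_subsets, state_i, codebooks, fs):
    """Multiple codebooks: quantize each feature subset with its own codebook and
    multiply the per-codebook probabilities (an extra independence assumption)."""
    p = 1.0
    for y_sub, cb, f_sub in zip(y_subsets, codebooks, fs):
        p *= f_sub[state_i, quantize(y_sub, cb)]
    return p
```

Note that the nearest-centroid search is the only per-frame cost; once a frame has been assigned a label, every state's likelihood really is a single array access.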
HMMs 81: Estimating output distributions for VQ systems

- We run Baum-Welch as usual to obtain a probabilistic alignment
- We compute γ_t(i), the probability of frame t being assigned to state i
- Let I_v(y*) = 1 if y* = v and 0 if y* ≠ v, so I_v is an indicator function for the event that y* = v
- Now, to estimate the probability of observing label v in state i, we compute
  f_i(v) = Σ_t γ_t(i) I_v(y*_t) / Σ_t γ_t(i)
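The re-estimation formula above translates directly into a few lines of code. The sketch below covers only this one update, assuming the frame-to-state posteriors γ_t(i) and the VQ labels y*_t have already been produced by the forward-backward pass; the array names are illustrative.

```python
import numpy as np

def reestimate_output_dists(gamma, vq_labels, num_labels):
    """f_i(v) = sum_t gamma_t(i) I_v(y*_t) / sum_t gamma_t(i).

    gamma:     (T, num_states) array of frame-to-state posteriors from Baum-Welch
    vq_labels: length-T integer array holding the VQ label y*_t of each frame
    """
    T, num_states = gamma.shape
    f = np.zeros((num_states, num_labels))
    for t in range(T):
        f[:, vq_labels[t]] += gamma[t]      # numerator: posterior mass on label v per state
    f /= gamma.sum(axis=0)[:, None]         # denominator: total posterior mass per state
    return f
```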
HMMs 82: An Introduction to Discriminative Training

Larry Gillick
HMMs 83: Origin of the Idea

- There are many estimation methods
- Maximum likelihood estimation is asymptotically optimal in most situations
  * Originally conjectured by the great statistician Ronald Fisher in 1925
  * There's been a huge body of theoretical work since then
- So why should you use any other method?
  * What if the model is wrong!

HMMs 84: A simple example

- Consider a simple classification problem: observe Y and decide which of two equally likely hypotheses is true
  * H0: Y is drawn from class C0
  * H1: Y is drawn from class C1
- Assume that if Y is from C0, then Y ~ N(0, 1), and if Y is from C1, then Y ~ N(μ, 1)
- Assume the two error types have equal costs
- Standard Bayesian reasoning implies the best decision rule is:
  * Choose H1 if Y > μ/2
  * Choose H0 otherwise
- Since μ is unknown, we must estimate it; use maximum likelihood
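A small simulation makes the example concrete. The sketch below is not from the handout: the sample sizes and the true μ are made up, and it simply estimates μ by the sample mean of the C1 training data (its maximum-likelihood estimate under the assumed Gaussian model) and then applies the Y > μ/2 rule.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mu = 2.0

# Labeled training data from C1; the model says C0 ~ N(0, 1) and C1 ~ N(mu, 1).
y1_train = rng.normal(true_mu, 1.0, size=1000)

# Maximum-likelihood estimate of the unknown mean is the sample mean.
mu_hat = y1_train.mean()

def classify(y, mu_hat):
    """Bayes rule for equal priors and equal costs: choose H1 iff y > mu_hat / 2."""
    return 1 if y > mu_hat / 2 else 0

# Error rate on fresh test data drawn from the two classes.
test0 = rng.normal(0.0, 1.0, size=100_000)
test1 = rng.normal(true_mu, 1.0, size=100_000)
error_rate = 0.5 * (np.mean(test0 > mu_hat / 2) + np.mean(test1 <= mu_hat / 2))
print(f"estimated mu = {mu_hat:.3f}, average error rate = {error_rate:.4f}")
```

When the Gaussian assumptions hold, this classifier approaches the Bayes error rate as the training set grows, which is the asymptotic-optimality point made above.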
HMMs 85: Not so simple?