Massachusetts Institute of Technology
Department of Electrical Engineering & Computer Science
6.345/HST.728 Automatic Speech Recognition, Spring 2010
4/13/10 Lecture Handouts: VQ-based HMMs; Discriminative Training

Addendum on VQ-based systems

So far we have modeled the output distribution of a frame y as a mixture distribution. Another approach is based on vector quantization (VQ), using codebooks:
- Replace y by a label drawn from some finite set. Codebooks are generated via clustering.
- The output distribution of a state is then a multinomial distribution over these VQ labels.
- Let y* be the VQ label that replaces the observation y. The probability of observing VQ label v in state i is
      P(y* = v | x = i) = f_i(v),   v = 1, ..., C,
  where C is the number of VQ labels, and we constrain Σ_v f_i(v) = 1.

Advantages and Disadvantages of VQ

Advantages:
- No parametric assumptions about the shape of the output distribution.
- Computationally fast: a likelihood is obtained by a single table lookup.

Disadvantages:
- We lose resolution in the data representation: a 50-dimensional space must be carved up into a few hundred regions.
- Matters can be improved with multiple codebooks, one for each subset of features. This improves resolution, but at the expense of introducing further independence assumptions.

Estimating output distributions for VQ systems

We run Baum-Welch as usual to obtain a probabilistic alignment.
We compute γ_t(i), the probability of frame t being assigned to state i. Let

    I_v(y*) = 1 if y* = v, and 0 if y* ≠ v,

so I_v is an indicator function for the event that y* = v. To estimate the probability of observing label v in state i, we compute

    f_i(v) = [ Σ_t γ_t(i) I_v(y_t*) ] / [ Σ_t γ_t(i) ].

An Introduction to Discriminative Training
Larry Gillick

Origin of the Idea

- There are many estimation methods.
- Maximum likelihood estimation is asymptotically optimal in most situations. This was originally conjectured by the great statistician Ronald Fisher in 1925, and there has been a huge body of theoretical work since then.
- So why should you use any other method? What if the model is wrong!

A simple example

Consider a simple classification problem: observe Y and decide which of two equally likely hypotheses is true:
- H0: Y is drawn from class C0
- H1: Y is drawn from class C1
Assume that if Y is from C0, then Y ~ N(0,1), and if Y is from C1, then Y ~ N(μ,1). Assume the two error types have equal costs. Standard Bayesian reasoning implies that the best decision rule is:
- Choose H1 if Y > μ/2
- Choose H0 otherwise
Since μ is unknown, it must be estimated; use maximum likelihood.
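The codebook step described above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the handout: the slides only say the codebook is "generated via clustering", so plain Lloyd's k-means is assumed here, and the function names are hypothetical.

```python
import numpy as np

def train_codebook(frames, C, iters=20, seed=0):
    """Build a VQ codebook of C codewords by Lloyd's k-means over training frames."""
    rng = np.random.default_rng(seed)
    centroids = frames[rng.choice(len(frames), C, replace=False)]
    for _ in range(iters):
        # assign each frame to its nearest codeword (Euclidean distance)
        d = np.linalg.norm(frames[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each codeword to the mean of its assigned frames
        for v in range(C):
            if np.any(labels == v):
                centroids[v] = frames[labels == v].mean(axis=0)
    return centroids

def quantize(frames, centroids):
    """Replace each frame y by its VQ label y* (index of the nearest codeword)."""
    d = np.linalg.norm(frames[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)
```

Once frames are quantized, the emission probability for state i is just `f[i, label]`, the single table lookup the slides credit for the method's speed.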
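The re-estimation formula for f_i(v) translates directly into code. A minimal sketch, assuming the state posteriors γ_t(i) have already been produced by the forward-backward (E) step; the function name and array layout are illustrative, not from the handout.

```python
import numpy as np

def reestimate_emissions(gamma, labels, C):
    """Baum-Welch M-step for a VQ-HMM's multinomial outputs:
        f_i(v) = sum_t gamma_t(i) * I[y*_t == v] / sum_t gamma_t(i)

    gamma  : (T, S) array of state posteriors gamma_t(i)
    labels : (T,)   array of VQ labels y*_t
    C      : codebook size (number of VQ labels)
    """
    T, S = gamma.shape
    f = np.zeros((S, C))
    for t in range(T):
        f[:, labels[t]] += gamma[t]       # soft count of label y*_t in every state
    f /= gamma.sum(axis=0)[:, None]       # normalize by expected state occupancy
    return f
```

By construction each row of `f` sums to 1, satisfying the constraint Σ_v f_i(v) = 1.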
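The two-Gaussian example above can be made concrete: under the model, the maximum likelihood estimate of μ is the sample mean of the C1 training data, and the Bayes rule then thresholds at μ̂/2. A short sketch with illustrative names:

```python
import numpy as np

def fit_and_classify(y1_train, y_test):
    """Two-class problem: C0 ~ N(0,1), C1 ~ N(mu,1), equal priors and costs.
    Estimate mu by maximum likelihood (sample mean of the C1 data), then
    apply the Bayes decision rule: choose H1 when y > mu_hat / 2."""
    mu_hat = y1_train.mean()                       # MLE of mu under the Gaussian model
    decisions = (y_test > mu_hat / 2).astype(int)  # 1 = choose H1, 0 = choose H0
    return decisions, mu_hat
```

If the Gaussian model is correct, this plug-in rule is asymptotically optimal; the slides' point is that when the model is wrong, estimating μ by maximum likelihood need not give the best classifier, which motivates discriminative training.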
This note was uploaded on 05/08/2010 for the course CS 6.345 taught by Professor Glass during the Spring '10 term at MIT.