Massachusetts Institute of Technology
Department of Electrical Engineering & Computer Science
6.345/HST.728 Automatic Speech Recognition, Spring 2010
4/8/10 Lecture Handouts: HMM Training
Homework: Hidden Markov Models

Training an HMM-Based Speech Recognition System (Larry Gillick)

Training the Acoustic Model via Maximum Likelihood Estimation (MLE)
- Observe a series of utterances with transcriptions W and frame sequences Y:
  (W_1, Y_1), ..., (W_M, Y_M)
- Let θ represent the (unknown) parameters of the acoustic model
- Let P_θ(y | w) be the distribution of the frame sequence given the word sequence
- We estimate θ as follows:
  θ_MLE = argmax_θ ∏_{i=1}^{M} P_θ(y_i | w_i)

What data do we need?
- Dictionary of words
  - Pronunciation: string of phonemes
- Set of recorded utterances
  - Transcription: string of words
- Initial set of acoustic models
  - Not required, but useful

Some assumptions
- We have a set of output distributions f_i(y) and transition probabilities T(i, j) (duration distributions)
- We'll assume that the states have already been clustered somehow
- We'll also assume the output distributions are single Gaussians or mixtures of Gaussians

What must be estimated?
- For each output distribution: a set of mean vectors, a set of (usually diagonal) covariance matrices, and mixture probabilities
- The corresponding transition probabilities must also be estimated
- We'll focus on the output distributions

How can we estimate these quantities?
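As an aside, the per-state parameters described above (mean vectors and diagonal covariances) have closed-form ML estimates once frames are assigned to states. A minimal sketch, not from the handout; the function name and plain-list representation of feature vectors are illustrative assumptions:

```python
# Hypothetical sketch: maximum-likelihood estimates for one state's
# single diagonal-Gaussian output distribution, given the frames
# assigned to that state.
def estimate_gaussian(frames):
    """frames: list of feature vectors (lists of floats) assigned to one state.
    Returns the ML mean vector and diagonal variance vector."""
    n = len(frames)
    dim = len(frames[0])
    # ML mean: per-dimension average of the assigned frames
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # ML variance divides by n (not n - 1); diagonal covariance only,
    # so each dimension is estimated independently
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    return mean, var
```

With a mixture of Gaussians the assignments within a state are themselves soft, which is why the EM algorithm is needed there.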
- Align the frames in each utterance to the corresponding output distributions
- Collect all of the frames (from all the training utterances) assigned to each output distribution
- Single Gaussian: compute the means and variances of the assigned frames for each distribution
- Mixture of Gaussians: estimate the mixture distribution via the EM algorithm, which we'll return to later

Two alignment methods
- Viterbi algorithm
  - Deterministic alignment of training utterances
  - Each frame is assigned to a unique output distribution, based on the maximum-likelihood state sequence
- Baum-Welch algorithm (forward-backward algorithm)
  - Probabilistic alignment of training utterances
  - Each frame is distributed across (possibly multiple) output distributions

Viterbi Algorithm
- We wish to determine the most likely (ML) state sequence corresponding to the observed frames y:
  x_ML = argmax_x P(x, y)
- We can perform this computation via dynamic programming...
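The dynamic-programming recursion can be sketched as follows. This is a generic Viterbi decoder, not code from the course; the table representation (nested lists of log-probabilities) and function signature are illustrative assumptions:

```python
# Hypothetical sketch of the Viterbi algorithm:
# find x_ML = argmax_x P(x, y) by dynamic programming in the log domain.
import math

def viterbi(log_trans, log_out, log_init):
    """log_trans[i][j]: log T(i, j); log_out[t][i]: log f_i(y_t);
    log_init[i]: log probability of starting in state i.
    Returns the maximum-likelihood state sequence as a list of state indices."""
    n_states = len(log_init)
    n_frames = len(log_out)
    # delta[t][i] = best log P(x_1..x_t, y_1..y_t) over paths ending in state i
    delta = [[-math.inf] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    for i in range(n_states):
        delta[0][i] = log_init[i] + log_out[0][i]
    for t in range(1, n_frames):
        for j in range(n_states):
            # best predecessor state for state j at time t
            best = max(range(n_states),
                       key=lambda i: delta[t - 1][i] + log_trans[i][j])
            delta[t][j] = delta[t - 1][best] + log_trans[best][j] + log_out[t][j]
            back[t][j] = best
    # backtrace from the best final state to recover the ML state sequence
    last = max(range(n_states), key=lambda i: delta[n_frames - 1][i])
    path = [last]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))
```

In Viterbi training, each frame y_t is then assigned to the single output distribution of the state the recovered path visits at time t.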