HMM-handout2


Massachusetts Institute of Technology
Department of Electrical Engineering & Computer Science
6.345/HST.728 Automatic Speech Recognition, Spring 2010
Lecture Handouts, 4/8/10: HMM Training
Homework: Hidden Markov Models (4/2/10)

HMMs 41: Training an HMM-based Speech Recognition System
Larry Gillick

HMMs 42: Training the Acoustic Model via Maximum Likelihood Estimation (MLE)
- Observe a series of utterances with transcriptions W and frames Y: (W_1, Y_1), ..., (W_M, Y_M)
- Let θ represent the (unknown) parameters of the acoustic model
- Let P_θ(y | w) be the distribution of the frame sequence y given the word sequence w
- We shall estimate θ as follows: θ_MLE = argmax_θ ∏_{i=1}^{M} P_θ(y_i | w_i)

HMMs 43: What data do we need?
- A dictionary of words, each with a pronunciation (a string of phonemes)
- A set of recorded utterances, each with a transcription (a string of words)
- An initial set of acoustic models (not required, but useful)

HMMs 44: Some assumptions
- We have a set of output distributions f_i(y) and transition probabilities T(i, j) (duration distributions)
- We'll assume that the states have already been clustered somehow
- We'll also assume that the output distributions are single Gaussians or mixtures of Gaussians

HMMs 45: What must be estimated?
- For each output distribution, we need to estimate a set of mean vectors, a set of (usually diagonal) covariance matrices, and the mixture probabilities
- We must also estimate the corresponding transition probabilities
- We'll focus on the output distributions

HMMs 46: How can we estimate these quantities?
- Align the frames in each utterance to the corresponding output distributions
- Collect all of the frames (from all the training utterances) assigned to each output distribution
- Single Gaussian: compute the mean and variance of the assigned frames for each distribution
- Mixture of Gaussians: estimate the mixture distribution via the EM algorithm, which we'll return to later

HMMs 47: Two alignment methods
- Viterbi algorithm: a deterministic alignment of the training utterances; each frame is assigned to a unique output distribution, based on the maximum likelihood state sequence
- Baum-Welch (forward-backward) algorithm: a probabilistic alignment of the training utterances; each frame is distributed across (possibly multiple) output distributions

HMMs 48: Viterbi Algorithm
- We wish to determine the most likely (ML) state sequence x corresponding to the observed frames y: x_ML = argmax_x P(x, y)
- We can perform this computation via dynamic programming...
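The dynamic-programming computation on slide 48 can be sketched in a few lines. This is an illustrative implementation, not code from the handout: the names `log_trans`, `log_emit`, and `log_init`, and the list-of-lists representation, are assumptions, and log probabilities are used to avoid underflow on long frame sequences.

```python
def viterbi(log_trans, log_emit, log_init):
    """Most likely (ML) state sequence x maximizing P(x, y).

    log_trans[i][j]: log T(i, j), log prob of transitioning from state i to j.
    log_emit[t][j]:  log f_j(y_t), log output prob of frame t under state j.
    log_init[j]:     log prob of starting in state j.
    All names and shapes here are illustrative, not from the handout.
    """
    n_frames, n_states = len(log_emit), len(log_init)
    # delta[j] = best log P(x_1..x_t, y_1..y_t) ending in state j at time t;
    # psi stores backpointers for recovering the best path.
    delta = [log_init[j] + log_emit[0][j] for j in range(n_states)]
    psi = []
    for t in range(1, n_frames):
        back, new = [], []
        for j in range(n_states):
            best_i = max(range(n_states),
                         key=lambda i: delta[i] + log_trans[i][j])
            back.append(best_i)
            new.append(delta[best_i] + log_trans[best_i][j] + log_emit[t][j])
        psi.append(back)
        delta = new
    # Trace back from the best final state.
    x = [max(range(n_states), key=lambda j: delta[j])]
    for back in reversed(psi):
        x.append(back[x[-1]])
    x.reverse()
    return x
```

For Viterbi training, the returned state sequence gives the deterministic frame-to-distribution assignment described on slide 47.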
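The single-Gaussian case on slide 46 reduces to simple sample statistics: pool the frames assigned to each output distribution, then take per-dimension means and variances (diagonal covariance, as on slide 45). A minimal sketch, assuming frames arrive as a dict from state to lists of feature vectors (a hypothetical layout, not the handout's):

```python
def gaussian_mle(frames_by_state):
    """Single-Gaussian MLE from aligned frames.

    frames_by_state: dict mapping each output distribution (state) to the
    list of frame vectors assigned to it by the alignment step.
    Returns, per state, the MLE mean vector and diagonal variance vector.
    Illustrative sketch only; the data layout is an assumption.
    """
    params = {}
    for state, frames in frames_by_state.items():
        n, dim = len(frames), len(frames[0])
        mean = [sum(f[d] for f in frames) / n for d in range(dim)]
        # Diagonal covariance: estimate only per-dimension variances.
        var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n
               for d in range(dim)]
        params[state] = (mean, var)
    return params
```

For mixtures of Gaussians, this closed-form step is replaced by the EM iterations mentioned on slide 46.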