Speech Recognition
Hidden Markov Models
February 13, 2012, Veton Këpuska

Outline
- Introduction
- Problem formulation
- Forward-Backward algorithm
- Viterbi search
- Baum-Welch parameter estimation
- Other considerations
- Multiple observation sequences
- Phone-based models for continuous speech recognition
- Continuous density HMMs
- Implementation issues

Information Theoretic Approach to ASR

Statistical Formulation of Speech Recognition

A denotes the acoustic evidence (a collection of feature vectors, or data in general) on which the recognizer bases its decision about which words were spoken.
W denotes a string of words, each belonging to a fixed and known vocabulary.

[Figure: block diagram with the blocks Speaker's Mind, Speech Producer, Acoustic Processor, and Linguistic Decoder; the groupings Speaker, Acoustic Channel, and Speech Recognizer; and the signals W, Speech, A, and Ŵ.]

Assume that A is a sequence of m symbols taken from some alphabet $\mathcal{A}$, and that W is a string of n words, each belonging to a fixed and known vocabulary $\mathcal{V}$:

$$W = w_1, w_2, \ldots, w_n, \qquad w_i \in \mathcal{V}$$

$$A = a_1, a_2, \ldots, a_m, \qquad a_i \in \mathcal{A}$$

If P(W | A) denotes the probability that the words W were spoken, given that the evidence A was observed, then the recognizer should decide in favor of a word string Ŵ satisfying:

$$\hat{W} = \arg\max_{W} P(W \mid A)$$

That is, the recognizer picks the most likely word string given the observed acoustic evidence.

From the well-known Bayes' rule of probability theory:

$$P(W \mid A) = \frac{P(W)\,P(A \mid W)}{P(A)}$$

P(W) is the probability that the word string W will be uttered.
P(A | W) is the probability that the acoustic evidence A will be observed when W is uttered.
P(A) is the average probability that A will be observed:

$$P(A) = \sum_{W'} P(W')\,P(A \mid W')$$

The maximization in

$$\hat{W} = \arg\max_{W} P(W \mid A)$$

is carried out with the variable A fixed (that is, there is no acoustic data other than what we are given), so P(A) is the same for every candidate W. It follows from Bayes' rule that the recognizer's aim is to find the word string Ŵ that maximizes the product P(A | W)P(W):

$$\hat{W} = \arg\max_{W} P(A \mid W)\,P(W)$$

Hidden Markov Models

About Markov Chains: Let X_1, X_2, …, X_n, … be a sequence of random variables taking their values in the same finite alphabet...
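Returning to the decision rule for Ŵ derived above, the following is a minimal Python sketch of it over a toy candidate set. The two candidate word strings and all probability values are made up for illustration and are not from the lecture; the function names `evidence`, `posterior`, and `map_decode` are likewise hypothetical. It only illustrates that dropping P(A) leaves the maximizing word string unchanged.

```python
import math

# Hypothetical toy values (assumptions for illustration, not from the lecture).
prior = {                       # P(W): probability that W is uttered
    "recognize speech": 0.6,
    "wreck a nice beach": 0.4,
}
likelihood = {                  # P(A | W): probability of the fixed evidence A given W
    "recognize speech": 0.020,
    "wreck a nice beach": 0.025,
}


def evidence(prior, likelihood):
    """P(A) = sum over W' of P(W') * P(A | W')."""
    return sum(prior[w] * likelihood[w] for w in prior)


def posterior(prior, likelihood):
    """P(W | A) for every candidate W, via Bayes' rule."""
    p_a = evidence(prior, likelihood)
    return {w: prior[w] * likelihood[w] / p_a for w in prior}


def map_decode(prior, likelihood):
    """Return the W maximizing P(A | W) * P(W), computed in the log domain."""
    return max(prior, key=lambda w: math.log(likelihood[w]) + math.log(prior[w]))


post = posterior(prior, likelihood)
best_by_product = map_decode(prior, likelihood)
best_by_posterior = max(post, key=post.get)
print(best_by_product, best_by_posterior)
```

Because P(A) is a positive constant shared by all candidates, both selections agree. The log domain is used here only as a common precaution against underflow when many small probabilities are multiplied; it is a design choice of this sketch, not something the slides prescribe.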
This note was uploaded on 02/11/2012 for the course ECE 5526 taught by Professor Staff during the Summer '09 term at FIT.