Speech Recognition
Designing HMM-based ASR Systems
Lecture based on Dr. Rita Singh's notes, School of Computer Science, Carnegie Mellon University
February 16, 2012, Veton Këpuska

Outline
Isolated word recognition
  Bayesian classification
  Decoding based on best state sequences
Word sequence recognition
  Bayesian classification
  Decoding based on best state sequences
  Best path decoding and collapsing graphs
Search strategies
  Depth first
  Breadth first
Optimal graph structures
Constrained language decoding
Natural language decoding
HMMs for language modeling
  Unigram
  Bigram
  Trigram

Statistical Pattern Recognition
Given data X, find which of a number of classes C_1, C_2, ..., C_N it belongs to, based on known distributions of data from C_1, C_2, etc.
Bayesian classification:
$\text{Class} = C_i, \quad i = \arg\max_{j} P(C_j)\, P(X \mid C_j)$
P(C_j) is the a priori probability of C_j. It accounts for the relative proportions of the classes: if you never saw any data, you would guess the class based on these probabilities alone.
P(X | C_j) is the probability of X as given by the probability distribution of C_j. It accounts for the evidence obtained from the observed data X.

Statistical Classification of Isolated Words
Classes are words.
Data are instances of isolated spoken words, each represented as a sequence of feature vectors derived from the speech signal: typically one vector per 25 ms frame of speech, with successive frames shifted by 10 ms (so consecutive frames overlap by 15 ms).
Bayesian classification:
$\text{Recognized\_Word} = \arg\max_{\text{word}} P(\text{word})\, P(X \mid \text{word})$
P(word) is the a priori probability of the word, obtained from our expectation of how frequently the word occurs.
P(X | word) is the probability of X computed on the probability distribution function of the word.

Computing P(X | word)
To compute P(X | word), there must be a statistical distribution for X corresponding to the word; that is, each word must be represented by some statistical model.
We represent each word by an HMM. An HMM is really a graphical form of probability density function for time-varying ...

Computing P(X | word)
Each state has a probability distribution function.
Transitions between states are governed by transition probabilities.
At each time instant the model is in some state, and it emits one observation vector from the distribution associated with that state.
The actual state sequence that generated X is never known. ...
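Because the state sequence is hidden, P(X | word) has to account for every state sequence that could have produced X. A standard way to do this is the forward algorithm, which sums the joint probability of X and each possible state path. The following is a minimal sketch, not the lecture's own implementation. It assumes discrete observation symbols rather than the continuous feature-vector densities described above, and the class DiscreteHMM, the functions log_likelihood and recognize, and the toy "yes"/"no" models are all hypothetical. Probabilities are combined in the log domain to avoid underflow, and the final step applies the Bayesian rule argmax over words of P(word) P(X | word).

```python
import math
from typing import Dict, List, Sequence


def safe_log(p: float) -> float:
    """log(p), with log(0) mapped to -inf instead of raising an error."""
    return math.log(p) if p > 0.0 else -math.inf


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


class DiscreteHMM:
    """Hypothetical word model with discrete (symbol) emissions."""

    def __init__(self, init: List[float], trans: List[List[float]],
                 emit: List[List[float]]):
        self.init = init    # init[i]     = P(first state is i)
        self.trans = trans  # trans[i][j] = P(next state j | current state i)
        self.emit = emit    # emit[i][k]  = P(symbol k | state i)

    def log_likelihood(self, obs: Sequence[int]) -> float:
        """Forward algorithm: log P(X | word), summed over all state paths.

        obs must be non-empty.
        """
        n = len(self.init)
        # alpha[i] = log P(obs[0..t], state at time t is i)
        alpha = [safe_log(self.init[i]) + safe_log(self.emit[i][obs[0]])
                 for i in range(n)]
        for o in obs[1:]:
            alpha = [log_sum_exp([alpha[i] + safe_log(self.trans[i][j])
                                  for i in range(n)])
                     + safe_log(self.emit[j][o])
                     for j in range(n)]
        return log_sum_exp(alpha)


def recognize(obs: Sequence[int], models: Dict[str, DiscreteHMM],
              priors: Dict[str, float]) -> str:
    """Bayesian decision: argmax over words of log P(word) + log P(X | word)."""
    return max(models,
               key=lambda w: safe_log(priors[w]) + models[w].log_likelihood(obs))


# Toy two-state, left-to-right word models over the symbol alphabet {0, 1}.
MODELS = {
    "yes": DiscreteHMM([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]],
                       [[0.9, 0.1], [0.2, 0.8]]),
    "no": DiscreteHMM([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]],
                      [[0.1, 0.9], [0.8, 0.2]]),
}
PRIORS = {"yes": 0.5, "no": 0.5}

if __name__ == "__main__":
    print(recognize([0, 0, 1, 1], MODELS, PRIORS))  # prints "yes"
```

Replacing the log-sum-exp over predecessor states with a max gives instead the score of the single best state sequence (the Viterbi approximation), which corresponds to the "decoding based on best state sequences" approach listed in the outline.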
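The isolated-word slide describes X as one feature vector per 25 ms frame, with consecutive frames shifted by 10 ms. The framing arithmetic can be sketched as follows. The 16 kHz sampling rate, the helper names frame_params, num_frames and frame_signal, and the choice to drop a trailing partial frame rather than pad are assumptions for illustration; real front ends differ in how they treat the signal edges.

```python
from typing import List, Sequence, Tuple


def frame_params(sample_rate: int, frame_ms: float = 25.0,
                 shift_ms: float = 10.0) -> Tuple[int, int]:
    """Convert frame length and frame shift from milliseconds to samples."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = int(round(sample_rate * shift_ms / 1000.0))
    return frame_len, hop


def num_frames(num_samples: int, sample_rate: int,
               frame_ms: float = 25.0, shift_ms: float = 10.0) -> int:
    """Count full frames; a trailing partial frame is dropped (no padding)."""
    frame_len, hop = frame_params(sample_rate, frame_ms, shift_ms)
    if num_samples < frame_len:
        return 0
    return 1 + (num_samples - frame_len) // hop


def frame_signal(signal: Sequence[float], sample_rate: int,
                 frame_ms: float = 25.0,
                 shift_ms: float = 10.0) -> List[Sequence[float]]:
    """Split a signal into overlapping frames, one per future feature vector."""
    frame_len, hop = frame_params(sample_rate, frame_ms, shift_ms)
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


if __name__ == "__main__":
    sr = 16000  # assumed sampling rate
    one_second = [0.0] * sr
    print(num_frames(len(one_second), sr))       # 98
    print(len(frame_signal(one_second, sr)[0]))  # 400
```

At 16 kHz a 25 ms frame is 400 samples and a 10 ms shift is 160 samples, so one second of audio (16,000 samples) yields 1 + floor((16000 - 400) / 160) = 98 frames, i.e. roughly 100 feature vectors per second.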