Ch3-Speech_Signal_Representations

Speech Recognition
Speech Signal Representations
2/13/12 Veton Këpuska

Speech Signal Representations
- Fourier Analysis
  - Discrete-time Fourier transform
  - Short-time Fourier transform
  - Discrete Fourier transform
- Cepstral Analysis
  - The complex cepstrum and the cepstrum
  - Computational considerations
  - Cepstral analysis of speech
  - Applications to speech recognition
  - Mel-frequency cepstral representation
- Performance Comparison of Various Representations

Block Diagram of Speech Recognition Processing
[Figure: block diagram. Labels: Speaker's Mind, Speech Producer, Speaker, Acoustic Channel, Acoustic Processor, Speech Recognizer, Acoustic Model & Lexicon, Language Model, Decoding Search; signals W, Speech, O, Ŵ; probabilities P(O|W), P(W).]

Decoding and Search
- The figure on the next slide gives further detail: it shows the components of an HMM speech recognizer as it processes a single utterance.
- The figure shows the recognition process in three stages. In the feature extraction (signal processing) stage, the acoustic waveform is sampled into frames (usually of 10, 15, or 20 milliseconds), which are transformed into spectral features.
- Each time window is thus represented by a vector of around 39 features representing this spectral information as well as information about energy and spectral change.

Schematic Simplified Architecture of Speech Recognizer
[Figure: schematic simplified architecture of a speech recognizer.]

Decoding and Search
- In the acoustic modeling (phone recognition) stage, we compute the likelihood of the observed spectral feature vectors given linguistic units (words, phones, subparts of phones).
  - For example, we use Gaussian Mixture Model (GMM) classifiers to compute, for each HMM state q corresponding to a phone or subphone, the likelihood p(o|q) of a feature vector o given this phone.
  - A (simplified) way of thinking of the output of this stage is as a sequence of probability vectors, one for each time frame, each containing the likelihoods that each phone or subphone unit generated the acoustic feature vector observed at that time.

Feature Extraction: Mel-Filtered Cepstral Coefficient (MFCC) Vectors

MFCC Feature Extraction
[Figure: MFCC pipeline. Blocks: Pre-emphasis, Windowing, FFT, Mel filter-bank, log, IFFT, Energy, Deltas.]
Output features:
  MFCC        12 coefficients
  Energy       1 coefficient
  Δ MFCC      12 coefficients
  Δ Energy     1 coefficient
  ΔΔ MFCC     12 coefficients
  ΔΔ Energy    1 coefficient
- Extracting a sequence of 39-dimensional MFCC feature vectors from a quantized digitized waveform

Feature Extraction: MFCC Vectors...
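The 39 dimensions listed on the MFCC Feature Extraction slide (12 MFCCs plus 1 energy term, each with Δ and ΔΔ) can be assembled roughly as in the Python sketch below. It is an illustration under stated assumptions, not the method used in the course: the slides do not give a delta formula, so a simple central difference with clamped edges is assumed here; the names `deltas` and `stack_39` are hypothetical; and the input is toy data standing in for real static features.

```python
def deltas(frames):
    """Assumed delta: d[t] = (x[t+1] - x[t-1]) / 2, with edge frames clamped."""
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        prev = frames[max(t - 1, 0)]      # clamp at the first frame
        nxt = frames[min(t + 1, last)]    # clamp at the last frame
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def stack_39(static):
    """Append delta and delta-delta features to 13-dim static frames.

    static: list of frames, each a list of 13 floats
            (12 MFCCs followed by 1 energy value).
    Returns one 39-dim vector per frame: static + delta + delta-delta.
    """
    d1 = deltas(static)   # first-order change over time
    d2 = deltas(d1)       # second-order change over time
    return [s + a + b for s, a, b in zip(static, d1, d2)]


# Toy input: 5 frames whose 13 static values all equal the frame index.
static = [[float(t)] * 13 for t in range(5)]
features = stack_39(static)
print(len(features), len(features[0]))  # 5 39
```

Each output vector is laid out as indices 0 to 12 for the static features, 13 to 25 for the deltas, and 26 to 38 for the delta-deltas. Practical front ends often compute deltas by regression over a wider window of frames; the two-frame difference is used here only to keep the sketch short.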
This note was uploaded on 02/11/2012 for the course ECE 5526 taught by Professor Staff during the Summer '09 term at FIT.
