Lecture 15 - Introduction to Automatic Speech Recognition

1  Introduction to Automatic Speech Recognition Systems (ASR)
Abeer Alwan, Speech Processing and Auditory Perception Laboratory

Speech Recognition System Overview
[Block diagram] The speech signal passes through feature extraction and then a decoding network, which combines an acoustic model, a dictionary, and a language model to produce the text output; the models themselves come from a model-training stage.
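To make the block diagram concrete, here is a toy Python sketch of how the pieces connect. Every component here (the two-word dictionary, the scoring functions, and the trivial front end) is a stand-in assumption for illustration only, not part of the lecture.

import numpy as np

def extract_features(signal):
    # Stand-in front end; the real one (MFCCs) is described on the next slide.
    return signal.reshape(-1, 160).mean(axis=1)

def decode(features, acoustic_score, dictionary, lm_score):
    # Stand-in decoding network: pick the word whose models best explain
    # the features, combining acoustic and language-model scores.
    return max(dictionary,
               key=lambda w: acoustic_score(features, dictionary[w]) + lm_score(w))

# Stand-in knowledge sources (normally produced by model training).
dictionary = {"yes": ["y", "eh", "s"], "no": ["n", "ow"]}
acoustic_score = lambda feats, phones: -abs(len(feats) - 35 * len(phones))
lm_score = lambda word: 0.0

speech_signal = np.random.randn(16000)        # 1 s of fake "speech" at 16 kHz
text_output = decode(extract_features(speech_signal),
                     acoustic_score, dictionary, lm_score)
print(text_output)                            # -> "yes" (toy result)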
2  Feature Extraction
[Diagram] The speech wave is divided into frames (Frame 1, Frame 2, ...), and each frame is mapped to a feature vector (X1, X2, ...).
Mel-Frequency Cepstrum (MFCC): windowed speech -> mel-scale auditory filters -> log -> DCT -> MFCCs.
Widely used feature: MFCCs. Usually the first 13 MFCCs are used along with their first and second derivatives, giving a 39-dimensional feature vector.
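Below is a minimal Python/NumPy sketch of this MFCC pipeline (window -> mel-scale filters -> log -> DCT, keeping 13 coefficients and appending their derivatives). The 25 ms frames, 10 ms hop, 512-point FFT, and 26 filters are common defaults assumed here for illustration, not values from the slide.

import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular mel-scale filters spanning 0 Hz to sr/2."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc(signal, sr, frame_len=0.025, hop=0.010, n_filters=26, n_ceps=13):
    """Frame the signal, window each frame, and map it to 13 MFCCs."""
    flen, fhop, n_fft = int(frame_len * sr), int(hop * sr), 512
    window = np.hamming(flen)
    fbank = mel_filterbank(n_filters, n_fft, sr)
    frames = []
    for start in range(0, len(signal) - flen + 1, fhop):
        frame = signal[start:start + flen] * window
        power = np.abs(np.fft.rfft(frame, n_fft)) ** 2      # power spectrum
        logmel = np.log(fbank @ power + 1e-10)               # mel filters + log
        frames.append(dct(logmel, norm='ortho')[:n_ceps])    # DCT, keep 13
    return np.array(frames)                                  # (n_frames, 13)

def add_deltas(ceps):
    """Append first and second time derivatives -> 39-dimensional vectors."""
    delta = np.gradient(ceps, axis=0)
    delta2 = np.gradient(delta, axis=0)
    return np.hstack([ceps, delta, delta2])

# Example: 1 s of noise at 16 kHz -> a sequence of 39-dimensional vectors.
feats = add_deltas(mfcc(np.random.randn(16000), 16000))
print(feats.shape)   # roughly (98, 39)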
3  The 'Back End'
The back end is primarily probabilistic. Most systems use the Hidden Markov Model (HMM) engine; some use neural networks (see the decoding sketch after this list).
Our lab at UCLA focuses on ASR robustness to noise and limited data:
- Adaptation (sensitivity to onsets and offsets), modeled after FM experiments
- Spectral sharpening: physiological and perceptual evidence
- Not all 'uniform' segments are equally important (variable frame rate, VFR)
- Speaker normalization: improving ASR for kids' speech and accented English
- Remote ASR
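The slide names the HMM engine without further detail, so here is a minimal, self-contained sketch of the Viterbi algorithm that such an engine uses to find the most likely state sequence for a sequence of feature vectors. It is an illustrative assumption, not the lab's implementation; the toy two-state model at the end is made up.

import numpy as np

def viterbi(log_A, log_pi, log_B):
    """Most likely HMM state sequence for one utterance.
    log_A:  (S, S) log transition probabilities
    log_pi: (S,)   log initial-state probabilities
    log_B:  (T, S) log observation likelihoods, e.g. log p(x_t | state s)
    """
    T, S = log_B.shape
    score = log_pi + log_B[0]              # best log-prob ending in each state
    back = np.zeros((T, S), dtype=int)     # backpointers
    for t in range(1, T):
        cand = score[:, None] + log_A      # (prev_state, state)
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(S)] + log_B[t]
    path = [int(np.argmax(score))]         # trace back the best path
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(np.max(score))

# Toy example: 2 states, 4 frames of made-up emission likelihoods.
log_A = np.log(np.array([[0.7, 0.3], [0.4, 0.6]]))
log_pi = np.log(np.array([0.6, 0.4]))
log_B = np.log(np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]))
print(viterbi(log_A, log_pi, log_B))       # best path: [0, 0, 1, 1]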