HMM_Speech: Speech Recognition and HMM Learning
Speech Recognition and HMM Learning

- Overview of speech recognition approaches
  - Standard Bayesian model
  - Features
  - Acoustic model approaches
  - Language model
  - Decoder issues
- Hidden Markov Models
  - HMM basics
  - HMMs in speech
- Forward, Backward, and Viterbi algorithms
- Baum-Welch learning algorithm

Speech Recognition Challenges

- Large-vocabulary, continuous, speaker-independent recognition vs. specialty niches
- OOV: when is a word out of vocabulary?
- Background noise
- Different speakers: pitch, accent, speed, etc.
- Spontaneous speech vs. written text: "hmm", "ah", coughs, false starts, non-grammatical utterances, etc.
- Co-articulation
- Humans demand very high accuracy before using speech recognition

Standard Approach

- There are a number of possible approaches, but most have converged to a standard overall model
  - Lots of minor variations
  - The right approach, or a local minimum?
- An utterance W consists of a sequence of words w1, w2, ...
  - Different W's are broken by "silence" and thus have different lengths
- Seek the most probable of all possible W's, based on:
  - Sound input (the acoustic model)
  - Reasonable linguistics (the language model)

Standard Bayesian Model

- Can drop P(Y)
- Try all possible W? The decoder will search through only the most likely candidates (a type of beam search)

Features

- We assume speech is stationary over some number of milliseconds
- Break the speech input Y into a sequence of feature vectors y1, y2, ..., yT, sampled about every 10 ms
  - A five-second utterance thus has about 500 feature vectors
  - Frames usually overlap, each shaped by a tapered Hamming window (e.g., each feature represents about 25 ms of time)
- Many possibilities for the features themselves
  - Typically use a Fourier transform to get into the frequency spectrum: how much energy is in each frequency bin
  - Somewhat patterned after the cochlea in the ear; we hear from about 20 Hz up to about 20 kHz
  - Use mel-scale bins (ear-inspired), which get wider as frequency increases

MFCC: Mel Frequency Cepstral Coefficients

Features

- The most common industry standard is the first 12 cepstral coefficients for a sample, plus the signal energy, making 13 basic features (note: CEPStral coefficients are the decorrelated SPECtral coefficients)
- Also include the first and second derivatives of these features to capture signal dynamics (39 features total)
- Another approach: 26 features (13 MFCCs and their 1st derivatives) at 5 different time samples (-6, -3, 0, +3, +6)
- HMMs do not require the context snapshot, but they do assume independence, etc.
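The "Standard Bayesian Model" slide refers to an equation that did not survive extraction. The standard formulation, consistent with the slide's note that P(Y) can be dropped (it does not depend on W), is:

```latex
\hat{W} = \arg\max_{W} P(W \mid Y)
        = \arg\max_{W} \frac{P(Y \mid W)\,P(W)}{P(Y)}
        = \arg\max_{W} \underbrace{P(Y \mid W)}_{\text{acoustic model}}\;\underbrace{P(W)}_{\text{language model}}
```

Since P(Y) is constant across candidate word sequences, maximizing the product of the acoustic-model and language-model scores suffices.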
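The framing scheme described in the Features slide (a feature vector about every 10 ms, each covering about 25 ms, shaped by a Hamming window) can be sketched as follows. This is a minimal illustration, assuming 16 kHz audio; the slide does not specify a sample rate.

```python
import numpy as np

def frame_signal(signal, sample_rate=16000, frame_ms=25, hop_ms=10):
    """Slice a waveform into overlapping frames and apply a Hamming window.

    Frame and hop sizes follow the slide's 25 ms window / 10 ms step
    convention; the 16 kHz rate is an assumption for illustration.
    """
    frame_len = int(sample_rate * frame_ms / 1000)   # 400 samples at 16 kHz
    hop_len = int(sample_rate * hop_ms / 1000)       # 160 samples at 16 kHz
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    window = np.hamming(frame_len)                   # tapered window
    return np.stack([
        signal[i * hop_len : i * hop_len + frame_len] * window
        for i in range(n_frames)
    ])

# A 5-second utterance yields roughly 500 frames, as the slide notes:
five_sec = np.zeros(5 * 16000)
print(frame_signal(five_sec).shape)  # (498, 400)
```

Each row of the result would then be passed through a Fourier transform and mel filterbank to produce one feature vector.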
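The slide says mel-scale bins get wider as frequency increases but does not give a formula. A commonly used mel mapping (one convention among several, not stated on the slide) is m = 2595 log10(1 + f/700):

```python
import math

def hz_to_mel(f_hz):
    """A common mel-scale mapping (an assumption; the slide does not pin
    down a formula). Equal steps in mel cover progressively wider Hz
    ranges as frequency grows, mimicking cochlear resolution."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

print(round(hz_to_mel(1000)))  # about 1000 mel near 1 kHz
```

Filter banks built on this scale place many narrow filters at low frequencies and fewer, wider filters at high frequencies, which is the "ear-inspired" behavior the slide describes.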
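The 13-to-39 expansion described in the final Features slide (appending first and second derivatives) can be sketched with a simple finite-difference estimate. Real systems often use a regression formula over a wider context window; this is just the idea:

```python
import numpy as np

def add_deltas(features):
    """Append first and second temporal derivatives to a (T, 13) feature
    matrix, giving the 39-dimensional vectors the slide describes.
    Uses a central-difference estimate via np.gradient."""
    delta = np.gradient(features, axis=0)    # first derivative over time
    delta2 = np.gradient(delta, axis=0)      # second derivative over time
    return np.concatenate([features, delta, delta2], axis=1)

mfcc = np.random.randn(500, 13)   # 500 frames of 13 basic features
print(add_deltas(mfcc).shape)     # (500, 39)
```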
This document was uploaded on 10/24/2011.