CS 224S / LINGUIST 281: Speech Recognition, Synthesis, and Dialogue
Dan Jurafsky
Lecture 5: Intro to ASR + HMMs: Forward, Viterbi, Word Error Rate
IP Notice:
Outline for Today
- Speech Recognition Architectural Overview
- Hidden Markov Models in general and for speech
  - Forward
  - Viterbi Decoding
- How this fits into the ASR component of the course:
  - Jan 27 (today): HMMs, Forward, Viterbi
  - Jan 29: Baum-Welch (Forward-Backward)
  - Feb 3: Feature Extraction, MFCCs
  - Feb 5: Acoustic Modeling and GMMs
  - Feb 10: N-grams and Language Modeling
  - Feb 24: Search and Advanced Decoding
  - Feb 26: Dealing with Variation
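The Forward algorithm named in the outline can be sketched on a toy HMM. This is a minimal illustration, not the lecture's own code; the two states and all probabilities below are invented for the example.

```python
# Toy HMM: states and probabilities are invented for illustration only.
states = ("S1", "S2")
start = {"S1": 0.6, "S2": 0.4}              # initial state probabilities
trans = {"S1": {"S1": 0.7, "S2": 0.3},      # transition probabilities
         "S2": {"S1": 0.4, "S2": 0.6}}
emit = {"S1": {"a": 0.9, "b": 0.1},         # emission probabilities
        "S2": {"a": 0.2, "b": 0.8}}

def forward(obs, states, start_p, trans_p, emit_p):
    """Total probability of the observation sequence, summed over all state paths."""
    # alpha[s] = P(o_1..o_t, state at time t is s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: emit_p[s][o] * sum(alpha[p] * trans_p[p][s] for p in states)
                 for s in states}
    return sum(alpha.values())

print(forward(["a", "b"], states, start, trans, emit))  # ≈ 0.209
```

The key point the recursion shows: instead of enumerating the exponentially many state paths, each step folds the previous alpha values into the next, giving P(O) in time linear in the sequence length.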
LVCSR: Large Vocabulary Continuous Speech Recognition
- ~20,000-64,000 words
- Speaker-independent (vs. speaker-dependent)
- Continuous speech (vs. isolated-word)
Current error rates

Task                      Vocabulary   Error Rate (%)
Digits                    11           0.5
WSJ read speech           5K           3
WSJ read speech           20K          3
Broadcast news            64,000+      10
Conversational Telephone  64,000+      20

Ballpark numbers; exact numbers depend very much on the specific corpus.
HSR versus ASR

Task               Vocab   ASR   Human SR
Continuous digits  11      .5    .009
WSJ 1995 clean     5K      3     0.9
WSJ 1995 w/noise   5K      9     1.1
SWBD 2004          65K     20    4

Conclusions:
- Machines are about 5 times worse than humans
- The gap increases with noisy speech
- These numbers are rough; take them with a grain of salt
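The error rates in these tables are word error rates (WER), the metric named in the lecture title: the minimum number of word substitutions, insertions, and deletions needed to turn the reference into the hypothesis, divided by the reference length. A minimal sketch via edit distance (the example sentences are invented):

```python
def wer(ref, hyp):
    """Word error rate: (substitutions + insertions + deletions) / len(reference)."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = min edits to turn the first i reference words
    # into the first j hypothesis words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                      # i deletions
    for j in range(len(h) + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + sub,   # substitution or match
                          d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1)         # insertion
    return d[len(r)][len(h)] / len(r)

print(wer("the cat sat", "the cat sat"))     # 0.0
print(wer("the cat sat", "a cat sat down"))  # 2/3: one substitution, one insertion
```

Note that WER can exceed 100%, since insertions are counted against a fixed reference length.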
LVCSR Design Intuition
- Build a statistical model of the speech-to-words process
- Collect lots and lots of speech, and transcribe all the words
- Train the model on the labeled speech
- Paradigm: Supervised Machine Learning + Search
The Noisy Channel Model
- Search through the space of all possible sentences.
- Pick the one that is most probable given the waveform.
The Noisy Channel Model (II)
What is the most likely sentence out of all sentences in the language L, given some acoustic input O?
- Treat the acoustic input O as a sequence of individual observations: O = o1, o2, o3, ..., ot
- Define a sentence as a sequence of words: W = w1, w2, w3, ..., wn
Noisy Channel Model (III)
Probabilistic implication: pick the highest-probability sentence W:

    Ŵ = argmax_{W ∈ L} P(W | O)

We can use Bayes' rule to rewrite this:

    Ŵ = argmax_{W ∈ L} P(O | W) P(W) / P(O)

Since the denominator P(O) is the same for each candidate sentence W, we can ignore it for the argmax:

    Ŵ = argmax_{W ∈ L} P(O | W) P(W)
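The argmax can be sketched in code for a tiny hand-picked candidate set. In practice scores are combined in log space to avoid underflow; the three sentences and their log-scores below are invented for illustration, not output from a real recognizer.

```python
# Hypothetical scores: log P(O|W) from an acoustic model and log P(W) from a
# language model, for three candidate sentences. All numbers are invented.
candidates = {
    "recognize speech":   {"log_acoustic": -12.0, "log_lm": -4.0},
    "wreck a nice beach": {"log_acoustic": -11.5, "log_lm": -9.0},
    "wreck an ice peach": {"log_acoustic": -11.8, "log_lm": -12.0},
}

def best_sentence(cands):
    # argmax_W  log P(O|W) + log P(W); P(O) is constant across W, so dropped.
    return max(cands, key=lambda w: cands[w]["log_acoustic"] + cands[w]["log_lm"])

print(best_sentence(candidates))  # "recognize speech": -16.0 beats -20.5 and -23.8
```

The example also shows why the prior matters: the acoustically best-matching hypothesis ("wreck a nice beach", -11.5) loses to a slightly worse acoustic match that the language model strongly prefers.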
Speech Recognition Architecture
Noisy channel model

    Ŵ = argmax_{W ∈ L} P(O | W) P(W)
                       likelihood prior
The noisy channel model
Ignoring the denominator leaves us with two factors: P(Source) and P(Signal | Source).
Speech Architecture meets Noisy Channel
Architecture: Five easy pieces (only 2 for today)
- Feature extraction
- Acoustic Modeling
- HMMs, Lexicons, and Pronunciation
- Decoding
- Language Modeling
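Of these pieces, today's lecture covers HMMs and decoding. Viterbi decoding, named in the lecture title, can be sketched on a toy HMM; this is a hedged illustration, with states and probabilities invented for the example, not real acoustic-model values.

```python
# Toy HMM: states and probabilities are invented for illustration only.
states = ("S1", "S2")
start = {"S1": 0.6, "S2": 0.4}              # initial state probabilities
trans = {"S1": {"S1": 0.7, "S2": 0.3},      # transition probabilities
         "S2": {"S1": 0.4, "S2": 0.6}}
emit = {"S1": {"a": 0.9, "b": 0.1},         # emission probabilities
        "S2": {"a": 0.2, "b": 0.8}}

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the single most probable state path for an observation sequence."""
    # V[t][s] = (prob of best path ending in s at time t, best predecessor)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states)
            V[t][s] = (prob, prev)
    # Backtrace from the best final state
    path = [max(states, key=lambda s: V[-1][s][0])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path))

print(viterbi(["a", "b", "b"], states, start, trans, emit))  # ['S1', 'S2', 'S2']
```

The structure mirrors the Forward algorithm, with max in place of sum: Forward totals the probability over all paths, while Viterbi keeps only the best path plus backpointers so the state sequence can be recovered.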

This note was uploaded on 04/21/2011 for the course CS 224 taught by Professor De during the Spring '11 term at Kentucky.
