Lecture17-speechreco2 - 10/8/2011 This work is licensed...

CS 479, section 1: Natural Language Processing
Lecture #17: Speech Recognition Overview (cont.)

Thanks to Alex Acero (Microsoft Research), Mazin Rahim (AT&T Research), Dan Klein (UC Berkeley), Jeff Adams (Nuance), & Simon Arnfield (Sheffield) for many of the materials used in this lecture.

This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.

Announcements
  Project #1, Part 2
    Early: today
    Due: Monday
  Questions?

Objectives
  Continue our overview of an approach to speech recognition, picking up at acoustic modeling
  See other examples of the source / channel (noisy channel) paradigm for modeling interesting processes
  Apply language models

Front End
  We want to predict a sentence $w$ given a feature vector $y = FE(x)$
  Update to the problem: $w^* = \arg\max_w P(w)\, P(y \mid w)$
  [Figure: noisy-channel diagram. Box and arrow labels: Source, Noisy Channel, Text, ASR, Speech, Text, FE, Features. Symbols shown: $w_1 w_2 \ldots w_m$, $P(w)$, $x_1 x_2 \ldots x_n$, $P(x \mid w)$, $y_1 y_2 \ldots y_q$, $w^*$.]

Acoustic Modeling
  Goal: Map acoustic features into distinct subword units
    Such as phones, syllables, words, etc.
  Hidden Markov Model (HMM):
    Spectral properties modeled by a parametric random process (a directed graphical model!)
    A collection of HMMs is associated with a subword unit
    HMMs are also assigned for modeling extraneous events (cough, um, sneeze, …)
    More on HMMs coming up in the course!
  Advantages:
    Powerful statistical method for a wide range of data and conditions
    Highly reliable for recognizing speech
  [Figure: recognizer block diagram. Components: Feature Extraction; Decoder, Search; Language Model; Acoustic Model; Word Lexicon.]

Acoustic Trajectories
  [Figure: phone labels (EE, ee, ch, zh, EY, ih, sh, and others) placed across the plot area.]
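The "Update to the problem" on the Front End slide can be sketched in a few lines of Python. This is only an illustration of the argmax over $P(w)\,P(y \mid w)$: the candidate sentences, the probabilities, and the function name `decode` are all made up for the example, and a real recognizer searches a far larger hypothesis space than a two-entry dictionary.

```python
import math

def decode(candidates, lm_logprob, am_logprob):
    """Return the candidate w that maximizes log P(w) + log P(y | w)."""
    return max(candidates, key=lambda w: lm_logprob[w] + am_logprob[w])

# Hypothetical language-model scores, log P(w).
lm_logprob = {
    "recognize speech": math.log(0.6),
    "wreck a nice beach": math.log(0.4),
}

# Hypothetical acoustic-model scores, log P(y | w), for one utterance.
am_logprob = {
    "recognize speech": math.log(0.3),
    "wreck a nice beach": math.log(0.4),
}

best = decode(list(lm_logprob), lm_logprob, am_logprob)
print(best)
```

With these toy numbers the acoustic score alone favors the second candidate, but the product of the two terms favors the first (0.6 × 0.3 = 0.18 versus 0.4 × 0.4 = 0.16). Working in log space avoids underflow when many small probabilities are multiplied.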
Acoustic Models: Neighborhoods are not Points
  How do we describe what points in our “feature space” are likely to come from a given phoneme?
    It’s clearly more complicated than just identifying a single point.
    Also, the boundaries are not “clean”.
  Use the normal distribution:
    Points are likely to lie near the center.
    We describe the distribution with the mean & variance.
    Easy to compute with

Acoustic Models: Neighborhoods are not Points (2)
  Normal distributions in M dimensions are analogous
    A.k.a. “Gaussians”
  Specify the mean point in M dimensions
    Like an M-dimensional “hill” centered around the mean point
  Specify the variances (as covariance matrix)
    Diagonal gives the “widths” of the distribution in each direction
    Off-diagonal values describe the “orientation”
    “Full covariance”: possibly “tilted”
    “Diagonal covariance”: not “tilted”

AMs: Gaussians don’t really cut it
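To make the mean-and-covariance description from the two "Neighborhoods are not Points" slides concrete, here is a minimal sketch of the log density of a Gaussian with diagonal covariance, plus a two-dimensional full-covariance version. The function names and sample values are hypothetical, and the full-covariance case is restricted to 2 × 2 matrices so that it stays in plain Python.

```python
import math

def diag_gaussian_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x.

    var holds the per-dimension variances (the diagonal of the covariance).
    """
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def full_gaussian_logpdf_2d(x, mean, cov):
    """Log density of a 2-D Gaussian with a full 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mean)^T cov^{-1} (x - mean), using the closed-form
    # inverse of a 2x2 matrix.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -0.5 * (2 * math.log(2 * math.pi) + math.log(det) + quad)

# A point near the mean scores higher than a point far from it.
near = diag_gaussian_logpdf([0.1], [0.0], [1.0])
far = diag_gaussian_logpdf([2.0], [0.0], [1.0])
print(near > far)
```

When the off-diagonal entries are zero, the full-covariance function gives the same value as the diagonal one; non-zero off-diagonal entries are what "tilt" the hill.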

This note was uploaded on 10/18/2011 for the course CS 479 taught by Professor Ericringger during the Fall '11 term at BYU.
