Adapt-handout

Adapt-handout - Massachusetts Institute of Technology...

Massachusetts Institute of Technology
6.345/HST.728 Automatic Speech Recognition
Spring 2010
3/9/10 Lecture Handouts:
N-best A* Search
Speaker Adaptation

Search Example: Computing N-best Paths
Use a forward Viterbi search in the first pass to find the best path.
[Figure: search trellis of lexical nodes h#, m, z, r, a against time frames t0 to t8]
Relative and absolute thresholds are used to speed up the search.

N-best Computation with Backwards A* Search
The second pass uses a backwards A* search to find the N-best paths.
The Viterbi backtrace is used as the future estimate for path scores: the forward Viterbi score stored at each node is the best score of any partial path from the start to that node, which is exactly the part a backward search has not yet covered.
Block processing enables pipelined computation (optional).
[Figure: the same trellis of lexical nodes h#, m, z, r, a against time frames t0 to t8]

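The two passes can be sketched on a toy word graph instead of the lexical-node-by-time trellis in the figure. This is a minimal sketch under stated assumptions, not the course's implementation: `nbest_paths` is a hypothetical helper, the graph is assumed acyclic with nodes already numbered in topological order, edge scores are assumed to be additive log scores, and the relative and absolute pruning thresholds are omitted. Because the forward Viterbi score is an exact estimate of the remaining score, complete paths leave the priority queue in order of decreasing total score.

```python
import heapq
from collections import defaultdict


def nbest_paths(num_nodes, edges, n):
    """Two-pass N-best search over a toy word graph (hypothetical helper).

    Nodes are numbered 0..num_nodes-1 in topological order; node 0 is the
    start and node num_nodes-1 is the end. Each edge is (u, v, log_score)
    with u < v. Returns up to n (total_log_score, path) pairs, best first.
    """
    start, end = 0, num_nodes - 1
    preds = defaultdict(list)
    for u, v, w in edges:
        preds[v].append((u, w))

    # Pass 1: forward Viterbi. best[v] is the best log score of any
    # partial path from the start node to node v.
    best = [float("-inf")] * num_nodes
    best[start] = 0.0
    for v in range(num_nodes):
        for u, w in preds[v]:
            best[v] = max(best[v], best[u] + w)

    # Pass 2: backwards A* from the end node. A partial path covers
    # node..end with score g; best[node] is the exact score of the best
    # way back to the start, so f = g + best[node] is the full path score.
    results = []
    counter = 0  # tie-breaker: equal f scores pop in insertion order
    heap = [(-best[end], counter, 0.0, end, (end,))]
    while heap and len(results) < n:
        _, _, g, node, suffix = heapq.heappop(heap)
        if node == start:
            results.append((g, list(suffix)))
            continue
        for u, w in preds[node]:
            if best[u] == float("-inf"):
                continue  # u cannot be reached from the start node
            counter += 1
            g_new = g + w
            heapq.heappush(heap, (-(g_new + best[u]), counter, g_new, u, (u,) + suffix))
    return results


# Toy graph with three start-to-end paths.
edges = [(0, 1, -1.0), (0, 2, -2.0), (1, 3, -1.0), (2, 3, -0.5), (1, 2, -0.2)]
for score, path in nbest_paths(4, edges, 3):
    print(round(score, 2), path)
# -1.7 [0, 1, 2, 3]
# -2.0 [0, 1, 3]
# -2.5 [0, 2, 3]
```

Storing each partial path as a tuple keeps the sketch short; a real decoder would store back-pointers instead of copying suffixes.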
Speaker Adaptation
Normalization
  – Vocal Tract Length Normalization
Adaptation
  – Hierarchical Speaker Clustering
  – Bayesian Adaptation
  – Eigenvoices
  – Transformational Adaptation

Variability and Correlation
[Figure: isometric likelihood contours for phones [i] and [e] on principal components 1 and 2, for the SI model and for speakers HXS0 and DAS1]
The plot compares one speaker-independent (SI) model with two speaker-dependent (SD) models.
The SD contours are tighter than the SI contours and are correlated with each other.

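A short derivation suggests why the SI contours are broader. Assume, for illustration only, that the SI Gaussian for a phone is the single maximum-likelihood Gaussian fitted to data pooled over training speakers s, where speaker s contributes a fraction \pi_s of the frames and has SD mean \mu_s and covariance \Sigma_s (this notation is introduced here, not taken from the handout). By the law of total covariance, the pooled covariance splits into the average within-speaker covariance plus the scatter of the speaker means:

```latex
\bar{\mu} = \sum_s \pi_s \mu_s, \qquad
\Sigma_{\mathrm{SI}} = \sum_s \pi_s \Sigma_s
  + \sum_s \pi_s (\mu_s - \bar{\mu})(\mu_s - \bar{\mu})^{\top}
```

The second term is positive semidefinite, so under this assumption the SI model is at least as broad as the frame-weighted average SD model; the extra spread comes from differences between speakers, which is what normalization and adaptation try to remove.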
Vocal Tract Length Normalization
Vocal tract length affects formant frequencies:
  – shorter vocal tracts → higher formant frequencies
  – longer vocal tracts → lower formant frequencies
Vocal tract length normalization (VTLN) tries to adjust the input speech to have an "average" vocal tract length.
Method: warp the frequency scale with a warping factor γ:
  $Y'(\omega) = Y(\omega / \gamma)$
so a formant at frequency F in Y appears at γF in Y'.
[Figure: frequency warping curves for γ < 1 and γ > 1]

Vocal Tract Length Normalization (cont.)
Illustration: second formant for [e] and [i]
[Figure: F2 distributions for [e] and [i]]
  – SI models have large overlap (error region).
  – SD models have smaller overlap.
Procedure:
  – Warp the spectra of all training speakers to best fit the SI model.
  – Train a VTLN-SI model.
  – Warp test speakers to fit the VTLN-SI model.

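Warping the frequency scale, read here as Y'(ω) = Y(ω/γ), can be sketched directly on a sampled magnitude spectrum. This is a minimal illustration with a hypothetical `warp_spectrum` helper, assuming equally spaced frequency bins and linear interpolation between them. In practice the warp is often applied by scaling the filterbank frequencies, and piecewise-linear warping functions are commonly used to keep the warped band inside the Nyquist range; this sketch simply clamps at the last bin.

```python
def warp_spectrum(spectrum, gamma):
    """Linear frequency warping Y'(k) = Y(k / gamma) on a sampled spectrum.

    spectrum: magnitudes at equally spaced bins k = 0..K-1.
    gamma: warping factor (> 0). gamma > 1 moves spectral peaks up,
    gamma < 1 moves them down. Values between bins are linearly
    interpolated; source positions past the last bin reuse the last bin.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    last = len(spectrum) - 1
    warped = []
    for k in range(len(spectrum)):
        src = k / gamma  # position in Y that lands on bin k of Y'
        if src >= last:
            warped.append(spectrum[last])
        else:
            lo = int(src)
            frac = src - lo
            warped.append((1 - frac) * spectrum[lo] + frac * spectrum[lo + 1])
    return warped


# A single "formant" peak at bin 4 of a 10-bin spectrum.
Y = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
up = warp_spectrum(Y, 2.0)
down = warp_spectrum(Y, 0.5)
print(up.index(max(up)), down.index(max(down)))  # 8 2
```

With γ = 2 the peak moves from bin 4 to bin 8, and with γ = 0.5 it moves to bin 2, matching the rule that a formant at F ends up at γF.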
Vocal Tract Length Normalization (cont.)
During testing, a maximum-likelihood (ML) approach is used to find the warp factor:
  $\hat{\gamma} = \arg\max_{\gamma} p(X_{\gamma} \mid \Theta_{\mathrm{VTLN}})$
where $X_{\gamma}$ is the test speech warped by $\gamma$ and $\Theta_{\mathrm{VTLN}}$ is the VTLN-SI model.
The warp factor is typically found by brute-force search: a discrete set of warp factors is tested over the possible range.

Adaptation Issues
Speaker-dependent models don't exist for new users.
The system must learn the characteristics of new users.
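The brute-force search can be sketched with a deliberately tiny stand-in for the VTLN-SI model. Everything here is hypothetical: a single Gaussian over one formant frequency replaces the full acoustic-model likelihood, warping is assumed to map each formant F to γF, and the candidate range 0.88 to 1.12 in steps of 0.02 is an illustrative choice rather than a value from the handout.

```python
import math


def gaussian_loglik(x, mean, var):
    """Log density of a 1-D Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def warp_factor_grid(lo=0.88, hi=1.12, step=0.02):
    """Discrete set of candidate warp factors covering [lo, hi]."""
    n = int(round((hi - lo) / step))
    return [round(lo + i * step, 4) for i in range(n + 1)]


def estimate_warp_factor(formants, mean, var, grid):
    """Brute-force ML search over the grid (toy model).

    The "model" is a single Gaussian over one formant frequency in Hz,
    and warping by gamma maps a formant F to gamma * F. Returns the
    gamma with the highest total log-likelihood.
    """
    best_gamma, best_score = None, -math.inf
    for gamma in grid:
        score = sum(gaussian_loglik(gamma * f, mean, var) for f in formants)
        if score > best_score:
            best_gamma, best_score = gamma, score
    return best_gamma


grid = warp_factor_grid()
# Hypothetical test speaker whose F2 values sit about 10% above the model mean.
print(estimate_warp_factor([2200.0, 2210.0, 2190.0], mean=2000.0, var=100.0 ** 2, grid=grid))  # 0.9
```

The speaker's high F2 values (a shorter vocal tract) are pulled down with γ < 1 toward the model mean; a real system would instead score the whole warped utterance under the VTLN-SI models for each candidate γ.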

This note was uploaded on 05/08/2010 for the course CS 6.345 taught by Professor Glass during the Spring '10 term at MIT.
