Segment-Based SR

Speech Recognition: Segment-Based Speech Recognition
February 13, 2012
Veton Këpuska

Outline
- Introduction to Segment-Based Speech Recognition
- Searching graph-based observation spaces
- Anti-phone modeling
- Near-miss modeling
- Modeling landmarks
- Phonological modeling

Segment-Based Speech Recognition
- Acoustic modeling is performed over an entire segment
- Segments typically correspond to phonetic-like units
- Potential advantages:
  - Improved joint modeling of time/spectral structure
  - Segment- or landmark-based acoustic measurements
- Potential disadvantages:
  - Significant increase in model and search computation
  - Difficulty in robustly training model parameters

Hierarchical Acoustic-Phonetic Modeling
- Homogeneous measurements can compromise performance:
  - Nasal consonants are classified better with a longer analysis window
  - Stop consonants are classified better with a shorter analysis window
- Class-specific information extraction can reduce error.

Committee-based Phonetic Classification
- Change of temporal basis affects within-class error:
  - Smoothly varying cosine basis better for vowels and nasals
  - Piecewise-constant basis better for fricatives and stops
- Combining information sources can reduce error.

Phonetic Classification Experiments (A. Halberstadt, 1998)
- TIMIT acoustic-phonetic corpus
- Context-independent classification only
- 462-speaker training corpus, 24-speaker core test set
- Standard evaluation methodology, 39 common phonetic classes
- Several different acoustic representations incorporated:
  - Various time-frequency resolutions (Hamming window, 10-30 ms)
  - Different spectral representations (MFCCs, PLPCCs, etc.)
  - Cosine transform vs. piecewise-constant basis functions
- Evaluated MAP hierarchy and committee-based methods

Phonetic Classification Experiments (A. Halberstadt, 1998): Evaluation Results

Method                      % Error
Baseline                    21.6
MAP Hierarchy               21.0
Committee of Classifiers    18.5
Committee with Hierarchy    18.3

Statistical Approach to ASR
- Given acoustic observations, A, choose the word sequence, W*, that maximizes the a posteriori probability, P(W|A):

  $W^* = \arg\max_W P(W \mid A)$

  [Figure: Speech → Signal Processor → A → Linguistic Decoder → Words W; the decoder uses the Acoustic Model, P(A|W), and the Language Model, P(W).]

- Bayes' rule is typically used to decompose P(W|A) into acoustic and linguistic terms:

  $P(W \mid A) = \frac{P(A \mid W)\,P(W)}{P(A)}$

...
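
The MAP hierarchy from the Hierarchical Acoustic-Phonetic Modeling slide can be sketched as a two-level posterior. A broad-class classifier gives P(class|x), a class-specific classifier (which could use its own measurements, such as a longer window for nasals) gives P(phone|class, x), and each phone is scored by their product. This is a minimal sketch assuming each phone belongs to exactly one broad class and both sets of posteriors are already available; the function names, class labels, and numbers are hypothetical and are not taken from Halberstadt's experiments.

```python
from typing import Dict


def hierarchical_posteriors(
    class_post: Dict[str, float],
    within_class_post: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Score each phone as P(class | x) * P(phone | class, x).

    Assumes (hypothetically) a one-level hierarchy in which every phone
    belongs to exactly one broad class.
    """
    phone_post: Dict[str, float] = {}
    for cls, p_cls in class_post.items():
        # Output of the class-specific classifier for the phones in this class.
        for phone, p_phone in within_class_post[cls].items():
            phone_post[phone] = p_cls * p_phone
    return phone_post


def map_decision(posteriors: Dict[str, float]) -> str:
    """Return the label with the highest posterior (the MAP decision)."""
    return max(posteriors, key=posteriors.get)


# Hypothetical example: two broad classes with two phones each.
class_post = {"nasal": 0.7, "stop": 0.3}
within_class_post = {
    "nasal": {"m": 0.6, "n": 0.4},
    "stop": {"p": 0.5, "t": 0.5},
}
phone_post = hierarchical_posteriors(class_post, within_class_post)
print(phone_post)
print(map_decision(phone_post))
```

With these numbers /m/ scores 0.7 × 0.6 = 0.42, the largest value, so the MAP decision is "m". Because each class's within-class posteriors sum to one, the phone scores also sum to one.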
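
The Committee-based Phonetic Classification slide contrasts a smoothly varying cosine temporal basis with a piecewise-constant one. Both can be read as ways of turning a variable-length segment of frame vectors (for example, MFCCs) into a fixed-length measurement vector: the piecewise-constant version averages the frames in k roughly equal sub-segments, and the cosine version projects each coefficient's trajectory onto the first k DCT-II-style cosine functions. The choice of k = 3, the 1/T scaling, and the function names are illustrative assumptions, not the exact measurements used in the experiments.

```python
import math
from typing import List

Frame = List[float]  # one feature vector (e.g. MFCCs) per analysis frame


def piecewise_constant_features(frames: List[Frame], k: int = 3) -> List[float]:
    """Average the frames inside k roughly equal sub-segments."""
    t = len(frames)
    if t < k:
        raise ValueError("segment needs at least k frames")
    dim = len(frames[0])
    feats: List[float] = []
    for j in range(k):
        start, end = (j * t) // k, ((j + 1) * t) // k
        chunk = frames[start:end]  # non-empty because t >= k
        feats.extend(sum(f[d] for f in chunk) / len(chunk) for d in range(dim))
    return feats


def cosine_features(frames: List[Frame], k: int = 3) -> List[float]:
    """Project each dimension's trajectory onto k cosine (DCT-II style) bases.

    Scaled by 1/T so that the j = 0 coefficient equals the segment mean.
    """
    t = len(frames)
    dim = len(frames[0])
    feats: List[float] = []
    for j in range(k):
        basis = [math.cos(math.pi * j * (n + 0.5) / t) for n in range(t)]
        feats.extend(
            sum(basis[n] * frames[n][d] for n in range(t)) / t for d in range(dim)
        )
    return feats


# Hypothetical 6-frame segment: dimension 0 rises linearly, dimension 1 is flat.
frames = [[float(n + 1), 0.0] for n in range(6)]
print(piecewise_constant_features(frames))
print(cosine_features(frames))
```

For the rising trajectory, the piecewise-constant features follow the three sub-segment means, while the first cosine coefficient equals the segment mean and the second is negative, summarizing the overall slope in one smooth term.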

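The committee results in the Evaluation Results table come from combining several classifiers built on different acoustic measurements. The slides do not state the combination rule, so this sketch assumes one common choice: average the members' log posteriors (a normalized geometric mean of their posteriors) and renormalize with a log-sum-exp. The function name `combine_committee` and the example posteriors are hypothetical; other rules, such as voting or weighted linear combination, are also possible.

```python
import math
from typing import Dict, List


def combine_committee(member_posts: List[Dict[str, float]]) -> Dict[str, float]:
    """Combine classifier posteriors by averaging their logs, then renormalizing.

    Assumes every member scores the same label set with strictly positive
    posteriors (math.log(0) would raise a ValueError).
    """
    labels = list(member_posts[0])
    avg_log = {
        lab: sum(math.log(post[lab]) for post in member_posts) / len(member_posts)
        for lab in labels
    }
    # Log-sum-exp normalization: subtract the max before exponentiating.
    top = max(avg_log.values())
    norm = sum(math.exp(v - top) for v in avg_log.values())
    return {lab: math.exp(v - top) / norm for lab, v in avg_log.items()}


# Hypothetical members, e.g. one using a cosine temporal basis and one using
# a piecewise-constant basis, that disagree on an /s/ vs. /z/ token.
cosine_member = {"s": 0.4, "z": 0.6}
piecewise_member = {"s": 0.8, "z": 0.2}
combined = combine_committee([cosine_member, piecewise_member])
print(combined)
```

Here the combined posterior for /s/ is sqrt(0.32) / (sqrt(0.32) + sqrt(0.12)), about 0.62, so the more confident member carries the decision. A committee of one returns its member's posteriors unchanged.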

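The decision rule on the Statistical Approach to ASR slides can be illustrated over a small, explicit list of hypotheses. Because P(A) does not depend on W, the argmax can be taken over log P(A|W) + log P(W) alone; P(A) is only needed to turn the scores into posteriors. This is a minimal sketch assuming the acoustic and language model probabilities are already known for each hypothesis (a real decoder searches a very large space rather than enumerating it). The hypotheses, their probabilities, and the function names are hypothetical, and `posterior` approximates P(A) by summing over the listed hypotheses only.

```python
import math
from typing import Dict, Tuple

# Hypothesis W -> (P(A|W), P(W)); all values are hypothetical.
Hyps = Dict[str, Tuple[float, float]]


def map_decode(hyps: Hyps) -> str:
    """Return argmax_W [log P(A|W) + log P(W)]; P(A) is dropped as a constant."""
    return max(hyps, key=lambda w: math.log(hyps[w][0]) + math.log(hyps[w][1]))


def posterior(hyps: Hyps, w: str) -> float:
    """P(W|A) = P(A|W) P(W) / P(A), with P(A) summed over the listed hypotheses."""
    p_a = sum(p_a_given_w * p_w for p_a_given_w, p_w in hyps.values())
    p_a_given_w, p_w = hyps[w]
    return p_a_given_w * p_w / p_a


hyps: Hyps = {
    "recognize speech": (1e-5, 1e-2),
    "wreck a nice beach": (3e-5, 1e-3),
}
best = map_decode(hyps)
print(best, posterior(hyps, best))
```

The acoustic model alone prefers "wreck a nice beach" (3e-5 vs. 1e-5), but the language model reverses the decision: the joint scores are 1e-7 vs. 3e-8, so the posterior of "recognize speech" is 1e-7 / 1.3e-7, about 0.77.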
