10segment

10segment - Segment-Based Speech Recognition Introduction...

6.345/HST.728 Automatic Speech Recognition (2010)
Segment-based ASR

Slide 1: Segment-Based Speech Recognition
Introduction
Phonetic classification
Probabilistic formulation for graph-based observation spaces
  – Anti-phone modelling
  – Near-miss modelling
  – Modelling landmarks
Phonetic and word recognition
Search and training issues

Slide 2: Speech Science meets Speech Technology
[Block diagram; labels: Speech Signal, Representation, Search, Recognized Words, Acoustic Models, Lexical Models, Language Models, Linguistic Constraints]

Slide 3: Defining the Context
Speech Knowledge
  Explicit
  – Heuristic, rule-based models
  – Heterogeneous and complex knowledge representation
  – Intense knowledge engineering
  Implicit
  – Formal mathematical models
  – Homogeneous and simple knowledge representations
  – Automatic learning from data
Automatic speech recognizers differ in the degree to which speech knowledge is incorporated into the system.
Is there a middle ground between these approaches?

Slide 4: An Abundance of Acoustic Cues
[Figure]

Slide 5: Spectrogram Reading Experiments
[Spectrogram figure; segment labels: N V C V V? N V]
Example words: “lionhearted”, “marauding”, “midmorning”, “immortal”, “memorials”

Slide 6: Modelling Segments
[Figure]

Slide 7: Modelling Landmarks
[Figure]

Slide 8: Hierarchical Classifiers
[Tree diagram: root { All phones }; classes C_0 = { vowels }, C_1 = { nasals }, C_2 = { }; phones shown: ɑ æ m n ŋ]
$P(\alpha \mid f_1 \ldots f_n) = \sum_i P(\alpha \mid C_i, f_2)\, P(C_i \mid f_1)$
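The hierarchical classifier formula above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the lecture's implementation: the function name `hierarchical_posterior`, the two-class grouping of the five phones, and every probability value are hypothetical and chosen only for readability. A first-stage classifier (using features f_1) scores the classes, a second-stage classifier (using features f_2) scores phones within each class, and the two are multiplied and summed over classes.

```python
# Hypothetical two-level hierarchical classifier over five phones.
# All probabilities are made-up illustration values, not lecture results.

def hierarchical_posterior(p_class_given_f1, p_phone_given_class_f2):
    """Return P(phone | f) = sum_i P(phone | C_i, f2) * P(C_i | f1)."""
    posterior = {}
    for cls, p_cls in p_class_given_f1.items():
        for phone, p_phone in p_phone_given_class_f2[cls].items():
            posterior[phone] = posterior.get(phone, 0.0) + p_phone * p_cls
    return posterior

# Stage 1: class posteriors from feature set f1 (assumed values).
p_class = {"vowels": 0.25, "nasals": 0.75}

# Stage 2: within-class phone posteriors from feature set f2 (assumed values).
p_phone = {
    "vowels": {"ɑ": 0.5, "æ": 0.5},
    "nasals": {"m": 0.5, "n": 0.25, "ŋ": 0.25},
}

post = hierarchical_posterior(p_class, p_phone)
best = max(post, key=post.get)
print(best, post[best])
```

Because each phone here belongs to exactly one class, the sum over classes has a single non-zero term per phone; the summation only matters if classes are allowed to overlap.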

Slide 9: Committee-based Classifiers
Combining information sources can reduce error (e.g., ROVER)
Classifiers can also be multi-trained versions of same model
Changing features & classifiers produces different results
Aggregation combines N classifiers to form one classifier:
$\Phi(x) = \frac{1}{N} \sum_i \Phi_i(x)$

Slide 10: Phonetic Classification Experiments
TIMIT acoustic-phonetic corpus
  – 39 phonetic units; 630 native speakers; 10 utterances per speaker
Context-independent classification error of 18.3%
Context-dependent classification error of 15.0%
Best known results reported for these tasks
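The aggregation rule for committee-based classifiers, Φ(x) = (1/N) Σ_i Φ_i(x), amounts to averaging the member classifiers' outputs class by class. The sketch below assumes each classifier returns a dictionary of class posteriors over the same label set; the `aggregate` helper and the four stub classifiers (which ignore their input and return fixed, made-up scores) are hypothetical stand-ins, not part of the lecture.

```python
# Averaging N classifier outputs: Phi(x) = (1/N) * sum_i Phi_i(x).
# The stub classifiers and their scores are illustrative assumptions.

def aggregate(classifiers, x):
    """Average the per-class outputs of all classifiers for input x."""
    n = len(classifiers)
    outputs = [clf(x) for clf in classifiers]
    labels = outputs[0].keys()  # assumes every classifier uses the same labels
    return {label: sum(out[label] for out in outputs) / n for label in labels}

# Four hypothetical committee members scoring the phones "m" and "n".
def clf_a(x):
    return {"m": 0.5, "n": 0.5}

def clf_b(x):
    return {"m": 0.75, "n": 0.25}

def clf_c(x):
    return {"m": 0.25, "n": 0.75}

def clf_d(x):
    return {"m": 0.75, "n": 0.25}

combined = aggregate([clf_a, clf_b, clf_c, clf_d], x=None)
decision = max(combined, key=combined.get)
print(decision, combined)
```

Here two members favour "m", one favours "n", and one is undecided; the averaged scores still sum to one and the committee picks "m".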

Slide 11: A Closer Look
[Figure]

Slide 12: Segment-based Speech Recognition
Acoustic modelling is performed over an entire segment
Segments typically correspond to phonetic-like units
Potential advantages:
  – Improved joint modelling of time/spectral structure
  – Segment- or landmark-based acoustic measurements
Potential disadvantages:
  – Significant increase in model and search computation
  – Difficulty in robustly training model parameters
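To make "acoustic modelling over an entire segment" concrete, the sketch below maps a variable-length run of frames to one fixed-length measurement vector. The specific recipe (average the frame features over each third of the segment, then append the duration in frames) is an assumption chosen for illustration; the slides do not specify the measurements used, and the name `segment_measurement` is hypothetical.

```python
# One fixed-length vector per segment, regardless of segment duration.
# The thirds-plus-duration recipe is an illustrative assumption.

def segment_measurement(frames, start, end):
    """Summarise frames[start:end] as per-third feature means plus duration."""
    segment = frames[start:end]
    n = len(segment)
    if n == 0:
        raise ValueError("segment must contain at least one frame")
    dim = len(segment[0])
    vector = []
    for k in range(3):
        lo = (k * n) // 3
        hi = max(((k + 1) * n) // 3, lo + 1)  # keep every third non-empty
        part = segment[lo:hi]
        for d in range(dim):
            vector.append(sum(frame[d] for frame in part) / len(part))
    vector.append(float(n))  # duration in frames
    return vector

# Six toy frames with two features each (made-up numbers).
frames = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0],
          [6.0, 7.0], [8.0, 9.0], [10.0, 11.0]]

long_seg = segment_measurement(frames, 0, 6)   # six-frame segment
short_seg = segment_measurement(frames, 1, 4)  # three-frame segment
print(len(long_seg), len(short_seg))
```

Both segments yield vectors of the same length even though their durations differ, which is what lets a single classifier score competing segmentations of the same signal.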

This note was uploaded on 05/08/2010 for the course CS 6.345 taught by Professor Glass during the Spring '10 term at MIT.
