Lecture 3_winter_2012_6tp

Digital Speech Processing: Lecture 3
Acoustic Theory of Speech Production

Topics to be Covered
• Sound production mechanisms of the human vocal tract
• Sounds of language => phonemes
• Conversion of text to sounds via letter-to-sound rules and dictionary lookup
• Location/properties of sounds in the acoustic waveform
• Location/properties of sounds in spectrograms
• Articulatory properties of speech sounds: place and manner of articulation

Topics to be Covered
• sounds of speech
  • acoustic phonetics
  • place and manner of articulation
• sound propagation in the human vocal tract
• transmission line analogies
• time-varying linear system approaches
• source models

Basic Speech Processes
• idea → sentences → words → sounds → waveform → waveform → sounds → words → sentences → idea
  • Idea: it's getting late, I should go to lunch, I should call Al and see if he wants to join me for lunch today
  • Words: Hi Al, did you eat yet?
  • Sounds: /h/ /ay/ - /ae/ /l/ - /d/ /ih/ /d/ - /y/ /u/ - /iy/ /t/ - /y/ /ε/ /t/
  • Coarticulated Sounds: /h-ay-l/ - /d-ih-j-uh/ - /iy-t-j-ε-t/ (hial-dija-eajet)
• remarkably, humans can decode these sounds and determine the meaning that was intended, at least at the idea/concept level (perhaps not completely at the word or sound level); often machines can also do the same task
  • speech coding: waveform → (model) → waveform
  • speech synthesis: words → waveform
  • speech recognition: waveform → words/sentences
  • speech understanding: waveform → idea

Basics
• speech is composed of a sequence of sounds
• sounds (and transitions between them) serve as a symbolic representation of information to be shared between humans (or humans and machines)
• arrangement of sounds is governed by rules of language (constraints on sound sequences, word sequences, etc.): /spl/ exists, /sbk/ doesn't exist
• linguistics is the study of the rules of language
• phonetics is the study of the sounds of speech
• can exploit knowledge about the structure of sounds and language, and how it is encoded in the signal, to do speech analysis, speech coding, speech synthesis, speech recognition, speaker recognition, etc.

Human Vocal Apparatus
[Figure: Mid-sagittal plane X-ray of human vocal apparatus]
• vocal tract: dotted lines in figure; begins at the glottis (the vocal cords) and ends at the lips
  • consists of the pharynx (the connection from the esophagus to the mouth) and the mouth itself (the oral cavity)
  • average male vocal tract length is 17.5 cm
  • cross-sectional area, determined by positions of the tongue, lips, jaw and velum, varies from zero (complete closure) to 20 sq cm
• nasal tract: begins at the velum and ends at the nostrils
• velum: a trapdoor-like mechanism at the back of the mouth cavity; lowers to couple the nasal tract to the vocal tract to produce the nasal sounds like /m/ (mom), /n/ (night), /ng/ (sing)

Vocal Tract MRI Sequences
[Figure: Mid-sagittal plane X-ray of human vocal apparatus]
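
As a rough illustration of why the 17.5 cm vocal tract length matters acoustically, the sketch below treats the tract as a uniform lossless tube, closed at the glottis and open at the lips. That idealization, the quarter-wavelength resonance formula F_k = (2k - 1) c / (4 L), and the speed of sound of about 35,000 cm/s are assumptions brought in for this example; they are not stated on the slides above. The function name `tube_resonances` is hypothetical.

```python
# Resonances of an idealized uniform tube (closed at one end, open at the other).
# ASSUMPTIONS (not from the slides): uniform cross-section, no losses,
# speed of sound c = 35,000 cm/s, quarter-wavelength resonator model.

SPEED_OF_SOUND_CM_PER_S = 35_000.0   # assumed value for warm, moist air
VOCAL_TRACT_LENGTH_CM = 17.5         # average male length quoted on the slide


def tube_resonances(length_cm, n_resonances=3, c=SPEED_OF_SOUND_CM_PER_S):
    """Return the first n resonance frequencies (Hz) of a closed-open tube.

    A closed-open tube resonates at odd multiples of c / (4 * L).
    """
    if length_cm <= 0:
        raise ValueError("tube length must be positive")
    return [(2 * k - 1) * c / (4.0 * length_cm)
            for k in range(1, n_resonances + 1)]


if __name__ == "__main__":
    for k, freq in enumerate(tube_resonances(VOCAL_TRACT_LENGTH_CM), start=1):
        print(f"resonance {k}: {freq:.0f} Hz")
```

Under these assumptions the first three resonances come out at 500, 1500 and 2500 Hz. A real vocal tract has a non-uniform, time-varying cross-sectional area (from zero to about 20 sq cm, as noted above), so actual resonances shift with the positions of the tongue, lips, jaw and velum.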