ATSP-notes

ATSP-notes - Acoustic Theory of Speech Production Victor...

Acoustic Theory of Speech Production
Victor Zue

Supplementary Notes for 6.345 Automatic Speech Recognition
Department of Electrical Engineering & Computer Science
Massachusetts Institute of Technology
Spring, 2010

Preliminary Draft
(Do not duplicate without explicit, written permission.)
1 Introduction

Speech is generated by closely coordinated movements of several groups of human anatomical structures, shown in Figure 1. One such group consists of the structures that enclose the air passage below the larynx. Through control of the muscles and through forces generated by the elastic recoil of the lungs, pressure can be built up below the larynx. This pressure eventually provides the energy for the speech signal.

Figure 1: Anatomical structures for speech production. (Labeled structures: nasal cavity, hard palate, soft palate (velum), tongue, jaw, hyoid bone, epiglottis, thyroid cartilage, cricoid cartilage, vocal folds, trachea, esophagus, lung, sternum.)

Immediately above the trachea is the larynx, which constitutes the second group of structures essential to the production of speech. The vocal folds in the larynx can be positioned in many ways, so that air can flow through the glottis either with or without setting the vocal folds into vibration. When the vocal folds are set into vibration, the airflow through the glottis is interrupted quasi-periodically, creating the effect of modulation.
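The quasi-periodic interruption of the glottal airflow can be illustrated with a toy source model. The sketch below is an illustration, not part of these notes: it builds a glottal flow waveform from Rosenberg-style pulses, in which the flow rises while the folds open, falls while they close, and is zero while they are closed. The function names `rosenberg_pulse` and `glottal_flow`, the 40%/16% split of each period into opening and closing phases, and the 1% random jitter in period length are all assumptions chosen for illustration.

```python
import numpy as np


def rosenberg_pulse(n_open: int, n_close: int, period: int) -> np.ndarray:
    """One period of a Rosenberg-style glottal flow pulse (illustrative model).

    n_open:  samples in the opening phase (flow rises from 0 to 1)
    n_close: samples in the closing phase (flow falls from 1 toward 0)
    period:  samples per period; whatever remains is the closed phase (zero flow)
    """
    if n_open < 1 or n_close < 1 or n_open + n_close > period:
        raise ValueError("phases must be positive and fit within one period")
    pulse = np.zeros(period)
    n = np.arange(n_open)
    pulse[:n_open] = 0.5 * (1.0 - np.cos(np.pi * n / n_open))
    m = np.arange(n_close)
    pulse[n_open:n_open + n_close] = np.cos(np.pi * m / (2 * n_close))
    return pulse


def glottal_flow(f0_hz: float, fs_hz: int, duration_s: float,
                 jitter: float = 0.01, seed: int = 0) -> np.ndarray:
    """Concatenate pulses whose periods vary slightly (quasi-periodic vibration).

    Assumed (hypothetical) pulse shape: 40% of each period opening, 16% closing.
    """
    total = int(round(duration_s * fs_hz))
    if total <= 0:
        return np.zeros(0)
    rng = np.random.default_rng(seed)
    pulses, produced = [], 0
    while produced < total:
        # Perturb the nominal period fs/f0 by a small random fraction.
        period = int(round(fs_hz / f0_hz * (1.0 + jitter * rng.standard_normal())))
        period = max(period, 10)
        n_open = max(1, int(0.40 * period))
        n_close = max(1, int(0.16 * period))
        pulses.append(rosenberg_pulse(n_open, n_close, period))
        produced += period
    # Trim the last pulse so the output has exactly the requested length.
    return np.concatenate(pulses)[:total]
```

For example, `glottal_flow(100.0, 16000, 0.2)` returns 3200 samples of a roughly 100 Hz flow waveform. Setting `jitter=0.0` makes it strictly periodic, which real vocal-fold vibration is not.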
The third set of structures consists of the tongue, jaw, lips, velum, and other components that form the vocal and nasal cavities. By changing the configuration of the vocal tract, one can shape the detailed characteristics of the speech sounds being produced.

The speech signal as recorded by a microphone reflects the variation of sound pressure as a function of time. As Figure 2 illustrates, the waveform varies significantly over time. (This is the sentence “Two plus seven is less than ten,” spoken by a male speaker.) Some portions of the signal appear to be (quasi-)periodic, other parts look like random noise, and still other parts are transient in nature.

Figure 2: Example of a speech waveform.
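One simple way to quantify the difference between the (quasi-)periodic and the noise-like portions of the waveform is the peak of the normalized autocorrelation over plausible pitch lags: a periodic frame closely matches a shifted copy of itself, while noise does not. The sketch below assumes NumPy; the function name `periodicity` and the 60 to 400 Hz pitch search range are hypothetical choices, not taken from these notes.

```python
import numpy as np


def periodicity(frame: np.ndarray, fs_hz: int,
                f0_min: float = 60.0, f0_max: float = 400.0) -> float:
    """Peak normalized autocorrelation over lags for pitches in [f0_min, f0_max].

    Near 1 for (quasi-)periodic frames; near 0 for noise-like frames.
    """
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()  # remove any DC offset before correlating
    lo = max(1, int(fs_hz / f0_max))           # shortest period considered
    hi = min(int(fs_hz / f0_min), len(x) - 1)  # longest period that fits in the frame
    best = 0.0
    for lag in range(lo, hi + 1):
        a, b = x[:-lag], x[lag:]
        # Normalize by both segments' energies so each score lies in [-1, 1].
        denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
        if denom > 0.0:
            best = max(best, float(np.dot(a, b) / denom))
    return best
```

A score near 1 suggests a periodic (voiced) frame, and a score near 0 suggests a noise-like frame. This measure alone does not single out the transient portions of the signal, which would need a different cue, such as an abrupt change in short-time energy.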
From the speech waveform, it is often difficult to describe the essential characteristics of various speech sounds. One display that speech researchers have found useful is the “speech spectrogram.” A speech spectrogram is a three-dimensional display of the speech signal, with the x-axis representing time, the y-axis representing frequency, and the z-axis (i.e., the darkness of the marking) denoting intensity. The spectrogram for the above waveform is shown in Figure 3. This is a digitally generated spectrogram.
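A digital spectrogram of this kind is typically computed from the short-time Fourier transform: the signal is cut into overlapping windowed frames, and the magnitude spectrum of each frame, in decibels, becomes one column of the display. The sketch below assumes NumPy; the function name `spectrogram`, the 25 ms Hamming window, the 10 ms hop, and the 512-point FFT are illustrative defaults, since the notes do not state the settings used to produce Figure 3.

```python
import numpy as np


def spectrogram(x: np.ndarray, fs_hz: int, win_ms: float = 25.0,
                hop_ms: float = 10.0, n_fft: int = 512):
    """Short-time Fourier transform magnitude in dB.

    Returns (times_s, freqs_hz, s_db), where s_db[k, t] is the intensity at
    frequency bin k (y-axis) and frame t (x-axis).
    """
    x = np.asarray(x, dtype=float)
    win_len = int(round(win_ms * 1e-3 * fs_hz))
    hop = int(round(hop_ms * 1e-3 * fs_hz))
    if win_len > n_fft:
        raise ValueError("n_fft must be at least the window length")
    if len(x) < win_len:
        raise ValueError("signal is shorter than one analysis window")
    window = np.hamming(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    # Slice overlapping frames and taper each one with the window.
    frames = np.stack([x[i * hop:i * hop + win_len] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))  # zero-pads each frame to n_fft
    s_db = 20.0 * np.log10(mag + 1e-10)                 # small floor avoids log(0)
    times = (np.arange(n_frames) * hop + win_len / 2) / fs_hz  # frame centers
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs_hz)
    return times, freqs, s_db.T
```

In the returned array, columns correspond to the x-axis (time), rows to the y-axis (frequency), and values to the z-axis (intensity, which a plot would render as darkness). Shortening the window improves time resolution at the cost of frequency resolution, which is the trade-off between wideband and narrowband spectrograms.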

This note was uploaded on 05/08/2010 for the course CS 6.345 taught by Professor Glass during the Spring '10 term at MIT.
