Speech Processing
Introduction / Syllabus: ECE 5525 Speech Processing

Contact Info:
- Veton Këpuska, Olin Engineering Building, Office #353
- Tel. (321) 674-7183
- Email: [email protected]
- http://my.fit.edu/~vkepuska/ece5525

Course listing: CRN 81888, Subj ECE, Crse 5525, Sec E1, Credits 3.00
Title: Speech Processing
Instructor: Veton Këpuska

Textbook:
- "Discrete-Time Speech Signal Processing: Principles and Practice", Thomas F. Quatieri, Prentice Hall, 2002

Reference Material:
- "Digital Processing of Speech Signals", L.R. Rabiner and R.W. Schafer, Prentice Hall, 1978
- "Digital Signal Processing", Alan V. Oppenheim and Ronald W. Schafer, Prentice Hall, 1975
- "Digital Signal Processing: A Practical Approach", Emmanuel C. Ifeachor and Barrie W. Jervis, Second Edition, Prentice Hall, 2002

February 13, 2012, Veton Këpuska

Syllabus

Course Goals: Teach modern methods used to process speech signals.
Subject Area: Digital Speech Processing
Prerequisites by Topic: Digital Signal Processing
Topics Covered: Discrete-Time Speech Signal Processing; Production and Classification of Speech Sounds; Acoustic Theory of Speech Production; Speech Perception; Speech Analysis; Speech Synthesis; Homomorphic Speech Processing; Short-Time Fourier Transform Analysis and Synthesis; Filter-Bank Analysis/Synthesis; Sinusoidal Analysis/Synthesis; Frequency-Domain Pitch Estimation; Nonlinear Measurement and Modeling Techniques; Coding of Speech Signals; Speech Enhancement; Speaker Recognition; Methods for Speech Recognition; Digital Speech Processing for Man-Machine Communication by Voice

Recommended Grading:
- Homework (MATLAB Exercises and Homework Problems): 20%
- Exams: 30%
- Project(s): 50%

Course Information
- http://my.fit.edu/~vkepuska/web
- or directly: http://my.fit.edu/~vkepuska/ece5525/
- Angel System

Discrete-Time Speech Signal Processing
Introduction

Speech has evolved as a primary form of communication between humans, and technological advancement has enhanced our ability to communicate. One early case is the transduction, by a telephone handset, of the continuously-varying speech pressure signal at the lips into a continuously-varying (analog) electric voltage signal. Digital technology has brought communication to a new level; this technology requires additional transduction of the signals: Analog-to-Digital (A/D) and Digital-to-Analog (D/A) conversion.

The topic of this course can be loosely defined as: the manipulation of sampled speech signals by a digital processor to obtain a new signal with some desired properties.

Example: changing a speaker's rate of articulation with the use of a digital computer. Modification of the articulation rate (referred to as time-scale modification of speech) has as its objectives to:
- generate a new speech waveform that corresponds to a person talking faster or slower than the original rate,
- maintain the character of the speaker's voice (i.e., there should be little change in the pitch and spectrum of the original utterance).

Time-Scale Modification Example

Useful applications:
- Fast scanning of a long recording in a message playback system.
- Slowing down difficult-to-understand speech.

The Speech-Communication Pathway

The linguistic level of communication: An idea is first formed in the mind of the speaker. This idea is then transformed into words, phrases, and sentences according to the grammatical rules of the language.

The physiological level of communication: The brain creates electric signals that move along the motor nerves; these electric signals activate muscles in the vocal tract and vocal cords.
The acoustic level of the speech-communication pathway: This vocal tract and vocal cord movement results in pressure changes within the vocal tract and, in particular, at the lips, initiating a sound wave that propagates in space. Pressure changes at the ear canal then cause vibrations of the eardrum of the listener.
The physiological level for the listener: Eardrum vibrations induce electric signals that move along the sensory nerves to the brain.

The linguistic level for the listener: The brain performs speech recognition and understanding.

The linguistic and physiological activity of the speaker acts as the "transmitter" of the speech-communication system, and that of the listener as the "receiver".

The transmitter and receiver have other functions besides basic communication. The transmitter also performs:
- Monitoring and correction of one's own speech via feedback through the speaker's ear (the importance of this feedback has been studied in the speech of the deaf),
- Control of the articulation rate,
- Adaptation of speech production to mimic voices, etc.
The receiver performs voice recognition and is:
- Robust to noise and other interference,
- Able to focus on a single low-volume speaker in a room full of louder interfering speakers (the cocktail party effect).

There have been significant advances in reproducing parts of this communication system by synthetic means, but they remain far from emulating the human communication system.

Analysis/Synthesis Based on Speech Production and Perception

This class does not cover the entire speech-communication pathway; it starts from signal measurements of the acoustic waveform. From these measurements, and from the current understanding of how the vocal tract and vocal cords produce sound waves, production models are built, starting from analog representations which are then transformed into discrete-time representations (A/D). From the receiver side, the signal processing of the ear and higher auditory levels is covered to a significantly lesser extent, only to account for the effect of speech processing on perception.

Preview of a speech model:
The figure on the next slide shows a model of vowel production. In vowel production:
- Air is forced from the lungs (by contraction of the muscles around the lung cavity).
- The air then flows past the vocal cords/folds (two masses of flesh), causing periodic vibration of the cords.
- The rate of vibration of the cords determines the pitch of the sound.
- The periodic puffs caused by the vibration of the cords act as an excitation input, or source, to the vocal tract.
- The vocal tract, the cavity between the vocal cords and the lips, acts as a resonator that spectrally shapes the periodic input (much like the cavity of a musical wind instrument).

From this basic understanding of the speech production mechanism, a simple engineering model can be built, referred to as the source/filter model. In this model the following is assumed:
- The vocal tract is a linear time-invariant system (or filter),
- This linear time-invariant system is driven by a periodic, impulse-like input.
These assumptions imply that the output at the lips is itself periodic.

An example of a vowel is "a" as in the word "father". The vowel "a" is one of many basic sounds of a language, called phonemes, and for each phoneme a different production model is built. A typical speech utterance consists of a string of vowel and consonant phonemes whose temporal and spectral characteristics change with time, corresponding to changes in the excitation source and the vocal tract system. This implies a time-varying source and system; furthermore, in reality there is a complex nonlinear interaction of the two. Therefore, even though a simple linear time-invariant model seems plausible, it does not always represent the real system well.
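To make the source/filter idea concrete, here is a minimal sketch in plain Python. All numeric values (an 8 kHz sampling rate, a 100 Hz pitch, and a single 700 Hz resonance standing in for the first formant of "a") are invented for illustration; the course's production models are far richer. A periodic impulse train plays the role of the glottal puffs, and a single two-pole resonator plays the role of the vocal tract.

```python
import math

fs = 8000       # sampling rate in Hz (illustrative choice)
f0 = 100        # pitch: rate of vocal-cord vibration, in Hz
formant = 700   # one vocal-tract resonance, roughly the first formant of "a"
bw = 100        # bandwidth of that resonance, in Hz

# Source: a periodic impulse train, one "puff" per pitch period.
period = fs // f0
source = [1.0 if n % period == 0 else 0.0 for n in range(1600)]

# Filter: a single two-pole resonator as a stand-in for the vocal tract.
r = math.exp(-math.pi * bw / fs)
a1 = 2.0 * r * math.cos(2.0 * math.pi * formant / fs)
a2 = -r * r
y = []
for n in range(len(source)):
    y1 = y[-1] if len(y) >= 1 else 0.0
    y2 = y[-2] if len(y) >= 2 else 0.0
    y.append(source[n] + a1 * y1 + a2 * y2)

# After the initial transient the output repeats every pitch period,
# exactly what the linear time-invariant source/filter model predicts.
```

A real vowel needs several formants and a less idealized glottal source; this is only a single-resonance caricature of the model described above.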
Using discrete-time modeling of speech production, the course will cover the design of speech analysis/synthesis systems as depicted in Figure 1.3. The analysis part extracts the underlying parameters of the time-varying model from the speech waveform; the synthesis part takes the extracted parameters and models and puts the speech waveform back together. An objective in this development is to achieve an identity system, for which the output equals the input when no manipulation is performed. A number of other analysis/synthesis methods can be derived based on various useful mathematical representations in time or frequency. These analysis/synthesis methods are the backbone for applications that transform the speech waveform into some desirable form.

Applications (Project Areas)

Speech Modification:
- Time-scale manipulations:
  - Speeding up speech: fitting the speech waveform into an allocated time slot and synchronizing audio and video presentation (e.g., in radio and TV commercials); fast scanning in message playback and voice mail systems.
  - Slowing down speech: reading machines and books for the blind; learning a foreign language.
- Voice transformations using pitch and spectral changes of the speech signal: voice disguise, entertainment, speech synthesis.
- Spectral change by frequency compression and expansion: may be useful in transforming speech as an aid to the partially deaf.
Many of these methods can also be applied to music and special effects.
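The time-scale manipulations listed above can be sketched with a naive overlap-add (OLA) scheme in plain Python. The frame and hop sizes here are arbitrary illustrative choices, and practical systems use refinements (such as synchronized overlap-add) to avoid the waveform discontinuities this naive version can introduce.

```python
import math

def ola_time_scale(x, rate, frame=256, hop_out=128):
    # Naive overlap-add time-scale modification (illustrative sketch).
    # rate > 1 makes the talker sound faster, rate < 1 slower, while each
    # local waveform segment (hence pitch and spectrum) is left unchanged.
    hop_in = int(round(hop_out * rate))        # analysis hop along the input
    win = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame) for n in range(frame)]
    n_frames = (len(x) - frame) // hop_in + 1
    out_len = frame + hop_out * (n_frames - 1)
    y = [0.0] * out_len
    w = [1e-12] * out_len                      # window-sum, for normalization
    for i in range(n_frames):
        a, s = i * hop_in, i * hop_out         # analysis / synthesis positions
        for n in range(frame):
            y[s + n] += win[n] * x[a + n]
            w[s + n] += win[n]
    return [v / u for v, u in zip(y, w)]

# A 4000-sample tone "articulated" twice as fast comes out about half as long:
x = [math.sin(2 * math.pi * 1000 * n / 8000) for n in range(4000)]
y = ola_time_scale(x, rate=2.0)
```

With rate=1.0 the analysis/synthesis chain reduces to an identity system of the kind described above: the output equals the input (up to the last partial frame) when no manipulation is performed.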
Speech Coding: The goal is to reduce the information rate, measured in bits per second, while maintaining the quality of the original waveform.
- Waveform coders: represent the speech waveform directly and do not rely on a speech production model; they operate in a high range of 16-64 kbps.
- Vocoders: are largely speech-model-based and rely on a small set of model parameters; they operate in the low range of 1.2-4.8 kbps, with lower quality than waveform coders.
- Hybrid coders: are partly waveform-based and partly speech-model-based; they operate in the 4.8-16 kbps range.

Applications of speech coders include:
- Digital telephony over constrained-bandwidth channels (cellular, satellite),
- Voice over IP (Internet),
- Video phones,
- Storage of voice messages for computer voice mail applications.

Speech Enhancement: The goal is to improve the quality of degraded speech.
- Preprocessing speech before it is degraded: increasing the broadcast range of transmitters constrained by peak-power transmission limits (e.g., AM radio and TV transmissions).
- Enhancing the speech waveform after it is degraded:
  - Reduction of additive noise in (digital) telephony and in vehicle and aircraft communications,
  - Reduction of interfering backgrounds and speakers for the hearing impaired,
  - Removal of unwanted convolutional channel distortion and reverberation,
  - Restoration of old phonograph recordings degraded by acoustic horns and by impulse-like scratches from age and wear.

Speaker Recognition: Speech signal processing exploits the variability of speech model parameters across speakers.
- Verifying a person's identity (biometrics),
- Voice identification in forensic investigation.
Understanding the speech model features that cue a person's identity is also important in speech modification, where model parameters can be transformed for the study of specific voice characteristics; speech modification and speaker recognition can thus be developed synergistically. (Speech/voice recognition itself is covered in ECE 5526.)
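To make the waveform-coder figures above concrete: classic digital telephony samples speech at 8 kHz with 8 bits per sample, the 64 kbps upper end quoted for waveform coders, and spends those 8 bits on a logarithmically companded (mu-law, mu = 255) value rather than on the raw sample. The sketch below, in plain Python, compares the round-trip quantization error with and without companding on a low-level sample; the uniform quantizer here is a simplified stand-in for the actual G.711 segment encoding.

```python
import math

MU = 255.0   # mu-law constant used in North American 8-bit telephony

def mu_compress(x):
    # Map [-1, 1] through the logarithmic mu-law characteristic.
    return math.copysign(math.log(1.0 + MU * abs(x)) / math.log(1.0 + MU), x)

def mu_expand(y):
    # Inverse of mu_compress.
    return math.copysign(((1.0 + MU) ** abs(y) - 1.0) / MU, y)

def quantize(v, bits):
    # Uniform mid-rise quantizer on [-1, 1].
    levels = 2 ** bits
    step = 2.0 / levels
    i = min(levels - 1, int((v + 1.0) / step))
    return -1.0 + (i + 0.5) * step

fs, bits = 8000, 8
bit_rate = fs * bits    # 64000 bits/s: the classic waveform-coding rate

# Quantization error on a low-level sample, with and without companding:
x = 0.01
uniform_err = abs(x - quantize(x, bits))
mulaw_err = abs(x - mu_expand(quantize(mu_compress(x), bits)))
# Companding spends its levels where speech spends its time (near zero),
# so the mu-law round-trip error is much smaller for quiet samples.
```

Vocoders get below this rate only by giving up on coding the waveform itself and transmitting a small set of source/filter model parameters instead.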
End
This note was uploaded on 02/11/2012 for the course ECE 5525 taught by Professor Staff during the Fall '10 term at FIT.