Lectures 7-8_winter_2012_6tp

Digital Speech Processing: Lectures 7-8
Time Domain Methods in Speech Processing

General Synthesis Model
[Block diagram; recoverable labels:]
• voiced sound amplitude
• unvoiced sound amplitude
• T1, T2
• Log Areas, Reflection Coefficients, Formants, Vocal Tract Polynomial, Articulatory Parameters, …
• Pitch Detection, Voiced/Unvoiced/Silence Detection, Gain Estimation, Vocal Tract Parameter Estimation, Glottal Pulse Shape, Radiation Model
• $R(z) = 1 - \alpha z^{-1}$

General Analysis Model
[Block diagram; recoverable labels: Speech Analysis Model, s[n]]
• Pitch Period, T[n]
• Glottal Pulse Shape, g[n]
• Voiced Amplitude, A_V[n]
• V/U/S[n] Switch
• Unvoiced Amplitude, A_U[n]
• Vocal Tract IR, v[n]
• Radiation Characteristic, r[n]
All analysis parameters are time-varying at rates commensurate with the information in the parameters. We need algorithms for estimating the analysis parameters and their variations over time.

Overview
[Block diagram; recoverable labels: speech, x[n]; Signal Processing; representation of speech; speech or music; A(x,t); formants; reflection coefficients; voiced-unvoiced-silence; pitch; sounds of language; speaker identification; emotions]
• time domain processing => direct operations on the speech waveform
• frequency domain processing => direct operations on a spectral representation of the signal
[Block diagram; recoverable labels: x[n]; system; zero crossing rate; level crossing rate; energy; autocorrelation]
• simple processing
• enables various types of feature estimation

Basics
• 8 kHz sampled speech (bandwidth < 4 kHz)
• properties of speech change with time
– excitation goes from voiced to unvoiced
– peak amplitude varies with the sound being produced
– pitch varies within and across voiced sounds
– periods of silence where background signals are seen
• the key issue is whether we can create simple time-domain processing methods that enable us to measure/estimate speech representations reliably and accurately
[Waveform figure; recoverable labels: P1, P2, V, U/S]

Fundamental Assumptions
• properties of the speech signal change relatively slowly with time (5-10 sounds per second)
– over very short (5-20 msec) intervals => uncertainty due to small amount of data, varying pitch, varying amplitude
– over medium length (20-100 msec) intervals => uncertainty due to changes in sound quality, transitions between sounds, rapid transients in speech
– over long (100-500 msec) intervals => uncertainty due to large amount of sound changes
• there is always uncertainty in short time measurements and estimates from speech signals
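
The time-domain measurements named in the Overview (energy, zero crossing rate, autocorrelation) can each be computed directly on a block of waveform samples. The following is a minimal sketch, not the lecture's own definitions: the function names, the unnormalized form of each measure, and the 100 Hz test tone are illustrative assumptions; only the 8 kHz sampling rate comes from the slides.

```python
import math

def energy(x):
    """Sum of squared samples in the block."""
    return sum(v * v for v in x)

def zero_crossing_rate(x):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(x) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(x) - 1)

def autocorrelation(x, lag):
    """Unnormalized autocorrelation of the block at a non-negative lag."""
    return sum(x[n] * x[n + lag] for n in range(len(x) - lag))

FS = 8000  # sampling rate in Hz, as on the Basics slide

# Hypothetical test signal: a 100 Hz tone (period = 80 samples), 400 samples.
# The half-sample offset keeps samples from landing exactly on a zero.
tone = [math.sin(2 * math.pi * 100 * (n + 0.5) / FS) for n in range(400)]

print(energy(tone))
print(zero_crossing_rate(tone))
print(autocorrelation(tone, 80))  # lag of one period
```

For a periodic block like this one, the autocorrelation is large and positive at a lag of one period, which is the property a pitch estimator can exploit; the zero crossing rate stays low for such a low-frequency tone and would be much higher for noise-like input.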

Compromise Solution
• “short-time” processing methods => short segments of the speech signal are “isolated” and “processed” as if they were short segments from a “sustained” sound with fixed (non-time-varying) properties
– this short-time processing is periodically repeated for the duration of the waveform
– these short analysis segments, or “analysis frames”, almost always overlap one another
– the results of short-time processing can be a single number (e.g., an estimate of the pitch period within the frame), or a set of numbers (an estimate of the formant frequencies for the analysis frame)
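
The short-time procedure described above can be sketched as a loop over overlapping frames, with one number produced per frame. This is an illustrative sketch only: the helper names, the 30 msec frame length, the 10 msec hop, and the synthetic quiet-then-loud signal are assumptions, not values taken from the slides.

```python
def frame_starts(num_samples, frame_len, hop):
    """Start indices of every full-length frame in the signal."""
    return list(range(0, num_samples - frame_len + 1, hop))

def short_time(x, frame_len, hop, measure):
    """Apply `measure` to each frame, giving one value per frame."""
    return [measure(x[s:s + frame_len])
            for s in frame_starts(len(x), frame_len, hop)]

def energy(frame):
    """Sum of squared samples, treating the frame as a sustained sound."""
    return sum(v * v for v in frame)

FS = 8000                    # sampling rate in Hz
FRAME_LEN = 30 * FS // 1000  # assumed 30 msec frame = 240 samples
HOP = 10 * FS // 1000        # assumed 10 msec hop = 80 samples (frames overlap)

# Hypothetical signal: 50 msec at low amplitude, then 50 msec at high amplitude.
signal = [0.1] * 400 + [1.0] * 400

energies = short_time(signal, FRAME_LEN, HOP, energy)
for start, e in zip(frame_starts(len(signal), FRAME_LEN, HOP), energies):
    print(start, e)
```

Because the hop is shorter than the frame, consecutive frames share samples, so the per-frame energy rises gradually across the amplitude change rather than jumping in one step. Any other per-frame measure (a pitch estimate, a set of formant estimates) could be passed in place of `energy`.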