Purdue University: ECE438 - Digital Signal Processing with Applications
ECE438 - Laboratory 9:
Speech Processing
(Week 2)
October 6, 2010
1 Introduction
This is the second part of a two-week experiment. During the first week we discussed basic properties of speech signals, and performed some simple analyses in the time and frequency domains. This week, we will introduce a system model for speech production. We will cover some background on linear predictive coding, and the final exercise will bring all the prior material together in a speech coding exercise.
1.1 A Speech Model
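[Block diagram: a DT impulse train with period $T_p$ (voiced sounds) or a white noise source (unvoiced sounds) supplies the excitation $x(n)$, which is scaled by the gain $G$ and passed through the vocal tract, an LTI all-pole filter $V(z)$, to produce the speech signal $s(n)$.]
Figure 1: Discrete-Time Speech Production Model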
Questions or comments concerning this laboratory should be directed to Prof. Charles A. Bouman, School of Electrical and Computer Engineering, Purdue University, West Lafayette IN 47907; (765) 494-0340; [email protected]
From a signal processing standpoint, it is very useful to think of speech production in terms of a model, as in Figure 1. The model shown is the simplest of its kind, but it includes all the principal components. The excitations for voiced and unvoiced speech are represented by an impulse train and white noise generator, respectively. The pitch of voiced speech is controlled by the spacing between impulses, $T_p$, and the amplitude (volume) of the excitation is controlled by the gain factor $G$.
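The two excitation sources can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 8 kHz sampling rate mentioned in the comment, the unit-variance Gaussian noise, and the example values of $T_p$ and $G$ are all hypothetical choices, not part of the lab.

```python
import random

def voiced_excitation(num_samples, pitch_period):
    """Unit impulse train: 1.0 every `pitch_period` samples, 0.0 elsewhere."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(num_samples)]

def unvoiced_excitation(num_samples, seed=0):
    """Zero-mean, unit-variance Gaussian white noise (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(num_samples)]

def apply_gain(x, gain):
    """Scale the excitation x(n) by the gain factor G."""
    return [gain * value for value in x]

# Assumed example: at an 8 kHz sampling rate, T_p = 80 samples
# corresponds to a pitch of 8000 / 80 = 100 Hz.
voiced = apply_gain(voiced_excitation(400, 80), 0.5)
unvoiced = apply_gain(unvoiced_excitation(400), 0.5)
```

Decreasing `pitch_period` packs the impulses closer together and raises the pitch, while `gain` only scales the amplitude of either excitation.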
As the acoustical excitation travels from its source (vocal cords, or a constriction), the shape of the vocal tract alters the spectral content of the signal. The most prominent effect is the formation of resonances, which intensify the signal energy at certain frequencies (called formants). As we learned in the Digital Filter Design lab, the amplification of certain frequencies may be achieved with a linear filter by an appropriate placement of poles in the transfer function. This is why our speech model uses an all-pole LTI filter. A more accurate model might include a few zeros in the transfer function, but if the order of the filter is chosen appropriately, the all-pole model is sufficient. The primary reason for using the all-pole model is the distinct computational advantage in calculating the filter coefficients, as will be discussed shortly.
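The link between pole placement and resonance can be checked numerically. The sketch below evaluates the magnitude response of a filter with one complex-conjugate pole pair on the unit circle; the pole angle, the three radii, and the function name are illustrative assumptions rather than values from the lab.

```python
import cmath
import math

def two_pole_magnitude(omega, radius, theta):
    """|H(e^{j*omega})| for H(z) = 1 / ((1 - p z^-1)(1 - conj(p) z^-1)),
    where p = radius * e^{j*theta}."""
    z_inv = cmath.exp(-1j * omega)
    p = cmath.rect(radius, theta)
    return 1.0 / abs((1 - p * z_inv) * (1 - p.conjugate() * z_inv))

theta = math.pi / 4                                 # assumed pole angle
omegas = [math.pi * i / 512 for i in range(513)]    # frequency grid on [0, pi]

peaks = {}
for radius in (0.5, 0.9, 0.99):
    mags = [two_pole_magnitude(w, radius, theta) for w in omegas]
    peak_mag = max(mags)
    peaks[radius] = (omegas[mags.index(peak_mag)], peak_mag)
    print(f"radius={radius}: peak at {peaks[radius][0]:.3f} rad, height {peak_mag:.2f}")
```

As the pole radius approaches 1, the peak grows taller and its location moves toward the pole angle, which is the behavior a formant resonance needs.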
Recall that the transfer function of an all-pole filter has the form
$$V(z) = \frac{1}{1 - \sum_{k=1}^{P} a_k z^{-k}} \tag{1}$$
where $P$ is the order of the filter. This is an IIR filter that may be implemented with a recursive difference equation. With the input $G \cdot x(n)$, the speech signal $s(n)$ may be written as
$$s(n) = \sum_{k=1}^{P} a_k \, s(n-k) + G \cdot x(n) \tag{2}$$
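The recursion in equation (2) can be written out directly. The following is a minimal sketch, assuming zero initial conditions (that is, $s(n) = 0$ for $n < 0$); the function name and the first-order example coefficients are hypothetical.

```python
def all_pole_synthesis(a, x, gain=1.0):
    """Compute s(n) = sum_{k=1}^{P} a[k-1] * s(n-k) + gain * x(n).

    `a` holds the coefficients a_1 .. a_P; samples before n = 0 are taken as zero.
    """
    order = len(a)
    s = []
    for n in range(len(x)):
        acc = gain * x[n]
        for k in range(1, order + 1):
            if n - k >= 0:          # skip terms that reach before the start
                acc += a[k - 1] * s[n - k]
        s.append(acc)
    return s

# First-order example: a single pole at z = 0.5, driven by a unit impulse.
impulse = [1.0, 0.0, 0.0, 0.0]
response = all_pole_synthesis([0.5], impulse, gain=2.0)
```

For this example the output decays geometrically from the initial value `gain`, as expected for a single real pole inside the unit circle.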
Keep in mind that the filter coefficients will change continuously as the shape of the vocal
tract changes, but speech segments of an appropriately small length may be approximated
by a time-invariant model.
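One common way to apply this approximation is to cut the signal into short frames and fit a separate set of coefficients to each. The sketch below shows only the framing step; the function name, the non-overlapping layout, and the choice to drop a trailing partial frame are assumptions made for illustration.

```python
def split_into_frames(signal, frame_length):
    """Split `signal` into consecutive non-overlapping frames.

    Any trailing samples that do not fill a whole frame are dropped.
    """
    num_frames = len(signal) // frame_length
    return [signal[i * frame_length:(i + 1) * frame_length]
            for i in range(num_frames)]

# Toy example: 10 samples cut into frames of 4 samples each.
frames = split_into_frames(list(range(10)), 4)
```

Each frame would then be modeled with its own fixed coefficients $a_k$ and gain $G$.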
