Purdue University: ECE438 - Digital Signal Processing with Applications
ECE438 - Laboratory 9:
Speech Processing
(Week 2)
October 6, 2010
1 Introduction
This is the second part of a two-week experiment. During the first week we discussed basic properties of speech signals and performed some simple analyses in the time and frequency domains.
This week, we will introduce a system model for speech production. We will cover some background on linear predictive coding, and the final exercise will bring all the prior material together in a speech coding exercise.
1.1 A Speech Model
[Figure 1: Discrete-Time Speech Production Model. Block labels: DT impulse train with period $T_p$ (voiced sounds); white noise (unvoiced sounds); gain $G$; excitation $x(n)$; vocal tract $V(z)$ (LTI, all-pole filter); speech signal $s(n)$.]
From a signal processing standpoint, it is very useful to think of speech production in terms of a model, as in Figure 1. The model shown is the simplest of its kind, but it includes all the principal components. The excitations for voiced and unvoiced speech are represented by an impulse train and white noise generator, respectively. The pitch of voiced speech is controlled by the spacing between impulses, $T_p$, and the amplitude (volume) of the excitation is controlled by the gain factor $G$.

Questions or comments concerning this laboratory should be directed to Prof. Charles A. Bouman, School of Electrical and Computer Engineering, Purdue University, West Lafayette IN 47907; (765) 494-0340; [email protected]
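The two excitation sources described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the pitch period of 80 samples, the gain of 0.5, and the use of unit-variance Gaussian noise are all hypothetical choices, not values given in this laboratory.

```python
import random


def impulse_train(num_samples, pitch_period):
    """Voiced excitation: a unit impulse every `pitch_period` samples (T_p)."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(num_samples)]


def white_noise(num_samples, seed=0):
    """Unvoiced excitation: zero-mean Gaussian noise (unit variance is an assumption)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(num_samples)]


def excitation(num_samples, voiced, pitch_period=80, gain=1.0, seed=0):
    """Return G * x(n) for one segment, choosing the source by the voiced flag."""
    if voiced:
        x = impulse_train(num_samples, pitch_period)
    else:
        x = white_noise(num_samples, seed)
    return [gain * value for value in x]


# Hypothetical parameter values, chosen only for illustration.
voiced_exc = excitation(400, voiced=True, pitch_period=80, gain=0.5)
unvoiced_exc = excitation(400, voiced=False, gain=0.5)
```

Decreasing `pitch_period` places the impulses closer together and so raises the pitch, while `gain` scales the amplitude of either source.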
As the acoustical excitation travels from its source (vocal cords, or a constriction), the shape of the vocal tract alters the spectral content of the signal. The most prominent effect is the formation of resonances, which intensify the signal energy at certain frequencies (called formants). As we learned in the Digital Filter Design lab, the amplification of certain frequencies may be achieved with a linear filter by an appropriate placement of poles in the transfer function. This is why our speech model uses an all-pole LTI filter. A more accurate model might include a few zeros in the transfer function, but if the order of the filter is chosen appropriately, the all-pole model is sufficient. The primary reason for using the all-pole model is the distinct computational advantage in calculating the filter coefficients, as will be discussed shortly.
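To illustrate how pole placement creates a resonance, the following sketch evaluates the magnitude response of a filter with a single complex-conjugate pole pair at $r e^{\pm j\theta}$. The values of `r` and `theta`, the frequency grid, and the function name are assumptions made for this example only.

```python
import cmath
import math


def two_pole_magnitude(omega, r, theta):
    """|V(e^{j omega})| for V(z) = 1 / ((1 - p z^-1)(1 - conj(p) z^-1)), p = r e^{j theta}."""
    p = cmath.rect(r, theta)
    z_inv = cmath.exp(-1j * omega)
    return 1.0 / abs((1 - p * z_inv) * (1 - p.conjugate() * z_inv))


# Hypothetical pole location: radius close to the unit circle, angle pi/4.
r, theta = 0.95, math.pi / 4

# Evaluate the response on a uniform grid over [0, pi].
omegas = [math.pi * i / 512 for i in range(513)]
mags = [two_pole_magnitude(w, r, theta) for w in omegas]

# The frequency of the largest response should lie close to the pole angle.
peak_omega = omegas[mags.index(max(mags))]
```

Moving `r` closer to 1 sharpens and raises the peak, while changing `theta` moves the resonant frequency. A filter with several such pole pairs can model several formants.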
Recall that the transfer function of an all-pole filter has the form

$$V(z) = \frac{1}{1 - \sum_{k=1}^{P} a_k z^{-k}} \qquad (1)$$

where $P$ is the order of the filter. This is an IIR filter that may be implemented with a recursive difference equation. With the input $G \cdot x(n)$, the speech signal $s(n)$ may be written as

$$s(n) = \sum_{k=1}^{P} a_k \, s(n-k) + G \cdot x(n) \qquad (2)$$

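The recursion in equation (2) can be implemented directly. The sketch below is a minimal pure-Python version that assumes zero initial conditions, i.e. $s(n) = 0$ for $n < 0$. The function name and the first-order test coefficients are hypothetical.

```python
def all_pole_filter(excitation, a, gain=1.0):
    """Compute s(n) = sum_{k=1}^{P} a[k-1] * s(n-k) + gain * x(n).

    `a` holds the coefficients a_1 ... a_P; samples before n = 0 are taken as zero.
    """
    order = len(a)
    s = []
    for n, x_n in enumerate(excitation):
        acc = gain * x_n
        for k in range(1, order + 1):
            if n - k >= 0:  # skip terms that reach before the start of the signal
                acc += a[k - 1] * s[n - k]
        s.append(acc)
    return s


# First-order check with a_1 = 0.5 and G = 2: the impulse response is G * 0.5**n.
h = all_pole_filter([1.0, 0.0, 0.0, 0.0], a=[0.5], gain=2.0)
print(h)  # [2.0, 1.0, 0.5, 0.25]
```

Because each output sample depends on previous outputs, the impulse response never becomes exactly zero, which is what makes this an IIR filter.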
Keep in mind that the filter coefficients will change continuously as the shape of the vocal tract changes, but speech segments of an appropriately small length may be approximated by a time-invariant model.
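One simple way to obtain such short segments is to cut the signal into fixed-length frames and treat the filter coefficients as constant within each frame. The sketch below assumes non-overlapping frames and drops any trailing partial frame. The function name and the frame length are hypothetical, and this handout does not specify either.

```python
def split_into_frames(signal, frame_len):
    """Split `signal` into consecutive non-overlapping frames of `frame_len` samples.

    A trailing partial frame, if any, is dropped (an assumption of this sketch).
    """
    num_frames = len(signal) // frame_len
    return [signal[i * frame_len:(i + 1) * frame_len] for i in range(num_frames)]


# Toy example: 10 samples with a frame length of 4 gives two full frames.
frames = split_into_frames(list(range(10)), frame_len=4)
print(frames)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```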