University of Toronto
Department of Electrical & Computer Engineering
November 8, 2000
ECE1502S — Information Theory
Midterm Test Solutions
1. (Matching Distributions)
(a) Call a particular ordering of $Q$ optimal if $D(P \| Q)$ is minimized. Suppose an optimal ordering exists in which $i < j$, but $q_i > q_j$. Let $Q'$ be the distribution obtained by swapping the $i$th and $j$th probability masses. Then
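\begin{align*}
D(P \| Q) - D(P \| Q') &= p_i \log\frac{p_i}{q_i} + p_j \log\frac{p_j}{q_j} - p_i \log\frac{p_i}{q_j} - p_j \log\frac{p_j}{q_i} \\
&= p_i \log q_j + p_j \log q_i - p_i \log q_i - p_j \log q_j \\
&= \underbrace{(p_i - p_j)}_{\le 0} \, \underbrace{(\log q_j - \log q_i)}_{< 0} \\
&\ge 0,
\end{align*}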
with equality if and only if $p_i = p_j$. We see that, in general, swapping the $i$th and $j$th probability masses reduces the relative entropy, so $Q$ can be optimal in this situation only if $p_i = p_j$. But if $p_i = p_j$ then swapping $q_i$ and $q_j$ does not affect the relative entropy. Thus sorting the probabilities yields an optimal ordering.
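The conclusion of part (a) can be checked by brute force on a small case. The sketch below assumes, as the sign annotations in the derivation indicate, that $P$ is listed in nondecreasing order; the numbers in `P` and `Q_masses` are illustrative values, not taken from the test, and `kl` is a helper name introduced here.

```python
import itertools
import math

def kl(p, q):
    """Relative entropy D(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Illustrative values only (assumed, not from the test).
P = [0.1, 0.2, 0.3, 0.4]            # nondecreasing order
Q_masses = [0.4, 0.15, 0.25, 0.2]   # masses of Q in arbitrary order

# Try every arrangement of Q's masses and keep the one minimizing D(P || Q).
best = min(itertools.permutations(Q_masses), key=lambda q: kl(P, q))
print(best, kl(P, best))
```

For these values the minimizing arrangement is the one that lists the masses of $Q$ in the same (nondecreasing) order as $P$.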
(b) Consider now $D(Q \| P)$ and assume $p_1 > 0$. Again suppose that an optimal ordering exists in which $i < j$, but $q_i > q_j$. Let $Q'$ be the distribution obtained by swapping the $i$th and $j$th probability masses. Then
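\begin{align*}
D(Q \| P) - D(Q' \| P) &= q_i \log\frac{q_i}{p_i} + q_j \log\frac{q_j}{p_j} - q_j \log\frac{q_j}{p_i} - q_i \log\frac{q_i}{p_j} \\
&= q_i \log p_j - q_i \log p_i - q_j \log p_j + q_j \log p_i \\
&= \underbrace{(q_i - q_j)}_{> 0} \, \underbrace{(\log p_j - \log p_i)}_{\ge 0} \\
&\ge 0,
\end{align*}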
with equality if and only if $p_i = p_j$. By the same argument as above, sorting the probabilities yields an optimal ordering.
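The swap identity used in part (b) can be confirmed numerically. In this minimal sketch the distributions and the index pair are assumed illustrative values (0-based indices, so `i = 0`, `j = 2` stands for a pair with $i < j$ and $q_i > q_j$); `kl` is a helper name introduced here.

```python
import math

def kl(q, p):
    """Relative entropy D(q || p) in bits; terms with q_i = 0 contribute 0."""
    return sum(qi * math.log2(qi / pi) for qi, pi in zip(q, p) if qi > 0)

# Illustrative values only (assumed, not from the test).
P = [0.1, 0.2, 0.3, 0.4]     # nondecreasing, with p_1 > 0
Q = [0.35, 0.3, 0.15, 0.2]
i, j = 0, 2                  # 0-based positions with i < j and Q[i] > Q[j]

# Q' is Q with the i-th and j-th masses exchanged.
Q_swapped = list(Q)
Q_swapped[i], Q_swapped[j] = Q_swapped[j], Q_swapped[i]

lhs = kl(Q, P) - kl(Q_swapped, P)
rhs = (Q[i] - Q[j]) * (math.log2(P[j]) - math.log2(P[i]))
print(lhs, rhs)  # the two values agree up to rounding, and both are positive
```

Only the two swapped terms differ between $D(Q \| P)$ and $D(Q' \| P)$, which is why the difference collapses to a single product.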
(c) If $Q$ has one mass equal to zero, then by the result of (b), we can set $q_1 = 0$. We wish to select $q_2, q_3, \ldots, q_m$ so that $D(Q \| P)$ is minimized. Setting up the Lagrangian
\[
L(q_2, \ldots, q_m, \lambda) = \sum_{i=1}^{m} q_i \ln(q_i/p_i) + \lambda \left( \sum_{i=1}^{m} q_i - 1 \right),
\]
differentiating with respect to $q_i$ ($i > 1$) and setting the result to zero, we find that
\[
\ln(q_i/p_i) + 1 + \lambda = 0,
\]
i.e., $q_i/p_i$ is a constant, independent of $i$. The constant is chosen to make $\sum_{i=2}^{m} q_i = 1$.
We find that
\[
q_i = \begin{cases} 0 & \text{if } i = 1; \\ \dfrac{p_i}{1 - p_1} & \text{if } 2 \le i \le m. \end{cases}
\]
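As a sanity check on the Lagrangian solution, the sketch below compares $q_i = p_i/(1 - p_1)$ against randomly drawn distributions that also have $q_1 = 0$. It only tests the choice of $q_2, \ldots, q_m$, not which position carries the zero mass. The vector `P`, the sample count, and the helper `kl_nats` are assumptions made for illustration.

```python
import math
import random

def kl_nats(q, p):
    """Relative entropy D(q || p) in nats; terms with q_i = 0 contribute 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

# Illustrative values only (assumed, not from the test).
P = [0.1, 0.2, 0.3, 0.4]

# Candidate minimizer: q_1 = 0 and q_i = p_i / (1 - p_1) for i >= 2.
q_star = [0.0] + [pi / (1 - P[0]) for pi in P[1:]]
d_star = kl_nats(q_star, P)   # analytically this equals -ln(1 - p_1)

# Compare with random distributions that also put zero mass on index 1.
random.seed(0)
smallest_gap = float("inf")
for _ in range(10000):
    raw = [random.random() for _ in P[1:]]
    total = sum(raw)
    q = [0.0] + [r / total for r in raw]
    smallest_gap = min(smallest_gap, kl_nats(q, P) - d_star)

print(d_star, -math.log(1 - P[0]), smallest_gap)
```

None of the random candidates does better, consistent with the decomposition $D(Q \| P) = D(Q \| Q^*) - \ln(1 - p_1)$ for any $Q$ with $q_1 = 0$, where $Q^*$ denotes the solution above.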
2. (Huffman Coding with Costs) The Huffman procedure minimizes $\sum p_i l_i$, for a set of “weights” $\{p_i\}$ that sum to unity. To show that this procedure works for an arbitrary set of weights, simply divide by the sum of the weights.
(a) Specifically, for a given set of nonnegative weights $W = \{w_1, w_2, \ldots, w_m\}$, let
\[
Z(W) = \sum_{i=1}^{m} w_i.
\]
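The normalization idea can be illustrated with a small sketch: running a binary Huffman procedure on raw weights $W$ and on the normalized weights $w_i / Z(W)$ gives the same codeword lengths, and the two costs differ by the factor $Z(W)$. The function `huffman_lengths`, the example weights, and the use of exact `Fraction` arithmetic (to avoid rounding affecting ties) are assumptions made for illustration, not part of the test solution.

```python
import heapq
from fractions import Fraction

def huffman_lengths(weights):
    """Return codeword lengths produced by the binary Huffman procedure."""
    # Heap entries: (subtree weight, unique tie-breaker, symbol indices in subtree).
    heap = [(w, k, [k]) for k, w in enumerate(weights)]
    heapq.heapify(heap)
    lengths = [0] * len(weights)
    counter = len(weights)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for k in s1 + s2:
            lengths[k] += 1   # each merge adds one bit to every leaf below it
        heapq.heappush(heap, (w1 + w2, counter, s1 + s2))
        counter += 1
    return lengths

# Illustrative weights only (assumed); they do not sum to one.
W = [Fraction(5), Fraction(3), Fraction(2), Fraction(2), Fraction(1)]
Z = sum(W)                  # Z(W), the sum of the weights
P = [w / Z for w in W]      # normalized weights, summing to unity

lengths_W = huffman_lengths(W)
lengths_P = huffman_lengths(P)
cost_W = sum(w * l for w, l in zip(W, lengths_W))
cost_P = sum(p * l for p, l in zip(P, lengths_P))
print(lengths_W, lengths_P, cost_W, Z * cost_P)
```

Dividing every weight by the same positive constant leaves all comparisons inside the procedure unchanged, so the tree is the same and $\sum w_i l_i = Z(W) \sum p_i l_i$.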