EE 376A
Handout #21
Information Theory
Tuesday, March 1, 2011
Prof. T. Cover
Solutions to Homework Set #6
1. Postprocessing the output. One is given a communication channel with transition probabilities $p(y|x)$ and channel capacity $C = \max_{p(x)} I(X;Y)$. A helpful statistician postprocesses the output by forming $\tilde{Y} = g(Y)$, yielding a channel $p(\tilde{y}|x)$. He claims that this will strictly improve the capacity.
(a) Show that he is wrong.
(b) Under what conditions does he not strictly decrease the capacity?
Solution: Postprocessing the output.
(a) The statistician calculates $\tilde{Y} = g(Y)$. Since $X \to Y \to \tilde{Y}$ forms a Markov chain, we can apply the data processing inequality. Hence for every distribution on $x$,
$$I(X;Y) \ge I(X;\tilde{Y}).$$
Let $\tilde{p}(x)$ be the distribution on $x$ that maximizes $I(X;\tilde{Y})$. Then
$$C = \max_{p(x)} I(X;Y) \ge I(X;Y)\big|_{p(x)=\tilde{p}(x)} \ge I(X;\tilde{Y})\big|_{p(x)=\tilde{p}(x)} = \max_{p(x)} I(X;\tilde{Y}) = \tilde{C}.$$
Thus, the helpful suggestion is wrong and processing the output does not increase
capacity.
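The inequality in part (a) can be checked numerically. The following is a minimal sketch, not part of the original solution: the 2-input, 3-output channel matrix, the merging map `merge_last_two`, and the helper names are all hypothetical choices made for illustration.

```python
import math

def mutual_information(p_x, channel):
    """I(X;Y) in bits, for input pmf p_x and channel[x][y] = p(y|x)."""
    n_y = len(channel[0])
    p_y = [sum(p_x[x] * channel[x][y] for x in range(len(p_x)))
           for y in range(n_y)]
    total = 0.0
    for x, px in enumerate(p_x):
        for y in range(n_y):
            joint = px * channel[x][y]
            if joint > 0:
                total += joint * math.log2(channel[x][y] / p_y[y])
    return total

def postprocess(channel, g, n_out):
    """Channel p(y~|x) induced by the deterministic map y~ = g(y)."""
    merged = [[0.0] * n_out for _ in channel]
    for x, row in enumerate(channel):
        for y, p in enumerate(row):
            merged[x][g(y)] += p
    return merged

def merge_last_two(y):
    """Hypothetical g that cannot tell outputs 1 and 2 apart."""
    return min(y, 1)

def relabel(y):
    """Hypothetical one-to-one g: only renames the outputs."""
    return 2 - y

# Hypothetical channel, chosen only for illustration.
channel = [[0.7, 0.2, 0.1],
           [0.1, 0.2, 0.7]]
p_x = [0.5, 0.5]

i_before = mutual_information(p_x, channel)
i_merged = mutual_information(p_x, postprocess(channel, merge_last_two, 2))
i_relabeled = mutual_information(p_x, postprocess(channel, relabel, 3))
print(i_before, i_merged, i_relabeled)
```

For this channel the merging map loses information, while the one-to-one relabeling leaves the mutual information unchanged. The inequality `i_merged <= i_before` holds for any choice of `p_x`, which is why the capacity cannot increase.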
(b) We have equality (no decrease in capacity) in the above sequence of inequalities only if we have equality in the data processing inequality, i.e., for the distribution that maximizes $I(X;\tilde{Y})$, we have $X \to \tilde{Y} \to Y$ forming a Markov chain. Thus, $\tilde{Y}$ should be a sufficient statistic of $Y$ for $X$.
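The equality condition can be made explicit by expanding $I(X; Y, \tilde{Y})$ with the chain rule in two ways. This short derivation is a supplementary sketch, not part of the original solution; it uses only the fact that $\tilde{Y} = g(Y)$ is a function of $Y$, so that $I(X;\tilde{Y} \mid Y) = 0$.

```latex
\begin{align*}
I(X; Y, \tilde{Y}) &= I(X;Y) + I(X;\tilde{Y} \mid Y) = I(X;Y), \\
I(X; Y, \tilde{Y}) &= I(X;\tilde{Y}) + I(X;Y \mid \tilde{Y}), \\
\text{so}\quad I(X;Y) - I(X;\tilde{Y}) &= I(X;Y \mid \tilde{Y}) \ge 0.
\end{align*}
```

The gap is zero exactly when $I(X;Y \mid \tilde{Y}) = 0$, that is, when $X$ and $Y$ are conditionally independent given $\tilde{Y}$, which is the Markov chain $X \to \tilde{Y} \to Y$.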
2. Noisy typewriter. Consider a 26-key typewriter.
(a) If pushing a key results in printing the associated letter, what is the capacity $C$ in bits?
(b) Now suppose that pushing a key results in printing that letter or the next (with equal probability). Thus $A \to A$ or $B$, and $Z \to Z$ or $A$. What is the capacity?
(c) What is the highest rate code with block length one that you can find that achieves zero probability of error for the channel in part (b)?
Solution: Noisy typewriter.
(a) If the typewriter prints out whatever key is struck, then the output $Y$ is the same as the input $X$ and
$$C = \max I(X;Y) = \max H(X) = \log 26, \tag{1}$$
attained by a uniform distribution over the letters.
(b) In this case, the output is either equal to the input (with probability $\frac{1}{2}$) or equal to the next letter (with probability $\frac{1}{2}$). Hence $H(Y|X) = \log 2$ independent of the distribution of $X$, and hence
$$C = \max I(X;Y) = \max H(Y) - \log 2 = \log 26 - \log 2 = \log 13, \tag{2}$$
which is attained for a uniform distribution over the output, which in turn is attained by a uniform distribution on the input.
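As a numerical sanity check on equation (2), the mutual information of the noisy typewriter under a uniform input can be computed directly. This is an illustrative sketch, not part of the original solution; the variable and function names are assumptions, and logarithms are taken base 2 so the result is in bits.

```python
import math

N = 26

# channel[x][y] = p(y|x): each key prints itself or the next letter,
# each with probability 1/2, wrapping from Z back to A.
channel = [[0.0] * N for _ in range(N)]
for x in range(N):
    channel[x][x] = 0.5
    channel[x][(x + 1) % N] = 0.5

def entropy(pmf):
    """Entropy in bits of a probability mass function."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

def mutual_information(p_x):
    """I(X;Y) = H(Y) - H(Y|X) in bits for input pmf p_x."""
    p_y = [sum(p_x[x] * channel[x][y] for x in range(N)) for y in range(N)]
    h_y_given_x = sum(p_x[x] * entropy(channel[x]) for x in range(N))
    return entropy(p_y) - h_y_given_x

uniform = [1.0 / N] * N
i_uniform = mutual_information(uniform)
print(i_uniform, math.log2(13))
```

The two printed values should agree up to floating-point rounding, matching $\log 26 - \log 2 = \log 13$.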
(c) A simple zero-error block length one code is the one that uses every alternate letter, say A, C, E, ..., W, Y. In this case, none of the codewords will be confused, since A will produce either A or B, C will produce C or D, etc. The rate of this code is
$$R = \frac{\log(\#\text{ codewords})}{\text{Block length}} = \frac{\log 13}{1} = \log 13. \tag{3}$$
In this case, we can achieve capacity with a simple code with zero error. Note that
the uniform distribution over the output is attained also by this input distribution.
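The zero-error property of the alternate-letter code can be verified mechanically by checking that the output sets of distinct codewords never overlap. The sketch below is illustrative only; `outputs`, `codewords`, and the other names are assumptions introduced here.

```python
import math
import string

letters = string.ascii_uppercase  # "A" through "Z"

def outputs(key):
    """Set of letters the noisy typewriter can print for a given key."""
    i = letters.index(key)
    return {letters[i], letters[(i + 1) % 26]}

# Every alternate letter: A, C, E, ..., W, Y.
codewords = letters[::2]

# The code has zero error if and only if the output sets are pairwise
# disjoint, so that each printed letter identifies a unique codeword.
seen = set()
zero_error = True
for c in codewords:
    out = outputs(c)
    if seen & out:
        zero_error = False
    seen |= out

# Rate in bits per channel use for block length one.
rate = math.log2(len(codewords)) / 1
print(zero_error, len(codewords), rate)
```

The 13 output sets also cover all 26 letters, which is consistent with the remark that this input distribution induces a uniform output distribution when the codewords are used equally often.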