ECE 1502 — Information Theory
Problem Set 4 solutions^1
March 12, 2006
8.1 Preprocessing the output.
(a) The statistician calculates $\tilde{Y} = g(Y)$. Since $X \to Y \to \tilde{Y}$ forms a Markov chain, we can apply the data processing inequality. Hence for every distribution on $x$,
$$
I(X;Y) \ge I(X;\tilde{Y}). \tag{1}
$$
Let $\tilde{p}(x)$ be the distribution on $x$ that maximizes $I(X;\tilde{Y})$. Then
$$
C = \max_{p(x)} I(X;Y) \ge I(X;Y)\Big|_{p(x)=\tilde{p}(x)} \ge I(X;\tilde{Y})\Big|_{p(x)=\tilde{p}(x)} = \max_{p(x)} I(X;\tilde{Y}) = \tilde{C}. \tag{2}
$$
Thus, the statistician is wrong and processing the output does not increase capacity.
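As a quick numerical illustration (not part of the original solution), the following sketch checks inequality (1) for an assumed binary-input, ternary-output channel and a lossy post-processing $g$ that merges two output symbols. The channel matrix `W`, the merge map `G`, and the sampled input distributions are all made up for illustration.

```python
import numpy as np

def mutual_information(px, W):
    """I(X;Y) in bits for input distribution px and channel matrix W[x][y]."""
    pxy = px[:, None] * W                     # joint distribution p(x, y)
    py = pxy.sum(axis=0)                      # output marginal p(y)
    m = pxy > 0                               # avoid log of zero
    return float((pxy[m] * np.log2(pxy[m] / (px[:, None] * py[None, :])[m])).sum())

# Assumed binary-input, ternary-output channel (entries are illustrative).
W = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.2, 0.7]])

# A lossy post-processing g that merges outputs 1 and 2 into a single symbol.
G = np.array([[1, 0],
              [0, 1],
              [0, 1]])
W_tilde = W @ G                               # channel from X to Ytilde = g(Y)

for p in np.linspace(0.05, 0.95, 19):
    px = np.array([p, 1.0 - p])
    assert mutual_information(px, W) >= mutual_information(px, W_tilde) - 1e-12
print("I(X;Y) >= I(X;g(Y)) held at every sampled input distribution")
```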
(b) We have equality (no decrease in capacity) in the above sequence of inequalities only if we have equality in the data processing inequality, i.e., if, for the distribution $\tilde{p}(x)$ that maximizes $I(X;\tilde{Y})$, $X \to \tilde{Y} \to Y$ also forms a Markov chain.
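A companion sketch of the equality case, under the same assumed channel as above: when $g$ is one-to-one, $\tilde{Y}$ determines $Y$, so $X \to \tilde{Y} \to Y$ is trivially a Markov chain and no mutual information is lost at any input distribution.

```python
import numpy as np

def mi(px, W):
    """I(X;Y) in bits for input distribution px and channel matrix W[x][y]."""
    pxy = px[:, None] * W
    py = pxy.sum(axis=0)
    m = pxy > 0
    return float((pxy[m] * np.log2(pxy[m] / (px[:, None] * py[None, :])[m])).sum())

W = np.array([[0.7, 0.2, 0.1],    # same illustrative channel as above
              [0.1, 0.2, 0.7]])
P = np.array([[0, 0, 1],          # an invertible g: a pure relabeling of outputs
              [1, 0, 0],
              [0, 1, 0]])
px = np.array([0.4, 0.6])
# Relabeling loses nothing: Ytilde determines Y, so X -> Ytilde -> Y is Markov
# and the mutual information (hence the capacity) is unchanged.
assert abs(mi(px, W) - mi(px, W @ P)) < 1e-12
```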
8.2 Maximum likelihood decoding.
(a) By Bayes' rule, for any events $A$ and $B$,
$$
\Pr(A \mid B) = \frac{\Pr(A)\Pr(B \mid A)}{\Pr(B)}. \tag{3}
$$
In this case, we wish to calculate the conditional probability of $a_1$ given the channel output. Thus we take the event $A$ to be the event that the source produced $a_1$, and $B$ to be the event corresponding to one of the 8 possible output sequences. Thus $\Pr(A) = 1/2$, and $\Pr(B \mid A) = \epsilon^i (1-\epsilon)^{3-i}$, where $i$ is the number of ones in the received sequence. $\Pr(B)$ can then be calculated as $\Pr(B) = \Pr(a_1)\Pr(B \mid a_1) + \Pr(a_2)\Pr(B \mid a_2)$. Thus we can calculate
$$
\Pr(a_1 \mid 000) = \frac{\frac{1}{2}(1-\epsilon)^3}{\frac{1}{2}(1-\epsilon)^3 + \frac{1}{2}\epsilon^3} \tag{4}
$$
$$
\Pr(a_1 \mid 100) = \Pr(a_1 \mid 010) = \Pr(a_1 \mid 001) = \frac{\frac{1}{2}(1-\epsilon)^2\epsilon}{\frac{1}{2}(1-\epsilon)^2\epsilon + \frac{1}{2}\epsilon^2(1-\epsilon)} \tag{5}
$$
$$
\Pr(a_1 \mid 110) = \Pr(a_1 \mid 011) = \Pr(a_1 \mid 101) = \frac{\frac{1}{2}(1-\epsilon)\epsilon^2}{\frac{1}{2}(1-\epsilon)\epsilon^2 + \frac{1}{2}\epsilon(1-\epsilon)^2} \tag{6}
$$
$$
\Pr(a_1 \mid 111) = \frac{\frac{1}{2}\epsilon^3}{\frac{1}{2}\epsilon^3 + \frac{1}{2}(1-\epsilon)^3} \tag{7}
$$
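For concreteness, here is a short script (an illustration, not part of the original solution) that evaluates the posteriors (4)-(7) for all eight output sequences, under the setup reflected in the likelihoods above: equiprobable $a_1 \mapsto 000$ and $a_2 \mapsto 111$ sent over a binary symmetric channel with crossover probability $\epsilon$. The value $\epsilon = 0.1$ is an arbitrary choice.

```python
from itertools import product

eps = 0.1  # illustrative value; any 0 < eps < 1/2 gives the same picture

def posterior_a1(y, eps):
    """Pr(a1 | y) for equiprobable a1 -> 000, a2 -> 111 over a BSC(eps)."""
    i = y.count("1")                          # number of ones received
    like_a1 = eps**i * (1 - eps)**(3 - i)     # Pr(y | 000 sent)
    like_a2 = (1 - eps)**i * eps**(3 - i)     # Pr(y | 111 sent)
    return 0.5 * like_a1 / (0.5 * like_a1 + 0.5 * like_a2)

for y in ("".join(b) for b in product("01", repeat=3)):
    print(y, round(posterior_a1(y, eps), 4))  # e.g. 000 -> 0.9986, 111 -> 0.0014
```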
(b) If $\epsilon < 1/2$, then the probability of $a_1$ given 000, 001, 010, or 100 is greater than 1/2, and the probability of $a_2$ given 110, 011, 101, or 111 is greater than 1/2. Therefore, the decoding rule above chooses the source symbol that has maximum probability given the observed output. This is the maximum a posteriori decoding rule, and it is optimal in that it minimizes the probability of error. To see that this is true, let the input source symbol be $X$, let the output of the channel be denoted by $Y$, and let the decoded symbol be $\hat{X}(Y)$.
Then
\begin{align}
\Pr(E) &= \Pr(X \neq \hat{X}) \tag{8}\\
&= \sum_y \Pr(Y=y)\,\Pr(X \neq \hat{X} \mid Y=y) \tag{9}\\
&= \sum_y \Pr(Y=y) \sum_{x \neq \hat{x}(y)} \Pr(x \mid Y=y) \tag{10}\\
&= \sum_y \Pr(Y=y)\,\bigl(1 - \Pr(\hat{x}(y) \mid Y=y)\bigr) \tag{11}\\
&= \sum_y \Pr(Y=y) - \sum_y \Pr(Y=y)\,\Pr(\hat{x}(y) \mid Y=y) \tag{12}\\
&= 1 - \sum_y \Pr(Y=y)\,\Pr(\hat{x}(y) \mid Y=y), \tag{13}
\end{align}
and thus to minimize the probability of error, we have to maximize the second term, which is maximized by choosing $\hat{x}(y)$ to be the symbol that maximizes the conditional probability of the source symbol given the output.
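As a sanity check on this argument (again an illustration, not the textbook's solution), the sketch below enumerates all eight outputs, applies the MAP rule, and evaluates $\Pr(E)$ exactly as in (13). With the uniform prior the MAP rule reduces to maximum-likelihood decoding, i.e., majority vote, and the result matches the closed form $3\epsilon^2(1-\epsilon) + \epsilon^3$.

```python
from itertools import product

eps = 0.1  # illustrative crossover probability (any eps < 1/2 behaves the same)

codewords = {"a1": "000", "a2": "111"}
prior = {"a1": 0.5, "a2": 0.5}

def likelihood(y, c, eps):
    """Pr(receive y | codeword c sent) over a BSC(eps)."""
    flips = sum(yb != cb for yb, cb in zip(y, c))
    return eps**flips * (1 - eps)**(len(c) - flips)

p_correct = 0.0
for y in ("".join(b) for b in product("01", repeat=3)):
    # MAP rule: choose the source symbol maximizing Pr(a)Pr(y|a), which is
    # proportional to Pr(a|y); with a uniform prior this is also the ML rule.
    xhat = max(prior, key=lambda a: prior[a] * likelihood(y, codewords[a], eps))
    # Accumulate Pr(Y=y) Pr(xhat(y) | Y=y) = Pr(X=xhat, Y=y), as in (13).
    p_correct += prior[xhat] * likelihood(y, codewords[xhat], eps)

p_error = 1.0 - p_correct  # equation (13)
print(p_error)             # ~0.028 = 3*eps**2*(1-eps) + eps**3 for eps = 0.1
```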
(c) The probability of error can also be expanded
^1 Solutions to problems from the text are supplied courtesy of Joy A. Thomas.