2. Code this. [20 Points]
Arithmetic coding is a standard compression method. When the string to be compressed is a sequence of biased coin flips, it can be described as follows. Suppose that we have a sequence of bits $X = (X_1, X_2, \ldots, X_n)$, where each $X_i$ is independently 0 with probability $p$ and 1 with probability $1 - p$.
The sequences can be ordered lexicographically: for $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$, we say that $x < y$ if $x_i = 0$ and $y_i = 1$ in the first coordinate $i$ such that $x_i \neq y_i$.
If $z(x)$ is the number of zeroes in the string $x$, then define

$$p(x) = p^{z(x)} (1 - p)^{n - z(x)} \quad \text{and} \quad q(x) = \sum_{y < x} p(y).$$
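The definitions can be checked directly for small $n$. The sketch below enumerates all $2^n$ strings, so it is only a slow reference for testing faster methods. The function names `prob` and `q_brute` are illustrative and do not come from the text. It relies on Python comparing equal-length tuples lexicographically, which is the same order defined above.

```python
from itertools import product


def prob(x, p):
    """p(x) = p^{z(x)} (1 - p)^{n - z(x)}, where z(x) is the number of zeroes in x."""
    z = x.count(0)
    return p ** z * (1 - p) ** (len(x) - z)


def q_brute(x, p):
    """q(x) = sum of p(y) over all y < x in lexicographic order; exponential time."""
    # product((0, 1), repeat=n) yields every bit tuple of length n, and Python's
    # tuple comparison is exactly the lexicographic order on them.
    return sum(prob(y, p) for y in product((0, 1), repeat=len(x)) if y < tuple(x))


# With p = 0.3, the strings below (1, 0, 1) are 000, 001, 010, 011 and 100.
print(q_brute((1, 0, 1), 0.3))  # approximately 0.363 (= 0.3 + 0.063)
```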
(A) Suppose we are given $X = (X_1, X_2, \ldots, X_n)$. Explain how to compute $q(X)$ in time $O(n)$ (assume that any reasonable operation on real numbers takes constant time).
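The following is one way to meet the $O(n)$ bound. It is a sketch and not necessarily the intended solution. A string $y$ lies below $X$ exactly when, for some $i$ with $X_i = 1$, $y$ agrees with $X$ on the first $i - 1$ coordinates and has $y_i = 0$. The remaining coordinates are unconstrained and contribute total probability 1. So $q(X)$ is the sum, over all $i$ with $X_i = 1$, of $p$ times the probability of the prefix $X_1, \ldots, X_{i-1}$. That prefix probability can be maintained in a single pass. The name `q_linear` is hypothetical.

```python
def q_linear(x, p):
    """Return (q(x), p(x)) in one left-to-right pass, i.e. O(n) arithmetic operations."""
    q = 0.0
    prefix = 1.0  # probability that a random string starts with x_1 .. x_{i-1}
    for bit in x:
        if bit == 1:
            # Every y that matches the prefix and has a 0 here satisfies y < x.
            q += prefix * p
            prefix *= 1 - p
        else:
            prefix *= p
    return q, prefix  # after the loop, prefix equals p(x)


q, px = q_linear((1, 0, 1), 0.3)
print(q, px)  # approximately 0.363 and 0.147
```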
(B) Argue that the intervals $[q(x), q(x) + p(x))$ are disjoint subintervals of $[0, 1)$.
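Here is a quick numerical sanity check of (B) for small $n$. It is not a proof. When the strings are listed in lexicographic order, each interval should start exactly where the previous one ends, and the last interval should end at 1. The sketch uses exact `Fraction` arithmetic so that endpoints can be compared with `==`. The name `check_tiling` is hypothetical, and the code assumes $0 < p < 1$.

```python
from fractions import Fraction
from itertools import product


def check_tiling(n, p):
    """Check that the intervals [q(x), q(x) + p(x)) tile [0, 1) for all x of length n."""
    p = Fraction(p)
    expected_start = Fraction(0)
    for x in product((0, 1), repeat=n):  # lexicographic order
        # Compute q(x) and p(x) in a single pass over the bits of x.
        q, px = Fraction(0), Fraction(1)
        for bit in x:
            if bit == 1:
                q += px * p
                px *= 1 - p
            else:
                px *= p
        if q != expected_start or px <= 0:
            return False
        expected_start = q + px  # the next interval must begin here
    return expected_start == 1


print(check_tiling(4, Fraction(1, 3)))  # True
```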
(C) Given (A) and (B), the sequence $X$ can be represented by any point in the interval $I(X) = [q(X), q(X) + p(X))$. Show that we can choose a codeword in $I(X)$ with $\lceil \lg(1/p(X)) \rceil + 1$ binary digits to represent $X$ in such a way that no codeword is the prefix of any other codeword.
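One standard way to pick the codeword is the Shannon-Fano-Elias construction. It is given here as a sketch and is not necessarily the intended solution. Let $L = \lceil \lg(1/p(X)) \rceil + 1$, and truncate the midpoint $q(X) + p(X)/2$ to $L$ binary digits, giving $c$.

- Because $2^{-L} \le p(X)/2$, the whole dyadic interval $[c, c + 2^{-L})$, which holds every number starting with those $L$ digits, lies inside $I(X)$.
- The $I(x)$ are disjoint, so no codeword's dyadic interval contains another's. That is exactly prefix-freeness.

The names `ceil_lg` and `encode` are hypothetical, and the code assumes $0 < p < 1$.

```python
from fractions import Fraction


def ceil_lg(r):
    """Smallest integer k >= 0 with 2**k >= r, computed exactly (assumes r >= 1)."""
    k = 0
    while 2 ** k < r:
        k += 1
    return k


def encode(x, p):
    """Return a codeword of ceil(lg(1/p(x))) + 1 bits lying in [q(x), q(x) + p(x))."""
    p = Fraction(p)
    q, px = Fraction(0), Fraction(1)
    for bit in x:  # one pass computes q(x) and p(x)
        if bit == 1:
            q += px * p
            px *= 1 - p
        else:
            px *= p
    length = ceil_lg(1 / px) + 1
    mid = q + px / 2
    # floor(mid * 2^length): the first `length` binary digits of mid
    c = mid.numerator * 2 ** length // mid.denominator
    return format(c, f"0{length}b")


print(encode((1, 0, 1), Fraction(3, 10)))  # 0110
```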
(D) Given a codeword chosen as in (C), explain how to decompress it to determine the corresponding sequence $(X_1, X_2, \ldots, X_n)$.
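Decoding can mirror the encoder. This sketch assumes the decoder knows $n$ and $p$, and that the codeword was chosen inside $I(X)$ as in (C). Read the codeword as a binary fraction $v$. At each step, split the current interval for the prefix decoded so far into its 0-part (a fraction $p$ of its width) and its 1-part. All strings sharing a prefix occupy one contiguous block, so $v$ falls in the part belonging to the true next bit. The name `decode` is hypothetical.

```python
from fractions import Fraction


def decode(codeword, n, p):
    """Recover (X_1, ..., X_n) from a codeword lying in I(X) = [q(X), q(X) + p(X))."""
    p = Fraction(p)
    v = Fraction(int(codeword, 2), 2 ** len(codeword))  # the binary fraction 0.b1 b2 ... bL
    lo, width = Fraction(0), Fraction(1)  # block of all strings with the current prefix
    bits = []
    for _ in range(n):
        split = lo + width * p  # strings continuing with 0 occupy [lo, split)
        if v < split:
            bits.append(0)
            width *= p
        else:
            bits.append(1)
            lo = split
            width *= 1 - p
    return tuple(bits)


print(decode("0110", 3, Fraction(3, 10)))  # (1, 0, 1)
```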
(E) (Extra credit.) Using the Chernoff inequality, argue that $\lg(1/p(X))$ is close to $nH(p)$ with high probability, where $H(p) = -p \lg p - (1 - p) \lg(1 - p)$ is the binary entropy function. Thus, this approach yields an effective compression scheme.
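A natural first step for (E) is an exact identity that uses only the definitions above. Let $Z = z(X)$ be the number of zeroes, so that $Z \sim \mathrm{Bin}(n, p)$ and $\mathbb{E}[Z] = np$. The derivation shows that the deviation of $\lg(1/p(X))$ from $nH(p)$ is a fixed multiple of the deviation of $Z$ from $np$, which is the quantity a Chernoff bound controls.

```latex
\begin{align*}
\lg\frac{1}{p(X)} &= Z \lg\frac{1}{p} + (n - Z) \lg\frac{1}{1-p}, \\
\mathbb{E}\left[\lg\frac{1}{p(X)}\right] &= np \lg\frac{1}{p} + n(1-p) \lg\frac{1}{1-p} = nH(p), \\
\lg\frac{1}{p(X)} - nH(p) &= (Z - np) \lg\frac{1-p}{p}.
\end{align*}
```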
3. Extraction to the limit [20 Points]
We have shown that we can extract, on average, at least $\lfloor \lg m \rfloor - 1$ independent, unbiased bits from a number chosen uniformly at random from $\{0, \ldots, m - 1\}$.
It follows that if we have $k$ numbers chosen independently and uniformly at random from $\{0, \ldots, m - 1\}$, then we can extract, on average, at least $k \lfloor \lg m \rfloor - k$ independent, unbiased bits from them.
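For concreteness, here is one procedure that achieves the single-number bound, together with the per-number approach that gives $k \lfloor \lg m \rfloor - k$. It is a sketch assuming the usual binary-decomposition scheme, and the procedure the text refers to as already shown may differ in details. Let $\alpha = \lfloor \lg m \rfloor$.

- If the number is below $2^\alpha$, it is uniform on $\{0, \ldots, 2^\alpha - 1\}$, so its $\alpha$ bits are independent and unbiased.
- Otherwise it is uniform on the remaining $m - 2^\alpha$ values, so shift it down and repeat.

The names `extract` and `extract_each` are hypothetical.

```python
def extract(x, m):
    """Extract unbiased bits from x, assumed uniform on {0, ..., m - 1}."""
    while m >= 2:
        alpha = m.bit_length() - 1  # floor(lg m)
        block = 1 << alpha          # 2^alpha
        if x < block:
            # Conditioned on this branch, x is uniform on {0, ..., 2^alpha - 1}.
            return format(x, f"0{alpha}b")
        # Otherwise x - 2^alpha is uniform on {0, ..., m - 2^alpha - 1}.
        x -= block
        m -= block
    return ""  # a single remaining value carries no randomness


def extract_each(xs, m):
    """Apply `extract` to each of the k numbers separately and concatenate the bits."""
    return "".join(extract(x, m) for x in xs)


print([extract(x, 6) for x in range(6)])  # ['00', '01', '10', '11', '0', '1']
```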
Give a better procedure that extracts, on average, at least $k \lfloor \lg m \rfloor - 1$ independent, unbiased bits from these numbers.