Chapter 3: The Asymptotic Equipartition Property

3.1. Markov's inequality and Chebyshev's inequality.

(a) (Markov's inequality.) For any non-negative random variable X and any t > 0, show that

    Pr{X ≥ t} ≤ EX / t.    (3.1)

Exhibit a random variable that achieves this inequality with equality.

(b) (Chebyshev's inequality.) Let Y be a random variable with mean μ and variance σ². By letting X = (Y − μ)², show that for any ε > 0,

    Pr{|Y − μ| > ε} ≤ σ² / ε².    (3.2)

(c) (The weak law of large numbers.) Let Z1, Z2, ..., Zn be a sequence of i.i.d. random variables with mean μ and variance σ². Let Z̄n = (1/n) Σ_{i=1}^n Zi be the sample mean. Show that

    Pr{|Z̄n − μ| > ε} ≤ σ² / (n ε²).    (3.3)

Thus Pr{|Z̄n − μ| > ε} → 0 as n → ∞. This is known as the weak law of large numbers.

Solution: Markov's inequality and Chebyshev's inequality.
(a) If X has distribution F(x),

    EX = ∫₀^∞ x dF
       = ∫₀^δ x dF + ∫_δ^∞ x dF
       ≥ ∫_δ^∞ x dF
       ≥ ∫_δ^∞ δ dF
       = δ Pr{X ≥ δ}.

Rearranging sides and dividing by δ we get

    Pr{X ≥ δ} ≤ EX / δ.    (3.4)

One student gave a proof based on conditional expectations. It goes

    EX = E(X | X ≥ δ) Pr{X ≥ δ} + E(X | X < δ) Pr{X < δ}
       ≥ E(X | X ≥ δ) Pr{X ≥ δ}
       ≥ δ Pr{X ≥ δ},

which leads to (3.4) as well.

Given δ, the distribution achieving Pr{X ≥ δ} = EX/δ with equality is

    X = δ with probability μ/δ, and X = 0 with probability 1 − μ/δ,

where μ ≤ δ (so that EX = μ).

(b) Letting X = (Y − μ)² in Markov's inequality,

    Pr{(Y − μ)² > ε²} ≤ Pr{(Y − μ)² ≥ ε²} ≤ E(Y − μ)² / ε² = σ² / ε²,

and noticing that Pr{(Y − μ)² > ε²} = Pr{|Y − μ| > ε}, we get

    Pr{|Y − μ| > ε} ≤ σ² / ε².

(c) Letting Y in Chebyshev's inequality from part (b) equal Z̄n, and noticing that EZ̄n = μ and Var(Z̄n) = σ²/n (i.e., Z̄n is the sum of the n i.i.d. r.v.'s Zi/n, each with variance σ²/n²), we have

    Pr{|Z̄n − μ| > ε} ≤ σ² / (n ε²).
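The three inequalities above can be sanity-checked by simulation. The following is a minimal sketch; the particular distributions (an exponential for Markov's bound, a uniform for Chebyshev's) and all parameter values are my own illustrative choices, not part of the problem.

```python
# Illustrative simulation check of Markov's and Chebyshev's inequalities.
# Distribution choices and parameters are arbitrary examples.
import random
import statistics

random.seed(1)
n = 200_000

# Markov: for X >= 0, Pr{X >= t} <= EX / t.  Try X ~ Exp(1), t = 3.
xs = [random.expovariate(1.0) for _ in range(n)]
t = 3.0
assert sum(x >= t for x in xs) / n <= statistics.mean(xs) / t

# Equality case from part (a): X = delta w.p. mu/delta, else 0, mu <= delta.
mu, delta = 0.2, 1.0
xs = [delta if random.random() < mu / delta else 0.0 for _ in range(n)]
# Here Pr{X >= delta} = mu/delta = EX/delta exactly; the estimate is close.
assert abs(sum(x >= delta for x in xs) / n - mu / delta) < 0.01

# Chebyshev: Pr{|Y - mu| > eps} <= sigma^2 / eps^2.  Y ~ Uniform(0, 1).
ys = [random.random() for _ in range(n)]
mu, var, eps = 0.5, 1.0 / 12.0, 0.3
p_hat = sum(abs(y - mu) > eps for y in ys) / n
assert p_hat <= var / eps**2
print("Markov and Chebyshev bounds hold on all simulated samples")
```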
3.2. AEP and mutual information. Let (Xi, Yi) be i.i.d. ~ p(x, y). We form the log likelihood ratio of the hypothesis that X and Y are independent vs. the hypothesis that X and Y are dependent. What is the limit of

    (1/n) log [ p(X^n) p(Y^n) / p(X^n, Y^n) ] ?

Solution:

    (1/n) log [ p(X^n) p(Y^n) / p(X^n, Y^n) ] = (1/n) Σ_{i=1}^n log [ p(Xi) p(Yi) / p(Xi, Yi) ]
        → E log [ p(X) p(Y) / p(X, Y) ]
        = −I(X; Y).

Thus p(X^n) p(Y^n) / p(X^n, Y^n) → 2^{−n I(X;Y)}, which will converge to 1 if X and Y are indeed independent.
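The convergence of the per-symbol log likelihood ratio to −I(X;Y) can be observed numerically. This is a sketch; the joint pmf below is an arbitrary example of my choosing.

```python
# Monte Carlo sketch: the per-symbol log likelihood ratio tends to -I(X;Y).
# The joint pmf is an arbitrary example.
import math
import random

random.seed(2)
pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
px = {0: 0.5, 1: 0.5}
py = {0: 0.5, 1: 0.5}

# True mutual information in bits.
I = sum(p * math.log2(p / (px[x] * py[y])) for (x, y), p in pxy.items())

# Draw n i.i.d. pairs and average log[p(x)p(y)/p(x,y)] over the sample.
n = 100_000
pairs, weights = zip(*pxy.items())
sample = random.choices(pairs, weights=weights, k=n)
avg = sum(math.log2(px[x] * py[y] / pxy[(x, y)]) for x, y in sample) / n

print(f"-I(X;Y) = {-I:.4f}, empirical average = {avg:.4f}")
assert abs(avg + I) < 0.05
```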
3.6. An AEP-like limit. Let X1, X2, ... be i.i.d. drawn according to probability mass function p(x). Find

    lim_{n→∞} [ p(X1, X2, ..., Xn) ]^{1/n}.

Solution: An AEP-like limit. X1, X2, ... are i.i.d. ~ p(x); hence the log p(Xi) are also i.i.d., and

    lim [ p(X1, X2, ..., Xn) ]^{1/n} = lim 2^{(1/n) log p(X1, X2, ..., Xn)}
        = 2^{lim (1/n) Σ log p(Xi)}    a.e.
        = 2^{E[log p(X)]}    a.e.
        = 2^{−H(X)}    a.e.,

by the strong law of large numbers (assuming, of course, that H(X) exists).
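The limit 2^{−H(X)} can be watched emerge from a sample. A minimal sketch, working in log space to avoid underflow; the pmf is an arbitrary example of mine.

```python
# Numerical sketch of the AEP-like limit: for i.i.d. X_i ~ p(x),
# [p(X_1,...,X_n)]^(1/n) -> 2^{-H(X)}.  The pmf is an arbitrary example.
import math
import random

random.seed(3)
p = {"a": 0.5, "b": 0.25, "c": 0.25}
H = -sum(q * math.log2(q) for q in p.values())   # H(X) = 1.5 bits here

n = 50_000
sample = random.choices(list(p), weights=list(p.values()), k=n)
# (1/n) log2 p(X_1,...,X_n) -> -H(X), so exponentiate the average.
log_prob = sum(math.log2(p[x]) for x in sample)
limit_est = 2 ** (log_prob / n)

print(f"2^-H = {2**-H:.4f}, estimate = {limit_est:.4f}")
assert abs(limit_est - 2 ** -H) < 0.05
```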
3.7. The AEP and source coding. A discrete memoryless source emits a sequence of statistically independent binary digits with probabilities p(1) = 0.005 and p(0) = 0.995. The digits are taken 100 at a time and a binary codeword is provided for every sequence of 100 digits containing three or fewer ones.

(a) Assuming that all codewords are the same length, find the minimum length required to provide codewords for all sequences with three or fewer ones.

(b) Calculate the probability of observing a source sequence for which no codeword has been assigned.

(c) Use Chebyshev's inequality to bound the probability of observing a source sequence for which no codeword has been assigned. Compare this bound with the actual probability computed in part (b).

Solution: The AEP and source coding.

(a) The number of 100-bit binary sequences with three or fewer ones is

    C(100,0) + C(100,1) + C(100,2) + C(100,3) = 1 + 100 + 4950 + 161700 = 166751.

The required codeword length is ⌈log₂ 166751⌉ = 18. (Note that H(0.005) = 0.0454, so 18 is quite a bit larger than the 100 × 0.0454 ≈ 4.54 bits of entropy of a 100-digit sequence.)

(b) The probability that a 100-bit sequence has three or fewer ones is

    Σ_{i=0}^{3} C(100, i) (0.005)^i (0.995)^{100−i} = 0.60577 + 0.30441 + 0.07572 + 0.01243 = 0.99833.

Thus the probability that the sequence that is generated cannot be encoded is 1 − 0.99833 = 0.00167.

(c) In the case of a random variable Sₙ that is the sum of n i.i.d. random variables X1, X2, ..., Xn, Chebyshev's inequality states that

    Pr{|Sₙ − nμ| ≥ δ} ≤ n σ² / δ²,

where μ and σ² are the mean and variance of Xi. (Therefore nμ and nσ² are the mean and variance of Sₙ.) In this problem, n = 100, μ = 0.005, and σ² = (0.005)(0.995). Note that S₁₀₀ ≥ 4 if and only if |S₁₀₀ − 100(0.005)| ≥ 3.5, so we should choose δ = 3.5. Then

    Pr(S₁₀₀ ≥ 4) ≤ 100(0.005)(0.995) / (3.5)² ≈ 0.04061.

This bound is much larger than the actual probability 0.00167.
4.3. Shuffles increase entropy. Argue that for any distribution on shuffles T and any distribution on card positions X that

    H(TX) ≥ H(TX | T)        (4.11)
          = H(T⁻¹TX | T)     (4.12)
          = H(X | T)         (4.13)
          = H(X),            (4.14)

if X and T are independent.

Solution: Shuffles increase entropy.

    H(TX) ≥ H(TX | T)        (4.15)
          = H(T⁻¹TX | T)     (4.16)
          = H(X | T)         (4.17)
          = H(X).            (4.18)

The inequality follows from the fact that conditioning reduces entropy, and the first equality follows from the fact that given T, we can reverse the shuffle.
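The chain (4.15)–(4.18) can be checked on a small concrete case. A sketch; the distributions on T and X below are arbitrary examples of mine, not from the problem.

```python
# Small numerical check that shuffling increases entropy: for T a random
# permutation independent of X, H(TX) >= H(X).  Distributions are examples.
import math
from itertools import permutations

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Distribution on card positions X over {0, 1, 2}.
px = {0: 0.7, 1: 0.2, 2: 0.1}

# Distribution on shuffles T: a mixture of two permutations of {0, 1, 2}.
perms = list(permutations(range(3)))
pt = {perms[0]: 0.6, perms[3]: 0.4}   # identity and one non-trivial shuffle

# Distribution of TX: P(TX = y) = sum over T, x with T(x) = y of P(T) P(x).
ptx = {y: 0.0 for y in range(3)}
for t, pt_prob in pt.items():
    for x, px_prob in px.items():
        ptx[t[x]] += pt_prob * px_prob

print(f"H(X) = {entropy(px):.4f}, H(TX) = {entropy(ptx):.4f}")
assert entropy(ptx) >= entropy(px) - 1e-12
```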
4.6. Monotonicity of entropy per element. For a stationary stochastic process X1, X2, ..., Xn, show that

(a)  H(X1, X2, ..., Xn) / n ≤ H(X1, X2, ..., X_{n−1}) / (n − 1);    (4.51)

(b)  H(X1, X2, ..., Xn) / n ≥ H(Xn | X_{n−1}, ..., X1).    (4.52)

Solution: Monotonicity of entropy per element.

(a) By the chain rule for entropy,

    H(X1, X2, ..., Xn) / n = (1/n) Σ_{i=1}^n H(Xi | X^{i−1})    (4.53)
        = [ H(Xn | X^{n−1}) + Σ_{i=1}^{n−1} H(Xi | X^{i−1}) ] / n    (4.54)
        = [ H(Xn | X^{n−1}) + H(X1, X2, ..., X_{n−1}) ] / n.    (4.55)

From stationarity it follows that for all 1 ≤ i ≤ n,

    H(Xn | X^{n−1}) ≤ H(Xi | X^{i−1}),

which further implies, by averaging both sides, that

    H(Xn | X^{n−1}) ≤ (1/(n−1)) Σ_{i=1}^{n−1} H(Xi | X^{i−1})    (4.56)
        = H(X1, X2, ..., X_{n−1}) / (n − 1).    (4.57)

Combining (4.55) and (4.57) yields

    H(X1, X2, ..., Xn) / n ≤ (1/n) [ H(X1, X2, ..., X_{n−1}) / (n − 1) + H(X1, X2, ..., X_{n−1}) ]    (4.58)
        = H(X1, X2, ..., X_{n−1}) / (n − 1).

(b) By stationarity we have, for all 1 ≤ i ≤ n,

    H(Xn | X^{n−1}) ≤ H(Xi | X^{i−1}),

which implies that

    H(Xn | X^{n−1}) = (1/n) Σ_{i=1}^n H(Xn | X^{n−1})    (4.59)
        ≤ (1/n) Σ_{i=1}^n H(Xi | X^{i−1})    (4.60)
        = H(X1, X2, ..., Xn) / n.    (4.61)
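For a stationary two-state Markov chain, H(X1, ..., Xn) = H(μ0) + (n − 1) H(X2 | X1) in closed form, which lets us watch (4.51) and (4.52) hold numerically. A sketch; the transition probabilities are arbitrary example values of mine.

```python
# Checking parts (a) and (b) for a stationary two-state Markov chain, using
# H(X_1,...,X_n) = H(mu_0) + (n-1) H(X_2|X_1).  Parameters are examples.
import math

def h2(p):  # binary entropy in bits
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p01, p10 = 0.1, 0.3
mu0 = p10 / (p01 + p10)                      # stationary probability of state 0
rate = mu0 * h2(p01) + (1 - mu0) * h2(p10)   # H(X_2 | X_1)

per_element = [(h2(mu0) + (n - 1) * rate) / n for n in range(1, 20)]
# (a): H(X_1,...,X_n)/n is nonincreasing in n.
assert all(a >= b - 1e-12 for a, b in zip(per_element, per_element[1:]))
# (b): it never drops below the conditional entropy H(X_n | X^{n-1}).
assert all(h >= rate - 1e-12 for h in per_element)
print(f"H(X^n)/n decreases from {per_element[0]:.4f} toward {rate:.4f}")
```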
4.7. Entropy rates of Markov chains.

(a) Find the entropy rate of the two-state Markov chain with transition matrix

    P = [ 1 − p01    p01
          p10        1 − p10 ].

(b) What values of p01, p10 maximize the rate of part (a)?

(c) Find the entropy rate of the two-state Markov chain with transition matrix

    P = [ 1 − p    p
          1        0 ].

(d) Find the maximum value of the entropy rate of the Markov chain of part (c). We expect that the maximizing value of p should be less than 1/2, since the 0 state permits more information to be generated than the 1 state.

(e) Let N(t) be the number of allowable state sequences of length t for the Markov chain of part (c). Find N(t) and calculate

    H0 = lim_{t→∞} (1/t) log N(t).

Hint: Find a linear recurrence that expresses N(t) in terms of N(t − 1) and N(t − 2). Why is H0 an upper bound on the entropy rate of the Markov chain? Compare H0 with the maximum entropy found in part (d).

Solution: Entropy rates of Markov chains.

(a) The stationary distribution is easily calculated. (See EIT pp. 62–63.)
    μ0 = p10 / (p01 + p10),    μ1 = p01 / (p01 + p10).

Therefore the entropy rate is

    H(X2 | X1) = μ0 H(p01) + μ1 H(p10) = [ p10 H(p01) + p01 H(p10) ] / (p01 + p10).

(b) The entropy rate is at most 1 bit because the process has only two states. This rate can be achieved if (and only if) p01 = p10 = 1/2, in which case the process is actually i.i.d. with Pr(Xi = 0) = Pr(Xi = 1) = 1/2.

(c) As a special case of the general two-state Markov chain, the stationary distribution is μ0 = 1/(p + 1), μ1 = p/(p + 1), and since H(1) = 0 the entropy rate is

    H(X2 | X1) = μ0 H(p) + μ1 H(1) = H(p) / (p + 1).

(d) By straightforward calculus, we find that the maximum value of the entropy rate of part (c) occurs for p = (3 − √5)/2 ≈ 0.382. The maximum value is

    H(p) / (p + 1) = log₂((1 + √5)/2) ≈ 0.694 bits.

Note that 1 − p = (√5 − 1)/2 ≈ 0.618 is the reciprocal of the golden ratio.

(e) The Markov chain of part (c) forbids consecutive ones. Consider any allowable sequence of symbols of length t. If the first symbol is 1, then the next symbol must be 0; the remaining N(t − 2) symbols can form any allowable sequence. If the first symbol is 0, then the remaining N(t − 1) symbols can be any allowable sequence. So the number of allowable sequences of length t satisfies the recurrence

    N(t) = N(t − 1) + N(t − 2),    N(1) = 2,  N(2) = 3.

(The initial conditions are obtained by observing that for t = 2 only the sequence 11 is not allowed. We could also choose N(0) = 1 as an initial condition, since there is exactly one allowable sequence of length 0, namely, the empty sequence.)

The sequence N(t) grows exponentially, that is, N(t) ≈ cλᵗ, where λ is the maximum magnitude solution of the characteristic equation

    1 = z⁻¹ + z⁻².

Solving the characteristic equation yields λ = (1 + √5)/2, the golden ratio. (The sequence {N(t)} is the sequence of Fibonacci numbers.) Therefore

    H0 = lim_{t→∞} (1/t) log N(t) = log₂((1 + √5)/2) ≈ 0.694 bits.

Since there are only N(t) possible outcomes for X1, ..., Xt, an upper bound on H(X1, ..., Xt) is log N(t), and so the entropy rate of the Markov chain of part (c) is at most H0. In fact, we saw in part (d) that this upper bound can be achieved.
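Parts (c) through (e) can be verified numerically: the rate H(p)/(1 + p) peaks at p = (3 − √5)/2 with value log₂ of the golden ratio, and (1/t) log₂ N(t) approaches the same number. A minimal sketch:

```python
# Numerical check of parts (c)-(e): the maximizer and maximum of H(p)/(1+p),
# and the growth exponent of the Fibonacci-type recurrence N(t).
import math

def h2(p):  # binary entropy in bits, for 0 < p < 1
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def rate(p):  # entropy rate of the chain in part (c)
    return h2(p) / (1 + p)

p_star = (3 - math.sqrt(5)) / 2               # about 0.382
H0 = math.log2((1 + math.sqrt(5)) / 2)        # about 0.694 bits

# A grid search confirms the maximizer and the maximum value.
best_p = max((i / 10000 for i in range(1, 10000)), key=rate)
assert abs(best_p - p_star) < 1e-3
assert abs(rate(p_star) - H0) < 1e-9

# N(t) = N(t-1) + N(t-2), N(1) = 2, N(2) = 3 (Fibonacci numbers).
a, b = 2, 3
for t in range(3, 61):
    a, b = b, a + b                            # b = N(t) after each step
assert abs(math.log2(b) / 60 - H0) < 0.01
print(f"p* = {p_star:.4f}, max rate = H0 = {H0:.4f} bits")
```

The exact equality rate(p*) = H0 follows from 1 − p* = 1/φ and p* = 1/φ², so H(p*) = (1 + p*) log₂ φ.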
4.15. Entropy rate. Let {Xi} be a discrete stationary stochastic process with entropy rate H(X). Show that

    (1/n) H(Xn, ..., X1 | X0, X_{−1}, ..., X_{−k}) → H(X)    (4.89)

for k = 1, 2, ....

Solution: Entropy rate of a stationary process. By the Cesàro mean theorem, the running average of the terms tends to the same limit as the limit of the terms. Hence

    (1/n) H(X1, X2, ..., Xn | X0, X_{−1}, ..., X_{−k}) = (1/n) Σ_{i=1}^n H(Xi | X_{i−1}, X_{i−2}, ..., X_{−k})    (4.90)
        → lim H(Xn | X_{n−1}, X_{n−2}, ..., X_{−k})    (4.91)
        = H(X),    (4.92)

the entropy rate of the process.
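The Cesàro mean fact used above is easy to illustrate numerically: if aₙ → L, the running average (1/n) Σ aᵢ tends to the same L. A tiny sketch with an arbitrary example sequence:

```python
# Cesaro mean illustration: a_n = L + 1/n converges to L, and so does the
# running average of the a_n.  The sequence is an arbitrary example.
L = 0.5
n = 100_000
a = [L + 1.0 / i for i in range(1, n + 1)]
running_avg = sum(a) / n
print(f"a_n -> {L}, running average = {running_avg:.4f}")
assert abs(running_avg - L) < 1e-3
```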
Summer '10, M.R. Soleymani