# Solution_1&2[1] - Entropy, Relative Entropy and Mutual Information

Entropy, Relative Entropy and Mutual Information

3. Minimum entropy. What is the minimum value of $H(p_1, \dots, p_n) = H(\mathbf{p})$ as $\mathbf{p}$ ranges over the set of $n$-dimensional probability vectors? Find all $\mathbf{p}$'s which achieve this minimum.

Solution: We wish to find all probability vectors $\mathbf{p} = (p_1, p_2, \dots, p_n)$ which minimize
$$H(\mathbf{p}) = -\sum_i p_i \log p_i.$$
Now $-p_i \log p_i \ge 0$, with equality iff $p_i = 0$ or $1$. Hence the only possible probability vectors which minimize $H(\mathbf{p})$ are those with $p_i = 1$ for some $i$ and $p_j = 0$, $j \ne i$. There are $n$ such vectors, i.e., $(1, 0, \dots, 0)$, $(0, 1, 0, \dots, 0)$, ..., $(0, \dots, 0, 1)$, and the minimum value of $H(\mathbf{p})$ is $0$.

4. Entropy of functions of a random variable. Let $X$ be a discrete random variable. Show that the entropy of a function of $X$ is less than or equal to the entropy of $X$ by justifying the following steps:
$$H(X, g(X)) \stackrel{(a)}{=} H(X) + H(g(X) \mid X) \stackrel{(b)}{=} H(X),$$
$$H(X, g(X)) \stackrel{(c)}{=} H(g(X)) + H(X \mid g(X)) \stackrel{(d)}{\ge} H(g(X)).$$
Thus $H(g(X)) \le H(X)$.

Solution: Entropy of functions of a random variable.

(a) $H(X, g(X)) = H(X) + H(g(X) \mid X)$ by the chain rule for entropies.

(b) $H(g(X) \mid X) = 0$ since for any particular value of $X$, $g(X)$ is fixed, and hence $H(g(X) \mid X) = \sum_x p(x) H(g(X) \mid X = x) = \sum_x 0 = 0$.

(c) $H(X, g(X)) = H(g(X)) + H(X \mid g(X))$, again by the chain rule.

(d) $H(X \mid g(X)) \ge 0$, with equality iff $X$ is a function of $g(X)$, i.e., $g(\cdot)$ is one-to-one. Hence $H(X, g(X)) \ge H(g(X))$.

Combining parts (b) and (d), we obtain $H(X) \ge H(g(X))$.

5. Zero conditional entropy. Show that if $H(Y \mid X) = 0$, then $Y$ is a function of $X$, i.e., for all $x$ with $p(x) > 0$, there is only one possible value of $y$ with $p(x, y) > 0$.

Solution: Zero conditional entropy. Assume that there exists an $x$, say $x_0$, and two different values of $y$, say $y_1$ and $y_2$, such that $p(x_0, y_1) > 0$ and $p(x_0, y_2) > 0$.
Then $p(x_0) \ge p(x_0, y_1) + p(x_0, y_2) > 0$, and $p(y_1 \mid x_0)$ and $p(y_2 \mid x_0)$ are not equal to $0$ or $1$. Thus
$$H(Y \mid X) = -\sum_x p(x) \sum_y p(y \mid x) \log p(y \mid x) \qquad (2.5)$$
$$\ge p(x_0) \bigl( -p(y_1 \mid x_0) \log p(y_1 \mid x_0) - p(y_2 \mid x_0) \log p(y_2 \mid x_0) \bigr) \qquad (2.6)$$
$$> 0, \qquad (2.7)$$
since $-t \log t \ge 0$ for $0 \le t \le 1$, and is strictly positive for $t$ not equal to $0$ or $1$. Therefore the conditional entropy $H(Y \mid X)$ is $0$ if and only if $Y$ is a function of $X$.

6. Conditional mutual information vs. unconditional mutual information. Give examples of joint random variables $X$, $Y$ and $Z$ such that

(a) $I(X; Y \mid Z) < I(X; Y)$,

(b) $I(X; Y \mid Z) > I(X; Y)$.

Solution: Conditional mutual information vs. unconditional mutual information.

(a) The last corollary to Theorem 2.8.1 in the text states that if $X \to Y \to Z$, that is, if $p(x, y \mid z) = p(x \mid z)\, p(y \mid z)$, then $I(X; Y) \ge I(X; Y \mid Z)$. Equality holds if and only if $I(X; Z) = 0$, i.e., $X$ and $Z$ are independent. A simple example of random variables satisfying the inequality conditions above: let $X$ be a fair binary random variable, $Y = X$ and $Z = Y$. In this case,
$$I(X; Y) = H(X) - H(X \mid Y) = H(X) = 1,$$
and
$$I(X; Y \mid Z) = H(X \mid Z) - H(X \mid Y, Z) = 0,$$
so that $I(X; Y) > I(X; Y \mid Z)$.

(b) This example is also given in the text. Let $X, Y$ be independent fair binary random variables and let $Z = X + Y$. In this case we have
$$I(X; Y) = 0$$
and
$$I(X; Y \mid Z) = H(X \mid Z) = 1/2,$$
so $I(X; Y) < I(X; Y \mid Z)$. Note that in this case $X, Y, Z$ are not Markov.

10. Entropy of a disjoint mixture. Let $X_1$ and $X_2$ be discrete random variables drawn according to probability mass functions $p_1(\cdot)$ and $p_2(\cdot)$ over the respective alphabets $\mathcal{X}_1 = \{1, 2, \dots, m\}$ and $\mathcal{X}_2 = \{m+1, \dots, n\}$. Let
$$X = \begin{cases} X_1, & \text{with probability } \alpha, \\ X_2, & \text{with probability } 1 - \alpha. \end{cases}$$
(a) Find $H(X)$ in terms of $H(X_1)$, $H(X_2)$ and $\alpha$.

(b) Maximize over $\alpha$ to show that $2^{H(X)} \le 2^{H(X_1)} + 2^{H(X_2)}$, and interpret using the notion that $2^{H(X)}$ is the effective alphabet size.

Solution: Entropy of a disjoint mixture. We can do this problem by writing down the definition of entropy and expanding the various terms. Instead, we will use the algebra of entropies for a simpler proof. Since $X_1$ and $X_2$ have disjoint support sets, we can write
$$X = \begin{cases} X_1 & \text{with probability } \alpha \\ X_2 & \text{with probability } 1 - \alpha \end{cases}$$
Define a function of $X$,
$$\theta = f(X) = \begin{cases} 1 & \text{when } X = X_1 \\ 2 & \text{when } X = X_2 \end{cases}$$
Then, as in Problem 1, we have
$$H(X) = H(X, f(X)) = H(\theta) + H(X \mid \theta) = H(\theta) + p(\theta = 1) H(X \mid \theta = 1) + p(\theta = 2) H(X \mid \theta = 2) = H(\alpha) + \alpha H(X_1) + (1 - \alpha) H(X_2),$$
where $H(\alpha) = -\alpha \log \alpha - (1 - \alpha) \log(1 - \alpha)$.

11. A measure of correlation. Let $X_1$ and $X_2$ be identically distributed, but not necessarily independent. Let
$$\rho = 1 - \frac{H(X_2 \mid X_1)}{H(X_1)}.$$

(a) Show $\rho = \frac{I(X_1; X_2)}{H(X_1)}$.

(b) Show $0 \le \rho \le 1$.

(c) When is $\rho = 0$?

(d) When is $\rho = 1$?

Solution: A measure of correlation. $X_1$ and $X_2$ are identically distributed and
$$\rho = 1 - \frac{H(X_2 \mid X_1)}{H(X_1)}.$$

(a)
$$\rho = \frac{H(X_1) - H(X_2 \mid X_1)}{H(X_1)} = \frac{H(X_2) - H(X_2 \mid X_1)}{H(X_1)} \quad (\text{since } H(X_1) = H(X_2)) \quad = \frac{I(X_1; X_2)}{H(X_1)}.$$

(b) Since $0 \le H(X_2 \mid X_1) \le H(X_2) = H(X_1)$, we have $0 \le \frac{H(X_2 \mid X_1)}{H(X_1)} \le 1$, so $0 \le \rho \le 1$.

(c) $\rho = 0$ iff $I(X_1; X_2) = 0$ iff $X_1$ and $X_2$ are independent.

(d) $\rho = 1$ iff $H(X_2 \mid X_1) = 0$ iff $X_2$ is a function of $X_1$. By symmetry, $X_1$ is a function of $X_2$, i.e., $X_1$ and $X_2$ have a one-to-one relationship.

12. Example of joint entropy. Let $p(x, y)$ be given by the following table (the values are the ones consistent with the answers below):

| $p(x, y)$ | $Y = 0$ | $Y = 1$ |
|-----------|---------|---------|
| $X = 0$   | $1/3$   | $1/3$   |
| $X = 1$   | $0$     | $1/3$   |

Find

(a) $H(X)$, $H(Y)$.

(b) $H(X \mid Y)$, $H(Y \mid X)$.

(c) $H(X, Y)$.

(d) $H(Y) - H(Y \mid X)$.

(e) $I(X; Y)$.

(f) Draw a Venn diagram for the quantities in (a) through (e).

Solution: Example of joint entropy.

(a) $H(X) = \frac{2}{3} \log \frac{3}{2} + \frac{1}{3} \log 3 = 0.918$ bits $= H(Y)$.

(b) $H(X \mid Y) = \frac{1}{3} H(X \mid Y = 0) + \frac{2}{3} H(X \mid Y = 1) = 0.667$ bits $= H(Y \mid X)$.

(c) $H(X, Y) = 3 \times \frac{1}{3} \log 3 = 1.585$ bits.

(d) $H(Y) - H(Y \mid X) = 0.251$ bits.

(e) $I(X; Y) = H(Y) - H(Y \mid X) = 0.251$ bits.

(f) See Figure 1.

13. Inequality. Show $\ln x \ge 1 - \frac{1}{x}$ for $x > 0$.

Solution: Inequality.
Using the remainder form of the Taylor expansion of $\ln x$ about $x = 1$, we have, for some $c$ between $1$ and $x$,
$$\ln x = \ln 1 + \left.\frac{1}{t}\right|_{t=1} (x - 1) + \left.\left(-\frac{1}{t^2}\right)\right|_{t=c} \frac{(x - 1)^2}{2} \le x - 1,$$
since the second term is always negative. Hence, letting $y = 1/x$, we obtain
$$-\ln y \le \frac{1}{y} - 1,$$
or
$$\ln y \ge 1 - \frac{1}{y},$$
with equality iff $y = 1$.

[Figure 2.1: Venn diagram to illustrate the relationships of entropy and relative entropy.]

14. Entropy of a sum. Let $X$ and $Y$ be random variables that take on values $x_1, x_2, \dots, x_r$ and $y_1, y_2, \dots, y_s$, respectively. Let $Z = X + Y$.

(a) Show that $H(Z \mid X) = H(Y \mid X)$. Argue that if $X, Y$ are independent, then $H(Y) \le H(Z)$ and $H(X) \le H(Z)$. Thus the addition of independent random variables adds uncertainty.

(b) Give an example of (necessarily dependent) random variables in which $H(X) > H(Z)$ and $H(Y) > H(Z)$.

(c) Under what conditions does $H(Z) = H(X) + H(Y)$?

Solution: Entropy of a sum.

(a) $Z = X + Y$. Hence $p(Z = z \mid X = x) = p(Y = z - x \mid X = x)$.
$$H(Z \mid X) = \sum_x p(x) H(Z \mid X = x)$$
$$= -\sum_x p(x) \sum_z p(Z = z \mid X = x) \log p(Z = z \mid X = x)$$
$$= -\sum_x p(x) \sum_z p(Y = z - x \mid X = x) \log p(Y = z - x \mid X = x)$$
$$= \sum_x p(x) H(Y \mid X = x)$$
$$= H(Y \mid X).$$
If $X$ and $Y$ are independent, then $H(Y \mid X) = H(Y)$. Since $I(X; Z) \ge 0$, we have $H(Z) \ge H(Z \mid X) = H(Y \mid X) = H(Y)$. Similarly we can show that $H(Z) \ge H(X)$.

(b) Consider the following joint distribution for $X$ and $Y$. Let
$$X = -Y = \begin{cases} 1 & \text{with probability } 1/2 \\ 0 & \text{with probability } 1/2 \end{cases}$$
Then $H(X) = H(Y) = 1$, but $Z = 0$ with probability $1$ and hence $H(Z) = 0$.

(c) We have
$$H(Z) \le H(X, Y) \le H(X) + H(Y),$$
because $Z$ is a function of $(X, Y)$ and $H(X, Y) = H(X) + H(Y \mid X) \le H(X) + H(Y)$. We have equality iff $(X, Y)$ is a function of $Z$ and $H(Y) = H(Y \mid X)$, i.e., $X$ and $Y$ are independent.

28. Mixing increases entropy. Show that the entropy of the probability distribution $(p_1, \dots, p_i, \dots, p_j, \dots, p_m)$ is less than the entropy of the distribution $\left(p_1, \dots, \frac{p_i + p_j}{2}, \dots, \frac{p_i + p_j}{2}, \dots, p_m\right)$. Show that in general any transfer of probability that makes the distribution more uniform increases the entropy.
Solution: Mixing increases entropy. This problem depends on the convexity of the log function. Let
$$P_1 = (p_1, \dots, p_i, \dots, p_j, \dots, p_m)$$
$$P_2 = \left(p_1, \dots, \frac{p_i + p_j}{2}, \dots, \frac{p_i + p_j}{2}, \dots, p_m\right).$$
Then, by the log sum inequality,
$$H(P_2) - H(P_1) = -2 \left(\frac{p_i + p_j}{2}\right) \log \left(\frac{p_i + p_j}{2}\right) + p_i \log p_i + p_j \log p_j$$
$$= -(p_i + p_j) \log \left(\frac{p_i + p_j}{2}\right) + p_i \log p_i + p_j \log p_j$$
$$\ge 0.$$
Thus $H(P_2) \ge H(P_1)$.

29. Inequalities. Let $X$, $Y$ and $Z$ be joint random variables. Prove the following inequalities and find conditions for equality.

(a) $H(X, Y \mid Z) \ge H(X \mid Z)$.

(b) $I(X, Y; Z) \ge I(X; Z)$.

(c) $H(X, Y, Z) - H(X, Y) \le H(X, Z) - H(X)$.

(d) $I(X; Z \mid Y) \ge I(Z; Y \mid X) - I(Z; Y) + I(X; Z)$.

Solution: Inequalities.

(a) Using the chain rule for conditional entropy,
$$H(X, Y \mid Z) = H(X \mid Z) + H(Y \mid X, Z) \ge H(X \mid Z),$$
with equality iff $H(Y \mid X, Z) = 0$, that is, when $Y$ is a function of $X$ and $Z$.

(b) Using the chain rule for mutual information,
$$I(X, Y; Z) = I(X; Z) + I(Y; Z \mid X) \ge I(X; Z),$$
with equality iff $I(Y; Z \mid X) = 0$, that is, when $Y$ and $Z$ are conditionally independent given $X$.

(c) Using first the chain rule for entropy and then the definition of conditional mutual information,
$$H(X, Y, Z) - H(X, Y) = H(Z \mid X, Y) = H(Z \mid X) - I(Y; Z \mid X) \le H(Z \mid X) = H(X, Z) - H(X),$$
with equality iff $I(Y; Z \mid X) = 0$, that is, when $Y$ and $Z$ are conditionally independent given $X$.

(d) Using the chain rule for mutual information,
$$I(X; Z \mid Y) + I(Z; Y) = I(X, Y; Z) = I(Z; Y \mid X) + I(X; Z),$$
and therefore
$$I(X; Z \mid Y) = I(Z; Y \mid X) - I(Z; Y) + I(X; Z).$$
We see that this inequality is actually an equality in all cases.

33. Fano's inequality. Let $\Pr(X = i) = p_i$, $i = 1, 2, \dots, m$, and let $p_1 \ge p_2 \ge p_3 \ge \dots \ge p_m$. The minimal probability of error predictor of $X$ is $\hat{X} = 1$, with resulting probability of error $P_e = 1 - p_1$. Maximize $H(\mathbf{p})$ subject to the constraint $1 - p_1 = P_e$ to find a bound on $P_e$ in terms of $H$. This is Fano's inequality in the absence of conditioning.

Solution: Fano's inequality. The minimal probability of error predictor when there is no information is $\hat{X} = 1$, the most probable value of $X$.
The probability of error in this case is $P_e = 1 - p_1$. Hence if we fix $P_e$, we fix $p_1$. We maximize the entropy of $X$ for a given $P_e$ to obtain an upper bound on the entropy for a given $P_e$. The entropy,
$$H(\mathbf{p}) = -p_1 \log p_1 - \sum_{i=2}^{m} p_i \log p_i \qquad (2.62)$$
$$= -p_1 \log p_1 - \sum_{i=2}^{m} P_e \frac{p_i}{P_e} \log \frac{p_i}{P_e} - P_e \log P_e \qquad (2.63)$$
$$= H(P_e) + P_e H\!\left(\frac{p_2}{P_e}, \frac{p_3}{P_e}, \dots, \frac{p_m}{P_e}\right) \qquad (2.64)$$
$$\le H(P_e) + P_e \log(m - 1), \qquad (2.65)$$
since the maximum of $H\!\left(\frac{p_2}{P_e}, \frac{p_3}{P_e}, \dots, \frac{p_m}{P_e}\right)$ is attained by a uniform distribution. Hence any $X$ that can be predicted with a probability of error $P_e$ must satisfy
$$H(X) \le H(P_e) + P_e \log(m - 1), \qquad (2.66)$$
which is the unconditional form of Fano's inequality. We can weaken this inequality to obtain an explicit lower bound for $P_e$,
$$P_e \ge \frac{H(X) - 1}{\log(m - 1)}. \qquad (2.67)$$
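The two examples in Problem 6 are easy to confirm numerically. The sketch below (Python, not part of the original solutions; the helper names `H` and `marginal` are ours) computes $I(X;Y)$ and $I(X;Y \mid Z)$ from the explicit joint pmf of example (b), where $X, Y$ are independent fair bits and $Z = X + Y$.

```python
from itertools import product
from math import log2

def H(pmf):
    """Entropy in bits of a pmf given as a dict {outcome: probability}."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def marginal(pmf, idx):
    """Marginal pmf over the coordinates listed in idx."""
    out = {}
    for k, p in pmf.items():
        key = tuple(k[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

# Example (b) of Problem 6: X, Y independent fair bits, Z = X + Y.
joint = {(x, y, x + y): 0.25 for x, y in product((0, 1), repeat=2)}

# I(X;Y) = H(X) + H(Y) - H(X,Y)
I_XY = H(marginal(joint, (0,))) + H(marginal(joint, (1,))) - H(marginal(joint, (0, 1)))
# I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z)
I_XY_given_Z = (H(marginal(joint, (0, 2))) + H(marginal(joint, (1, 2)))
                - H(marginal(joint, (2,))) - H(joint))

print(I_XY, I_XY_given_Z)  # → 0.0 0.5
```

As the solution states, conditioning on $Z$ here creates dependence: $I(X;Y) = 0$ while $I(X;Y \mid Z) = 1/2$.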
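The disjoint-mixture identity of Problem 10(a) can likewise be checked for a concrete pair of distributions; $\alpha$ and the two pmfs below are illustrative assumptions, not values from the text.

```python
from math import log2

def H(probs):
    """Entropy in bits of a probability vector."""
    return -sum(p * log2(p) for p in probs if p > 0)

alpha = 0.3
p1 = [0.5, 0.25, 0.25]   # assumed pmf of X1 on {1, 2, 3}
p2 = [0.9, 0.1]          # assumed pmf of X2 on {4, 5}

# pmf of the mixture X on the disjoint union {1, ..., 5}
pX = [alpha * p for p in p1] + [(1 - alpha) * p for p in p2]

lhs = H(pX)
rhs = H([alpha, 1 - alpha]) + alpha * H(p1) + (1 - alpha) * H(p2)
print(abs(lhs - rhs) < 1e-12)  # → True
```

The agreement is exact up to floating-point rounding, as the algebraic proof via $\theta = f(X)$ predicts.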
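The numbers in Problem 12 can be reproduced directly, assuming the joint pmf $p(0,0) = p(0,1) = p(1,1) = 1/3$, $p(1,0) = 0$, which is the table consistent with the stated answers.

```python
from math import log2

def H(pmf):
    """Entropy in bits of a pmf given as a dict {outcome: probability}."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

# Joint pmf consistent with the answers of Problem 12.
joint = {(0, 0): 1/3, (0, 1): 1/3, (1, 1): 1/3}
pX = {0: 2/3, 1: 1/3}  # marginal of X
pY = {0: 1/3, 1: 2/3}  # marginal of Y

HX, HY, HXY = H(pX), H(pY), H(joint)
print(round(HX, 3), round(HXY - HY, 3), round(HXY, 3))  # → 0.918 0.667 1.585

I = HX + HY - HXY  # ≈ 0.252 bits; the text's 0.251 comes from subtracting rounded values
```

So $H(X) = H(Y) \approx 0.918$ bits, $H(X \mid Y) = H(Y \mid X) \approx 0.667$ bits, and $H(X, Y) = \log 3 \approx 1.585$ bits.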
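The mixing step of Problem 28 can be spot-checked on an arbitrary distribution (the vector below is an assumed example): averaging any two coordinates never decreases the entropy.

```python
from math import log2

def H(probs):
    """Entropy in bits of a probability vector."""
    return -sum(p * log2(p) for p in probs if p > 0)

P1 = [0.6, 0.25, 0.1, 0.05]  # an arbitrary distribution (assumed example)
i, j = 0, 2
P2 = list(P1)
P2[i] = P2[j] = (P1[i] + P1[j]) / 2  # replace p_i and p_j by their average

print(H(P2) >= H(P1))  # → True
```

Smaller transfers of probability from the larger to the smaller coordinate show the same monotone behavior, matching the log sum inequality argument.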
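Finally, the unconditional Fano bound (2.66) and its weakened form (2.67) can be verified on a sample pmf (again an assumed example, ordered so that $p_1 \ge p_2 \ge \dots \ge p_m$).

```python
from math import log2

def H(probs):
    """Entropy in bits of a probability vector."""
    return -sum(p * log2(p) for p in probs if p > 0)

p = [0.5, 0.2, 0.15, 0.1, 0.05]  # assumed pmf with p1 >= p2 >= ... >= pm
m = len(p)
Pe = 1 - p[0]  # error probability of the best no-information guess, X_hat = 1

# Unconditional Fano bound: H(X) <= H(Pe) + Pe * log(m - 1)
bound = H([Pe, 1 - Pe]) + Pe * log2(m - 1)
print(H(p) <= bound + 1e-12)  # → True

# Weakened form (2.67): Pe >= (H(X) - 1) / log(m - 1)
print(Pe >= (H(p) - 1) / log2(m - 1))  # → True
```

Here $H(\mathbf{p}) \approx 1.92$ bits against a bound of $H(0.5) + 0.5 \log 4 = 2$ bits, so the inequality holds with some slack for this non-extremal pmf.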