Entropy, Relative Entropy and Mutual Information

3. Minimum entropy. What is the minimum value of H(p1, p2, ..., pn) = H(p) as p ranges over the set of n-dimensional probability vectors? Find all p's which achieve this minimum.

Solution: We wish to find all probability vectors p = (p1, p2, ..., pn) which minimize

    H(p) = - Σ_i p_i log p_i.

Now -p_i log p_i ≥ 0, with equality iff p_i = 0 or 1. Hence the only possible probability vectors which minimize H(p) are those with p_i = 1 for some i and p_j = 0 for all j ≠ i. There are n such vectors, i.e., (1, 0, ..., 0), (0, 1, 0, ..., 0), ..., (0, ..., 0, 1), and the minimum value of H(p) is 0.
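As a quick numerical check (not part of the original solution), the claim is easy to verify in a few lines of Python; the `entropy` helper below is our own:

```python
import math

def entropy(p):
    """Shannon entropy in bits, with 0 log 0 taken as 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)

# The degenerate vectors achieve the minimum H(p) = 0 ...
assert entropy([1.0, 0.0, 0.0]) == 0.0
assert entropy([0.0, 0.0, 1.0]) == 0.0

# ... while any vector with two or more nonzero entries has H(p) > 0.
assert entropy([0.5, 0.5]) == 1.0
assert entropy([0.9, 0.05, 0.05]) > 0
```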
4. Entropy of functions of a random variable. Let X be a discrete random variable. Show that the entropy of a function of X is less than or equal to the entropy of X by justifying the following steps:

    H(X, g(X)) = H(X) + H(g(X)|X)      (a)
               = H(X);                 (b)
    H(X, g(X)) = H(g(X)) + H(X|g(X))   (c)
               ≥ H(g(X)).              (d)

Thus H(g(X)) ≤ H(X).

Solution: Entropy of functions of a random variable.

(a) H(X, g(X)) = H(X) + H(g(X)|X) by the chain rule for entropies.

(b) H(g(X)|X) = 0, since for any particular value of X, g(X) is fixed, and hence

    H(g(X)|X) = Σ_x p(x) H(g(X)|X = x) = Σ_x p(x) · 0 = 0.

(c) H(X, g(X)) = H(g(X)) + H(X|g(X)), again by the chain rule.

(d) H(X|g(X)) ≥ 0, with equality iff X is a function of g(X), i.e., g(·) is one-to-one. Hence H(X, g(X)) ≥ H(g(X)).

Combining parts (b) and (d), we obtain H(X) ≥ H(g(X)).
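A small numerical sketch of the result (our own example, not from the text): push a pmf through a non-injective g and compare entropies.

```python
import math
from collections import defaultdict

def entropy(pmf):
    """Shannon entropy in bits of a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

# A hypothetical pmf on {0, 1, 2, 3} and the non-injective map g(x) = x mod 2.
p_x = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}
p_gx = defaultdict(float)
for x, px in p_x.items():
    p_gx[x % 2] += px  # the pmf of g(X) pools the mass that g merges

assert entropy(p_gx) <= entropy(p_x)  # H(g(X)) <= H(X)
```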
5. Zero conditional entropy. Show that if H(Y|X) = 0, then Y is a function of X, i.e., for all x with p(x) > 0, there is only one possible value of y with p(x, y) > 0.

Solution: Zero Conditional Entropy. Assume that there exists an x, say x0, and two different values of y, say y1 and y2, such that p(x0, y1) > 0 and p(x0, y2) > 0. Then p(x0) ≥ p(x0, y1) + p(x0, y2) > 0, and p(y1|x0) and p(y2|x0) are not equal to 0 or 1. Thus

    H(Y|X) = - Σ_x p(x) Σ_y p(y|x) log p(y|x)                              (2.5)
           ≥ p(x0) ( -p(y1|x0) log p(y1|x0) - p(y2|x0) log p(y2|x0) )      (2.6)
           > 0,                                                            (2.7)

since -t log t ≥ 0 for 0 ≤ t ≤ 1, and is strictly positive for t not equal to 0 or 1.
Therefore the conditional entropy H(Y|X) is 0 if and only if Y is a function of X.

6. Conditional mutual information vs. unconditional mutual information. Give examples of joint random variables X, Y and Z such that

(a) I(X; Y|Z) < I(X; Y),
(b) I(X; Y|Z) > I(X; Y).

Solution: Conditional mutual information vs. unconditional mutual information.

(a) The last corollary to Theorem 2.8.1 in the text states that if X → Y → Z, that is, if p(x, z|y) = p(x|y) p(z|y), then I(X; Y) ≥ I(X; Y|Z). Equality holds if and only if I(X; Z) = 0, i.e., X and Z are independent.

A simple example of random variables satisfying the strict inequality is: X a fair binary random variable, Y = X, and Z = Y. In this case,

    I(X; Y) = H(X) - H(X|Y) = H(X) = 1,

and

    I(X; Y|Z) = H(X|Z) - H(X|Y, Z) = 0,

so that I(X; Y) > I(X; Y|Z).

(b) This example is also given in the text. Let X, Y be independent fair binary random variables and let Z = X + Y. In this case we have

    I(X; Y) = 0

and

    I(X; Y|Z) = H(X|Z) - H(X|Y, Z) = H(X|Z) = 1/2.

So I(X; Y) < I(X; Y|Z). Note that in this case X, Y, Z are not Markov.
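Both examples can be checked numerically; the helper names below (`H`, `marginal`, `mutual_info`) are our own, and the conditional mutual information is computed via the identity I(A;B|C) = H(A,C) + H(B,C) − H(A,B,C) − H(C):

```python
import math
from itertools import product

def H(pmf):
    """Entropy in bits of a dict {outcome-tuple: probability}."""
    return -sum(v * math.log2(v) for v in pmf.values() if v > 0)

def marginal(joint, idx):
    """Marginal pmf over the coordinates listed in idx."""
    m = {}
    for k, v in joint.items():
        key = tuple(k[i] for i in idx)
        m[key] = m.get(key, 0.0) + v
    return m

def mutual_info(joint, a, b, cond=()):
    """I(A; B | C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    return (H(marginal(joint, a + cond)) + H(marginal(joint, b + cond))
            - H(marginal(joint, a + b + cond)) - H(marginal(joint, cond)))

# (a) X a fair bit, Y = X, Z = Y: only two joint outcomes (x, y, z).
pa = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
assert abs(mutual_info(pa, (0,), (1,)) - 1.0) < 1e-12        # I(X;Y) = 1
assert abs(mutual_info(pa, (0,), (1,), (2,))) < 1e-12        # I(X;Y|Z) = 0

# (b) X, Y independent fair bits, Z = X + Y.
pb = {(x, y, x + y): 0.25 for x, y in product((0, 1), repeat=2)}
assert abs(mutual_info(pb, (0,), (1,))) < 1e-12              # I(X;Y) = 0
assert abs(mutual_info(pb, (0,), (1,), (2,)) - 0.5) < 1e-12  # I(X;Y|Z) = 1/2
```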
10. Entropy of a disjoint mixture. Let X1 and X2 be discrete random variables drawn according to probability mass functions p1(·) and p2(·) over the respective alphabets X1 = {1, 2, ..., m} and X2 = {m+1, ..., n}. Let

    X = X1 with probability α,
        X2 with probability 1 - α.

(a) Find H(X) in terms of H(X1), H(X2) and α.

(b) Maximize over α to show that 2^H(X) ≤ 2^H(X1) + 2^H(X2), and interpret using the notion that 2^H(X) is the effective alphabet size.

Solution: Entropy. We can do this problem by writing down the definition of entropy and expanding the various terms. Instead, we will use the algebra of entropies for a simpler proof. Since X1 and X2 have disjoint support sets, we can write

    X = X1 with probability α,
        X2 with probability 1 - α.

Define a function of X,

    θ = f(X) = 1 when X = X1,
               2 when X = X2.

Then, as in Problem 1, we have

    H(X) = H(X, f(X)) = H(θ) + H(X|θ)
         = H(θ) + p(θ = 1) H(X|θ = 1) + p(θ = 2) H(X|θ = 2)
         = H(α) + α H(X1) + (1 - α) H(X2),

where H(α) = -α log α - (1 - α) log(1 - α).

11. A measure of correlation. Let X1 and X2 be identically distributed, but not necessarily independent. Let

    ρ = 1 - H(X2|X1) / H(X1).
(a) Show ρ = I(X1; X2) / H(X1).

(b) Show 0 ≤ ρ ≤ 1.

(c) When is ρ = 0?

(d) When is ρ = 1?

Solution: A measure of correlation. X1 and X2 are identically distributed and

    ρ = 1 - H(X2|X1) / H(X1).

(a)
    ρ = (H(X1) - H(X2|X1)) / H(X1)
      = (H(X2) - H(X2|X1)) / H(X1)    (since H(X1) = H(X2))
      = I(X1; X2) / H(X1).

(b) Since 0 ≤ H(X2|X1) ≤ H(X2) = H(X1), we have

    0 ≤ H(X2|X1) / H(X1) ≤ 1,

so 0 ≤ ρ ≤ 1.
Hlel _ USPSL as (c) p = 0 iff I(X1;X2) = 0 iff X1 and X2 are independent.
((1) ,0 t 1 iff H(X2X1) = 0 iff X2 is a function of X1. By symmetry, X1 is a
function of X2, i.e., X1 and X2 have a one—to—one relationship. 12. Example of joint entropy, Let p(a:,y) be given by Find (a) H(X):H(Y} (b) H(X]Y),H(Y1X). (c) H (X, Y). ((1) H(Y)—H(Y1Xl ®)KXﬂW (f) Draw a Venn diagram for the quantities in (a) through (e). Solution: Example of joint entropy @)[email protected]=$%§+g%3:amsmm:num (b) H(X!Y) 2 1311(le = 0) + gnaw! : l) = 0.667 bits = H(YX).
(c) H(X, Y) = 3 × (1/3) log 3 = 1.585 bits.

(d) H(Y) - H(Y|X) = 0.251 bits.

(e) I(X; Y) = H(Y) - H(Y|X) = 0.251 bits.

(f) See Figure 1.

[Figure 2.1: Venn diagram to illustrate the relationships of entropy and relative entropy.]

13. Inequality. Show ln x ≥ 1 - 1/x for x > 0.

Solution: Inequality. Using the Remainder form of the Taylor expansion of ln(x) about x = 1, we have for some a between 1 and x,

    ln x = ln 1 + (x - 1) - ((x - 1)^2 / 2)(1/a^2) ≤ x - 1,

since the second term is never positive. Hence, letting y = 1/x, we obtain
    ln(1/y) ≤ (1/y) - 1,

or

    ln y ≥ 1 - 1/y,

with equality iff y = 1.
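The inequality is easy to spot-check numerically over a grid (our own sketch, not part of the original solution):

```python
import math

# ln x >= 1 - 1/x for x > 0, with equality only at x = 1.
for i in range(1, 500):
    x = i / 50.0  # grid over (0, 10)
    slack = math.log(x) - (1 - 1 / x)
    assert slack >= -1e-12          # the inequality holds everywhere
    if abs(x - 1.0) > 1e-9:
        assert slack > 0            # and is strict away from x = 1

assert math.log(1.0) == 1 - 1 / 1.0  # equality at x = 1 (both sides are 0)
```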
14. Entropy of a sum. Let X and Y be random variables that take on values x1, x2, ..., xr and y1, y2, ..., ys, respectively. Let Z = X + Y.

(a) Show that H(Z|X) = H(Y|X). Argue that if X, Y are independent, then H(Y) ≤ H(Z) and H(X) ≤ H(Z). Thus the addition of independent random variables adds uncertainty.

(b) Give an example of (necessarily dependent) random variables in which H(X) > H(Z) and H(Y) > H(Z).

(c) Under what conditions does H(Z) = H(X) + H(Y)?

Solution: Entropy of a sum.

(a) Z = X + Y. Hence p(Z = z|X = x) = p(Y = z - x|X = x), and

    H(Z|X) = Σ_x p(x) H(Z|X = x)
           = - Σ_x p(x) Σ_z p(Z = z|X = x) log p(Z = z|X = x)
           = - Σ_x p(x) Σ_y p(Y = z - x|X = x) log p(Y = z - x|X = x)
           = Σ_x p(x) H(Y|X = x)
           = H(Y|X).

If X and Y are independent, then H(Y|X) = H(Y). Since I(X; Z) ≥ 0, we have H(Z) ≥ H(Z|X) = H(Y|X) = H(Y). Similarly we can show that H(Z) ≥ H(X).

(b) Consider the following joint distribution for X and Y. Let

    X = -Y = 1 with probability 1/2,
             0 with probability 1/2.

Then H(X) = H(Y) = 1, but Z = 0 with probability 1 and hence H(Z) = 0.

(c) We have

    H(Z) ≤ H(X, Y) ≤ H(X) + H(Y),

because Z is a function of (X, Y) and H(X, Y) = H(X) + H(Y|X) ≤ H(X) + H(Y). We have equality iff (X, Y) is a function of Z and H(Y) = H(Y|X), i.e., X and Y are independent.

28. Mixing increases entropy. Show that the entropy of the probability distribution
(p1, ..., pi, ..., pj, ..., pm) is less than the entropy of the distribution (p1, ..., (pi + pj)/2, ..., (pi + pj)/2, ..., pm). Show that in general any transfer of probability that makes the distribution more uniform increases the entropy.

Solution:
Mixing increases entropy. This problem depends on the concavity of the function -t log t (equivalently, on the log sum inequality). Let

    P1 = (p1, ..., pi, ..., pj, ..., pm),
    P2 = (p1, ..., (pi + pj)/2, ..., (pi + pj)/2, ..., pm).

Then, by the log sum inequality,

    H(P2) - H(P1) = -2 ((pi + pj)/2) log((pi + pj)/2) + pi log pi + pj log pj
                  = -(pi + pj) log((pi + pj)/2) + pi log pi + pj log pj
                  ≥ 0.

Thus,

    H(P2) ≥ H(P1).

29. Inequalities. Let X, Y and Z be joint random variables. Prove the following
inequalities and find conditions for equality.

(a) H(X, Y|Z) ≥ H(X|Z).
(b) I(X, Y; Z) ≥ I(X; Z).
(c) H(X, Y, Z) - H(X, Y) ≤ H(X, Z) - H(X).
(d) I(X; Z|Y) ≥ I(Z; Y|X) - I(Z; Y) + I(X; Z).

Solution: Inequalities.

(a) Using the chain rule for conditional entropy,

    H(X, Y|Z) = H(X|Z) + H(Y|X, Z) ≥ H(X|Z),

with equality iff H(Y|X, Z) = 0, that is, when Y is a function of X and Z.

(b) Using the chain rule for mutual information,

    I(X, Y; Z) = I(X; Z) + I(Y; Z|X) ≥ I(X; Z),

with equality iff I(Y; Z|X) = 0, that is, when Y and Z are conditionally independent given X.

(c) Using first the chain rule for entropy and then the definition of conditional mutual information,

    H(X, Y, Z) - H(X, Y) = H(Z|X, Y)
                         = H(Z|X) - I(Y; Z|X)
                         ≤ H(Z|X)
                         = H(X, Z) - H(X),

with equality iff I(Y; Z|X) = 0, that is, when Y and Z are conditionally independent given X.

(d) Using the chain rule for mutual information,

    I(X; Z|Y) + I(Z; Y) = I(X, Y; Z) = I(Z; Y|X) + I(X; Z),

and therefore

    I(X; Z|Y) = I(Z; Y|X) - I(Z; Y) + I(X; Z).

We see that this inequality is actually an equality in all cases.

33. Fano's inequality. Let Pr(X = i) = p_i, i = 1, 2, ..., m, and let p1 ≥ p2 ≥ p3 ≥ · · ·
≥ pm. The minimal probability of error predictor of X is X̂ = 1, with resulting probability of error Pe = 1 - p1. Maximize H(p) subject to the constraint 1 - p1 = Pe to find a bound on Pe in terms of H. This is Fano's inequality in the absence of conditioning.

Solution: (Fano's Inequality.) The minimal probability of error predictor when there is no information is X̂ = 1, the most probable value of X. The probability of error in this case is Pe = 1 - p1. Hence if we fix Pe, we fix p1. We maximize the entropy of X for a given Pe to obtain an upper bound on the entropy. The entropy,

    H(p) = -p1 log p1 - Σ_{i=2}^{m} p_i log p_i                                (2.62)
         = -p1 log p1 - Pe Σ_{i=2}^{m} (p_i/Pe) log(p_i/Pe) - Pe log Pe        (2.63)
         = H(Pe) + Pe H(p2/Pe, p3/Pe, ..., pm/Pe)                              (2.64)
         ≤ H(Pe) + Pe log(m - 1),                                              (2.65)

since the maximum of H(p2/Pe, p3/Pe, ..., pm/Pe) is attained by a uniform distribution. Hence any X that can be predicted with a probability of error Pe must satisfy

    H(X) ≤ H(Pe) + Pe log(m - 1),                                              (2.66)

which is the unconditional form of Fano's inequality. We can weaken this inequality to obtain an explicit lower bound for Pe:

    Pe ≥ (H(X) - 1) / log(m - 1).                                              (2.67)
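As a sketch (our own check, with an arbitrary example distribution), both (2.66) and the weakened bound (2.67) can be verified numerically:

```python
import math

def H(p):
    """Shannon entropy in bits, with 0 log 0 taken as 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def fano_bound(p):
    """Unconditional Fano bound (2.66): H(Pe) + Pe log(m - 1), Pe = 1 - max p."""
    pe = 1 - max(p)
    return H([pe, 1 - pe]) + pe * math.log2(len(p) - 1)

# A hypothetical distribution on m = 4 symbols, sorted in decreasing order.
p = [0.5, 0.25, 0.15, 0.10]
assert H(p) <= fano_bound(p)                       # (2.66)

pe = 1 - max(p)
assert pe >= (H(p) - 1) / math.log2(len(p) - 1)    # (2.67)
```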
Summer '10, M.R. Soleymani
