MIT OpenCourseWare
http://ocw.mit.edu

16.36 Communication Systems Engineering
Spring 2009

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

16.36: Communication Systems Engineering
Lecture 3: Measuring Information and Entropy
February 10, 2009
Eytan Modiano

Information content of a random variable (how much information is in the data?)

• Random variable X
  – Outcome of a random experiment
  – A discrete R.V. takes on values from a finite set of possible outcomes
    PMF: P(X = y) = P_X(y)

• How much information is contained in the event X = y?
  – Will the sun rise today?
    Revealing the outcome of this experiment provides no information
  – Will the Celtics win the NBA championship?
    It is possible, but not certain. Revealing the answer to this question certainly has value, i.e., it contains information

• Events whose outcome is certain contain less information than events whose outcome is in doubt

Measure of Information

• I(xi) = amount of information revealed by the outcome X = xi

• Desirable properties of I(x):
  1. If P(x) = 1 or P(x) = 0, then I(x) = 0
  2. If 0 < P(x) < 1, then I(x) > 0
  3. If P(x) < P(y), then I(x) > I(y)
  4. If x and y are independent events, then I(x,y) = I(x) + I(y)

• The above properties are satisfied by: I(x) = Log2(1/P(x))

• The base of the log is not critical
  – Base 2 ⇒ information measured in bits

Entropy

• A measure of the information content of a random variable

• X ∈ {x1, …, xM}

• H(X) = E[I(X)] = ∑ P(xi) Log2(1/P(xi))

• Example: binary experiment
  – X = x1 with probability p
  – X = x2 with probability (1 − p)
  – H(X) = p Log2(1/p) + (1 − p) Log2(1/(1 − p)) = Hb(p)
  – H(X) is maximized at p = 1/2, with Hb(1/2) = 1
    Not surprising that the result of a binary experiment can be conveyed using one bit

Simple bounds on entropy

• Theorem: Given a random variable X with M possible values,
  0 ≤ H(X) ≤ Log2(M)
  A) H(X) = 0 if and only if P(xi) = 1 for some i
  B) H(X) = Log2(M) if and only if P(xi) = 1/M for all i

• Proof of A is obvious

• Proof of B requires the log inequality:
  – If x > 0, then ln(x) ≤ x − 1, with equality iff x = 1
    (figure: the curve y = ln(x) lies below the line y = x − 1, touching it at x = 1)

Proof, continued

Consider the sum ∑_{i=1}^{M} Pi Log(1/(M Pi)) = (1/ln 2) ∑_{i=1}^{M} Pi ln(1/(M Pi)). By the log inequality:

  (1/ln 2) ∑_{i=1}^{M} Pi ln(1/(M Pi)) ≤ (1/ln 2) ∑_{i=1}^{M} Pi (1/(M Pi) − 1) = (1/ln 2) ∑_{i=1}^{M} (1/M − Pi) = 0,

with equality when Pi = 1/M.

Writing this in another way:

  ∑_{i=1}^{M} Pi Log(1/(M Pi)) = ∑_{i=1}^{M} Pi Log(1/Pi) + ∑_{i=1}^{M} Pi Log(1/M) ≤ 0, with equality when Pi = 1/M.

That is,

  ∑_{i=1}^{M} Pi Log(1/Pi) ≤ ∑_{i=1}^{M} Pi Log(M) = Log(M).
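To make the definitions above concrete, here is a minimal Python sketch (not part of the original slides) that computes H(X) = ∑ P(xi) Log2(1/P(xi)) for a given PMF and checks the binary-entropy maximum and the bound H(X) ≤ Log2(M). The function names and example PMFs are illustrative choices, not anything specified in the lecture.

```python
import math

def entropy(pmf):
    """H(X) = sum_i P(x_i) * log2(1 / P(x_i)), with 0 * log(1/0) taken as 0."""
    return sum(p * math.log2(1.0 / p) for p in pmf if p > 0)

def binary_entropy(p):
    """Hb(p) = p*log2(1/p) + (1-p)*log2(1/(1-p))."""
    return entropy([p, 1.0 - p])

if __name__ == "__main__":
    # Binary experiment: Hb(p) is maximized at p = 1/2, where Hb(1/2) = 1 bit.
    for p in (0.01, 0.1, 0.5, 0.9):
        print(f"Hb({p}) = {binary_entropy(p):.4f} bits")

    # Bounds: 0 <= H(X) <= log2(M); the upper bound is reached iff the PMF is uniform.
    M = 4
    uniform = [1.0 / M] * M
    skewed = [0.7, 0.1, 0.1, 0.1]   # made-up example distribution
    print(f"H(uniform) = {entropy(uniform):.4f}  (log2(M) = {math.log2(M):.4f})")
    print(f"H(skewed)  = {entropy(skewed):.4f}  (strictly less than log2(M))")
```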
Joint Entropy

• Joint entropy:
  H(X,Y) = ∑_{x,y} p(x,y) Log(1/p(x,y))

• Conditional entropy: H(X|Y) = uncertainty in X given Y
  H(X | Y = y) = ∑_x p(x | Y = y) Log(1/p(x | Y = y))
  H(X | Y) = E[H(X | Y = y)] = ∑_y p(Y = y) H(X | Y = y)
  H(X | Y) = ∑_{x,y} p(x,y) Log(1/p(x | Y = y))

• In general, for random variables X1, …, Xn:
  H(Xn | X1, …, Xn-1) = ∑_{x1,…,xn} p(x1, …, xn) Log(1/p(xn | x1, …, xn-1))

Rules for entropy

1. Chain rule: H(X1, …, Xn) = H(X1) + H(X2|X1) + H(X3|X2,X1) + … + H(Xn|Xn-1,…,X1)

2. H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)

3. If X1, …, Xn are independent, then:
   H(X1, …, Xn) = H(X1) + H(X2) + … + H(Xn)
   If they are also identically distributed (i.i.d.), then:
   H(X1, …, Xn) = nH(X1)

4. H(X1, …, Xn) ≤ H(X1) + H(X2) + … + H(Xn), with equality iff independent
   Proof: use the chain rule and notice that H(X|Y) ≤ H(X): entropy is not increased by conditioning on additional information

Mutual Information

• X, Y random variables

• Definition: I(X;Y) = H(Y) − H(Y|X)

• Notice that H(Y|X) = H(X,Y) − H(X) ⇒ I(X;Y) = H(X) + H(Y) − H(X,Y)

• I(X;Y) = I(Y;X) = H(X) − H(X|Y)

• Note: I(X;Y) ≥ 0, with equality iff X and Y are independent
  – Because H(Y) ≥ H(Y|X)
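As an illustration of the rules above, the sketch below (again not from the slides; the joint PMF is made up for the example) computes H(X,Y), H(Y|X), and I(X;Y) from a small joint distribution and checks the chain rule H(X,Y) = H(X) + H(Y|X) and the identity I(X;Y) = H(X) + H(Y) − H(X,Y) ≥ 0.

```python
import math

def H(pmf):
    """Entropy of a list of probabilities, in bits."""
    return sum(p * math.log2(1.0 / p) for p in pmf if p > 0)

# Hypothetical joint PMF p(x, y) on X in {0, 1}, Y in {0, 1}.
joint = {(0, 0): 0.4, (0, 1): 0.1,
         (1, 0): 0.1, (1, 1): 0.4}

# Marginals p(x) and p(y)
px = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (0, 1)}
py = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in (0, 1)}

H_XY = H(list(joint.values()))   # H(X,Y) = sum_{x,y} p(x,y) log2(1/p(x,y))
H_X = H(list(px.values()))
H_Y = H(list(py.values()))

# Conditional entropy H(Y|X) = sum_{x,y} p(x,y) log2(1/p(y|x)), with p(y|x) = p(x,y)/p(x)
H_Y_given_X = sum(p * math.log2(px[x] / p) for (x, y), p in joint.items() if p > 0)

I_XY = H_Y - H_Y_given_X          # I(X;Y) = H(Y) - H(Y|X)

print(f"H(X,Y) = {H_XY:.4f},  H(X) + H(Y|X) = {H_X + H_Y_given_X:.4f}")    # chain rule
print(f"I(X;Y) = {I_XY:.4f},  H(X)+H(Y)-H(X,Y) = {H_X + H_Y - H_XY:.4f}")  # same value, >= 0
```

For the example PMF the two printed pairs agree, and I(X;Y) comes out positive, consistent with the equality-iff-independent condition stated above.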