
Slide 1 - EE670 Information Theory and Coding
- Class home page: http://koala.ece.stevens-tech.edu/~utureli/EE670/ (class information, assignments, and other links).
- Text: Elements of Information Theory, Thomas Cover and Joy A. Thomas, Wiley, 1991.
- Entropy, data compression, rate distortion theory (Ch. 2-14).
- No class next Thursday 1/24/02 (Monday schedule).

Slide 2 - Digital Signal
- A discrete waveform with two discrete states: the 1-bit and the 0-bit (an on/off pulse).
- Data communication uses a modem to translate analog to digital and digital to analog.
- Example bit stream: 0010111010011101001010101110111100100010000101111010110

Slide 3 - Learning Objectives
- Applications of information theory: biotech, communications, security, finance (DNA molecule, compression, coding, investment, gambling).
- Topics: uncertainty, entropy and mutual information; source coding; channel capacity and the noisy channel coding theorem; the rate-distortion theorem.

Slide 4 - Fundamental Limits of Communication
- Purpose of a communication system: carry information-bearing signals.
- Entropy: the irreducible complexity below which a signal cannot be compressed.
- Capacity: the intrinsic ability of a channel to transmit information.
- If the capacity of a channel exceeds the entropy rate of the source, then lossless communication is possible.
- Rate distortion theory determines the accuracy of the reconstructed signal as a function of the number of bits used (i.e., the rate).

Slide 5 - Random Variables and Probability
- A random variable X assumes a value as a function of the outcome of a process that cannot be determined in advance.
- The sample space S of a random variable is the set of all possible values of X.
- Omega is the set of all outcomes, divided into elementary events, or states.
- \sum_{x} p(x) = 1 and 0 \le p(x) \le 1.

Slide 6 - Expectation, Variance and Deviation
- The moments of a random variable define its important characteristics.
- The first moment is the expectation \mu = E[X] = \sum_{x} x\, p(x).
- Note: the expectation has a misleading name; it is not always the value we expect to see most often. For the number on a die the expectation is 3.5, a value we will never see at all. The expectation is a weighted average.
- The variance is \mathrm{Var}[X] = \langle X^2 \rangle - \langle X \rangle^2 = M_2 - M_1^2.
- The standard deviation \sigma = \mathrm{Var}[X]^{1/2} measures the spread of X about the mean.

Slide 7 - Conditional Probability and Bayes' Rule
- Knowledge that a certain event occurred can change the probability that another event will occur. P(x|y) denotes the conditional probability that x will happen given that y has happened.
- Bayes' rule: P(x|y) = P(y|x) P(x) / P(y).
- The total probability formula: P(A) = P(A|B) P(B) + P(A|\neg B) P(\neg B), or in the more general case P(A) = \sum_i P(A|B_i) P(B_i).
- Note: writing this as P(A) = \alpha P(A|B) + (1-\alpha) P(A|\neg B) mirrors the intuition that the unconditional probability of an event lies somewhere between its conditional probabilities under two opposing assumptions.

Slide 8 - Information Theory: Entropy
- We want a fundamental measure of the amount of missing information in a distribution, or a set of possible states. Such a measure is the entropy H[P] = -\sum_{x} p(x) \log p(x).
- H[P] \ge 0, with equality when there is only one possible outcome.
- H[P] is concave: H[\alpha P_1 + (1-\alpha) P_2] \ge \alpha H[P_1] + (1-\alpha) H[P_2].
- The entropy attains its maximum when the uncertainty is greatest, which occurs when p(x) is uniform (no state is preferred over any other).
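To make the entropy definition concrete, here is a minimal Python sketch (my own illustration, not from the slides) that computes H[P] in bits for a few distributions and shows that, for a fixed number of states, the uniform distribution gives the largest value:

```python
# A minimal sketch (my own illustration, not from the slides): compute
# H[P] = -sum_x p(x) log2 p(x) in bits and see that, for a fixed number of
# states, the uniform distribution gives the largest entropy.
from math import log2

def entropy(p):
    """Entropy in bits of a discrete distribution given as a list of probabilities."""
    return -sum(pi * log2(pi) for pi in p if pi > 0)   # the term for p(x) = 0 is taken as 0

print(entropy([1.0]))          # 0.0 bits: a single certain outcome
print(entropy([0.9, 0.1]))     # ~0.469 bits: a biased coin
print(entropy([0.5, 0.5]))     # 1.0 bit: a fair coin (the maximum for two states)
print(entropy([0.25] * 4))     # 2.0 bits: uniform over four states (the maximum, log2 4)
```

The printed values match the properties above: zero entropy for a certain outcome, and the maximum (log2 of the alphabet size) for the uniform distribution.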
Slide 9 - Entropy (2)
- The joint entropy is H[X,Y] = -\sum_{x}\sum_{y} p(x,y) \log p(x,y) = -E[\log p(X,Y)].
- The conditional entropy is H[Y|X] = \sum_{x} p(x) H(Y|X=x) = -\sum_{x} p(x) \sum_{y} p(y|x) \log p(y|x) = -\sum_{x}\sum_{y} p(x,y) \log p(y|x).
- The relative entropy (the "distance" between two distributions) is D[p \| q] = \sum_{x} p(x) \log \frac{p(x)}{q(x)}, and it equals 0 only if p(x) and q(x) are the same.

Slide 10 - Mutual Information
- The mutual information between two random variables is
  I(X;Y) = \sum_{x}\sum_{y} p(x,y) \log \frac{p(x,y)}{p(x)p(y)} = D[p(x,y) \| p(x)p(y)] = H(X) - H(X|Y) = H(X) + H(Y) - H(X,Y) = I(Y;X).
- The information of X about Y is the same as that of Y about X.
- I(X;Y) = 0 if X and Y are independent.
- The joint missing information H(X,Y) is the missing information of X plus the conditional missing information of Y given that X is known.
- The information of Y about X is the reduction in missing information about X once Y is known.

Slide 12 - Chain Rule
- H(X_1, X_2) = H(X_1) + H(X_2|X_1)
- H(X_1, X_2, X_3) = H(X_1) + H(X_2, X_3|X_1) = H(X_1) + H(X_2|X_1) + H(X_3|X_2, X_1)
- In general, H(X_1, X_2, \ldots, X_n) = H(X_1) + H(X_2|X_1) + \cdots + H(X_n|X_{n-1}, \ldots, X_1) = \sum_{i=1}^{n} H(X_i | X_{i-1}, \ldots, X_1).

Slide 13 - Jensen's Inequality
- A function f(x) is convex over (a, b) if for all x_1, x_2 \in (a, b) and 0 \le \lambda \le 1, f(\lambda x_1 + (1-\lambda) x_2) \le \lambda f(x_1) + (1-\lambda) f(x_2).
- A function f is concave if -f is convex. Examples: x^2 and e^x are convex; \log(x) is concave.
- Jensen's inequality: if f is convex, then E f(X) \ge f(EX).
- Proof by induction on the number of mass points. For two mass points, p_1 f(x_1) + p_2 f(x_2) \ge f(p_1 x_1 + p_2 x_2) by the definition of convexity. Assume the result holds for k-1 mass points and let p'_i = p_i / (1 - p_k) for i = 1, 2, \ldots, k-1. Then
  \sum_{i=1}^{k} p_i f(x_i) = p_k f(x_k) + (1 - p_k) \sum_{i=1}^{k-1} p'_i f(x_i)
  \ge p_k f(x_k) + (1 - p_k) f\!\left(\sum_{i=1}^{k-1} p'_i x_i\right)
  \ge f\!\left(p_k x_k + (1 - p_k) \sum_{i=1}^{k-1} p'_i x_i\right) = f\!\left(\sum_{i=1}^{k} p_i x_i\right).

Slide 14 - Jensen's Inequality (exercise)
- Prove that E X^2 \ge (E X)^2. Hint: take f(X) = X^2, which is convex, and apply Jensen's inequality.

Slide 15 - Data Processing Inequality
- If X \to Y \to Z forms a Markov chain, then I(X;Y) \ge I(X;Z).
- Proof: use the chain rule to expand the mutual information I(X;Y,Z) in two ways:
  I(X;Y,Z) = I(X;Y) + I(X;Z|Y) = I(X;Y) + 0 = I(X;Y), since X and Z are conditionally independent given Y;
  I(X;Y,Z) = I(X;Z) + I(X;Y|Z) \ge I(X;Z).
  Therefore I(X;Y) \ge I(X;Z).

Slide 16 - ...
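As a quick numerical check of the exercise on slide 14, the sketch below (my own illustration, not part of the slides) evaluates E[X^2] and (E[X])^2 for the fair six-sided die used as the expectation example on slide 6:

```python
# A minimal sketch (my own illustration, not from the slides): a numerical
# check of the slide-14 exercise E[X^2] >= (E[X])^2, using the fair die from
# slide 6. Here f(x) = x^2 is convex, so Jensen's inequality applies.
def expectation(f, pmf):
    """E[f(X)] for a discrete pmf given as a dict {value: probability}."""
    return sum(p * f(x) for x, p in pmf.items())

die = {x: 1 / 6 for x in range(1, 7)}        # fair six-sided die
ex = expectation(lambda x: x, die)           # E[X] = 3.5
ex2 = expectation(lambda x: x * x, die)      # E[X^2] = 91/6, about 15.17
print(ex2, ">=", ex ** 2)                    # 15.17 >= 12.25, consistent with Jensen's inequality
```

The gap between the two sides is exactly the variance Var[X] defined on slide 6.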
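The identity I(X;Y) = H(X) + H(Y) - H(X,Y) from slide 10 and the data processing inequality from slide 15 can also be checked numerically. The sketch below (again my own illustration; the binary symmetric channels and the crossover probabilities 0.1 and 0.2 are arbitrary choices, not taken from the slides) builds a Markov chain X -> Y -> Z and compares I(X;Y) with I(X;Z):

```python
# A minimal sketch (my own illustration, not from the slides): verify
# I(X;Y) = H(X) + H(Y) - H(X,Y) and the data processing inequality
# I(X;Y) >= I(X;Z) for the Markov chain X -> Y -> Z, where Y and Z are
# obtained by passing X through two binary symmetric channels with
# crossover probabilities 0.1 and 0.2 (arbitrary illustrative values).
from math import log2

def H(p):
    """Entropy in bits of a distribution given as a dict {outcome: probability}."""
    return -sum(v * log2(v) for v in p.values() if v > 0)

def mutual_information(pxy):
    """I(X;Y) from a joint pmf given as a dict {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), v in pxy.items():
        px[x] = px.get(x, 0.0) + v
        py[y] = py.get(y, 0.0) + v
    return H(px) + H(py) - H(pxy)            # I(X;Y) = H(X) + H(Y) - H(X,Y)

def through_bsc(p_in, eps):
    """Joint pmf of (input, output) when the input bit passes through a BSC(eps)."""
    joint = {}
    for x, v in p_in.items():
        joint[(x, x)] = joint.get((x, x), 0.0) + v * (1 - eps)
        joint[(x, 1 - x)] = joint.get((x, 1 - x), 0.0) + v * eps
    return joint

# X is a fair bit; Y = X through BSC(0.1); Z = Y through BSC(0.2).
px = {0: 0.5, 1: 0.5}
pxy = through_bsc(px, 0.1)

# p(x, z) = sum_y p(x, y) p(z | y), where p(z = y | y) = 0.8 and p(z = 1 - y | y) = 0.2.
pxz = {}
for (x, y), v in pxy.items():
    for z, pz_given_y in ((y, 0.8), (1 - y, 0.2)):
        pxz[(x, z)] = pxz.get((x, z), 0.0) + v * pz_given_y

print("I(X;Y) =", mutual_information(pxy))   # about 0.531 bits
print("I(X;Z) =", mutual_information(pxz))   # about 0.173 bits, smaller as the DPI requires
```

With these numbers I(X;Y) is about 0.531 bits while I(X;Z) is about 0.173 bits, consistent with the inequality: processing Y into Z can only lose information about X.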