1 EE670 Information Theory and Coding
Class home page: http://koala.ece.stevenstech.edu/~utureli/EE670/
- Class information, assignments, and other links.
Text: Elements of Information Theory, Thomas M. Cover and Joy A. Thomas, Wiley, 1991.
- Entropy, data compression, rate-distortion theory
- Ch. 2-14
- No class next Thursday 1/24/02 (Monday schedule)
2 DIGITAL SIGNAL
- A discrete waveform with two discrete states: the 1 bit and the 0 bit (an on/off pulse).
- Data communication uses a modem to translate analog to digital and digital to analog.
0010111010011101001010101110111100100010000101111010110
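As a toy illustration of a bit stream like the one above, here is a minimal Python sketch; the message "EE670" and the 8-bit ASCII mapping are illustrative assumptions, not from the slide:

```python
# A minimal sketch: any message can be carried as a discrete two-state
# waveform by mapping each character to its 8-bit ASCII pattern.
def to_bits(message: str) -> str:
    """Translate a text message into a stream of 1 bits and 0 bits."""
    return "".join(f"{byte:08b}" for byte in message.encode("ascii"))

bits = to_bits("EE670")
print(bits)       # a 0/1 stream, 8 bits per character
print(len(bits))  # 40
```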
3 LEARNING OBJECTIVES
- Information Theory: uncertainty and mutual information
- Applications: biotech (the DNA molecule), communications (compression, coding), security, finance (investment, gambling)
- Topics:
  - Entropy and mutual information
  - Source coding
  - Channel capacity and the noisy channel coding theorem
  - Rate-distortion theorem
4 Fundamental Limits of Communication
- Purpose of a communication system: to carry information-bearing signals.
- Entropy: the irreducible complexity below which a signal cannot be compressed.
- Capacity: the intrinsic ability of a channel to transmit information.
- If the capacity of the channel exceeds the entropy rate of the source, then lossless communication is possible.
- Rate-distortion theory determines the accuracy of the reconstructed signal as a function of the number of bits used (i.e., the rate).
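The capacity condition above can be checked numerically. The sketch below assumes a binary source and a binary symmetric channel, whose capacity C = 1 − H(ε) is the standard result; the particular bias and crossover values are made up for illustration:

```python
import math

def h2(p: float) -> float:
    """Binary entropy in bits; h2(0) = h2(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Source: biased coin with P(1) = 0.1 -> entropy rate ~0.469 bits/symbol.
source_entropy = h2(0.1)
# Channel: binary symmetric channel with crossover 0.05 -> C = 1 - h2(0.05).
capacity = 1 - h2(0.05)
# Lossless communication is possible when capacity exceeds the entropy rate.
print(source_entropy, capacity, capacity > source_entropy)
```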
5 Random Variables and Probability
- A random variable X takes a value determined by the outcome of a process that cannot be predicted in advance.
- The sample space S of a random variable is the set of all possible values of the variable X.
- Ω is the set of all outcomes, divided into elementary events, or states:
  ∑_{x} p(x) = 1,  0 ≤ p(x) ≤ 1
6 Expectation, Variance and Deviation
The moments of a random variable define its important characteristics. The first moment is the expectation:
  µ = E[x] = ∑_{x} x p(x)
Note: the expectation has a misleading name and is not always the value we expect to see most. For the number on a die the expectation is 3.5, which is not a value we will see at all! The expectation is a weighted average.
The variance is defined by Var[x] = ⟨x²⟩ − ⟨x⟩² = M2 − M1². The standard deviation σ = Var[x]^{1/2} measures the "spread" of x in relation to the mean.
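The die example can be checked directly; this is a minimal sketch of the moment formulas above:

```python
# The fair-die example: expectation 3.5 (a value never actually observed),
# variance via Var[x] = <x^2> - <x>^2 = M2 - M1^2.
faces = [1, 2, 3, 4, 5, 6]
p = 1 / 6
mean = sum(x * p for x in faces)               # first moment M1
second_moment = sum(x * x * p for x in faces)  # second moment M2
variance = second_moment - mean ** 2           # M2 - M1^2
std_dev = variance ** 0.5
print(mean, variance, std_dev)  # 3.5, ~2.9167, ~1.7078
```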
7 Conditional probability and Bayes' Rule
The knowledge that a certain event occurred can change the probability that another event will occur. P(x|y) denotes the conditional probability that x will happen given that y has happened. Bayes' rule states that:
  P(x|y) = P(y|x) P(x) / P(y)
The complete probability formula states that
  P(A) = P(A|B) P(B) + P(A|¬B) P(¬B)
or, in the more general case, P(A) = ∑_{i} P(A|B_i) P(B_i).
Note: P(A) = αP(A|B) + (1 − α)P(A|¬B) mirrors our intuition that the unconditional probability of an event lies somewhere between its conditional probabilities under two opposing assumptions.
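A quick numeric sketch of the complete probability formula and Bayes' rule; the diagnostic-test numbers below are hypothetical, chosen only for illustration:

```python
# Hypothetical numbers: a test (event A = positive) for a condition (event B).
p_B = 0.01             # P(B): prior probability of the condition
p_A_given_B = 0.95     # P(A|B): test positive given the condition
p_A_given_notB = 0.05  # P(A|~B): false-positive rate

# Complete probability formula: P(A) = P(A|B)P(B) + P(A|~B)P(~B)
p_A = p_A_given_B * p_B + p_A_given_notB * (1 - p_B)

# Bayes' rule: P(B|A) = P(A|B)P(B) / P(A)
p_B_given_A = p_A_given_B * p_B / p_A
print(p_A, p_B_given_A)
```

Note that P(A) = 0.059 indeed lands between the two conditional probabilities 0.05 and 0.95, matching the intuition stated above.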
8 Information Theory: Entropy
We want a fundamental measure that gauges the amount of missing information in a distribution over a set of possible states. The entropy gives us such a measure:
  H[P] = −∑_{x} p(x) log p(x)
- H[P] ≥ 0, and H[P] = 0 when there is only one possible outcome.
- H[P] is concave: H[αP1 + (1 − α)P2] ≥ αH[P1] + (1 − α)H[P2].
- The entropy takes its maximal value when there is the most uncertainty; intuitively this occurs when p(x) is uniform (we do not prefer any state over another).
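A short sketch of the entropy measure and its behavior at the two extremes described above; the example distributions are arbitrary:

```python
import math

def entropy(p):
    """H[P] = -sum p(x) log2 p(x); zero-probability states contribute 0."""
    return -sum(q * math.log2(q) for q in p if q > 0)

print(entropy([1.0]))       # one possible outcome -> H = 0
print(entropy([0.5, 0.5]))  # uniform over 2 states -> 1 bit (the maximum)
print(entropy([0.9, 0.1]))  # skewed distribution -> less than 1 bit
```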
9 Entropy (2)
The joint entropy is defined by:
  H[X, Y] = −∑_{x} ∑_{y} p(x, y) log p(x, y) = −E[log p(X, Y)]
The conditional entropy is defined by:
  H[Y|X] = ∑_{x} p(x) H(Y|X = x) = −∑_{x} p(x) ∑_{y} p(y|x) log p(y|x) = −∑_{x} ∑_{y} p(x, y) log p(y|x)
The relative entropy (or the "distance" between two distributions) is:
  D[p‖q] = ∑_{x} p(x) log( p(x) / q(x) )
and equals 0 only if p(x) and q(x) are the same.
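The three definitions can be exercised on a small joint distribution; the numbers below are toy values, not from the slides:

```python
import math

# Toy joint distribution p(x, y) over X in {0,1}, Y in {0,1}.
p_xy = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}
# Marginal p(x) obtained by summing over y.
p_x = {x: sum(v for (a, _), v in p_xy.items() if a == x) for x in (0, 1)}

# Joint entropy H[X,Y] = -sum p(x,y) log p(x,y)
H_xy = -sum(v * math.log2(v) for v in p_xy.values())
# Conditional entropy H[Y|X] = -sum p(x,y) log p(y|x), with p(y|x) = p(x,y)/p(x)
H_y_given_x = -sum(v * math.log2(v / p_x[x]) for (x, _), v in p_xy.items())
# Relative entropy D[p||q] between the marginal p(x) and a uniform q(x)
q = {0: 0.5, 1: 0.5}
D = sum(p_x[x] * math.log2(p_x[x] / q[x]) for x in (0, 1))
print(H_xy, H_y_given_x, D)
```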
10 Mutual Information
The mutual information between two random variables is:
  I(X; Y) = ∑_{x} ∑_{y} p(x, y) log( p(x, y) / (p(x) p(y)) )
          = D[p(x, y)‖p(x) p(y)]
          = H(X) − H(X|Y)
          = H(X) + H(Y) − H(X, Y)
          = I(Y; X)
- The information of X about Y is the same as that of Y about X.
- I(X; Y) = 0 if X and Y are independent.
- The mutual missing information H(X, Y) is the missing information of X plus the conditional missing information of Y given that X is known.
- The information of Y about X is the reduction in missing information about X once Y is known.
12 Chain Rule
H(X1, X2) = H(X1) + H(X2|X1)
H(X1, X2, X3) = H(X1) + H(X2, X3|X1) = H(X1) + H(X2|X1) + H(X3|X2, X1)
H(X1, X2, …, Xn) = H(X1) + H(X2|X1) + ⋯ + H(Xn|Xn−1, …, X1) = ∑_{i=1}^{n} H(X_i|X_{i−1}, …, X1)
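A numeric spot-check of the two-variable chain rule H(X1, X2) = H(X1) + H(X2|X1); the joint distribution is an arbitrary toy choice:

```python
import math

# Toy joint distribution p(x1, x2) over two binary variables.
p = {(0, 0): 0.4, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.3}
p1 = {x: sum(v for (a, _), v in p.items() if a == x) for x in (0, 1)}

H_joint = -sum(v * math.log2(v) for v in p.values())           # H(X1, X2)
H_1 = -sum(v * math.log2(v) for v in p1.values())              # H(X1)
H_2_given_1 = -sum(v * math.log2(v / p1[x]) for (x, _), v in p.items())

# The two sides of the chain rule agree up to floating-point error.
print(abs(H_joint - (H_1 + H_2_given_1)) < 1e-12)  # True
```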
13 Jensen's Inequality
A function f(x) is convex over (a, b) if for all x1, x2 ∈ (a, b) and 0 ≤ λ ≤ 1:
  f(λx1 + (1 − λ)x2) ≤ λf(x1) + (1 − λ)f(x2)
- A function f is concave if −f is convex. Examples: x² and e^x are convex; log(x) is concave.
- Jensen's inequality: if f is convex, then Ef(X) ≥ f(EX).
- Proof by induction on the number of mass points k. For k = 2, p1 f(x1) + p2 f(x2) ≥ f(p1 x1 + p2 x2) is just the definition of convexity. Assume the result holds for k − 1 mass points, and let p'_i = p_i/(1 − p_k) for i = 1, 2, …, k − 1. Then
  ∑_{i=1}^{k} p_i f(x_i) = p_k f(x_k) + (1 − p_k) ∑_{i=1}^{k−1} p'_i f(x_i)
                         ≥ p_k f(x_k) + (1 − p_k) f(∑_{i=1}^{k−1} p'_i x_i)
                         ≥ f(p_k x_k + (1 − p_k) ∑_{i=1}^{k−1} p'_i x_i) = f(∑_{i=1}^{k} p_i x_i)
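Jensen's inequality can be spot-checked numerically; the mass points, probabilities, and convex functions below are arbitrary choices:

```python
import math

# Numeric spot-check of Jensen's inequality Ef(X) >= f(EX) for convex f.
xs = [1.0, 2.0, 4.0, 8.0]   # mass points
ps = [0.1, 0.2, 0.3, 0.4]   # probabilities (sum to 1)
EX = sum(p * x for p, x in zip(ps, xs))

for name, f in [("x^2", lambda t: t * t), ("e^x", math.exp)]:
    Ef = sum(p * f(x) for p, x in zip(ps, xs))
    print(name, Ef >= f(EX))  # True for every convex f
```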
14 Jensen's Inequality
- Exercise: prove that EX² ≥ (EX)². Hint: apply Jensen's inequality with the convex function f(X) = X².
- The same induction as on the previous slide applies: assume the result holds for k − 1 mass points, let p'_i = p_i/(1 − p_k) for i = 1, 2, …, k − 1, and combine
  ∑_{i=1}^{k} p_i f(x_i) = p_k f(x_k) + (1 − p_k) ∑_{i=1}^{k−1} p'_i f(x_i) ≥ p_k f(x_k) + (1 − p_k) f(∑_{i=1}^{k−1} p'_i x_i) ≥ f(∑_{i=1}^{k} p_i x_i)
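The exercise EX² ≥ (EX)² (equivalently, Var[X] ≥ 0) can be checked on randomly generated distributions; the seed and the number of trials are arbitrary:

```python
import random

# Randomized check of EX^2 >= (EX)^2, i.e. Jensen with f(X) = X^2.
random.seed(670)
for _ in range(100):
    w = [random.random() for _ in range(5)]
    ps = [v / sum(w) for v in w]               # random probability masses
    xs = [random.uniform(-10, 10) for _ in range(5)]  # random mass points
    EX = sum(p * x for p, x in zip(ps, xs))
    EX2 = sum(p * x * x for p, x in zip(ps, xs))
    assert EX2 >= EX ** 2
print("EX^2 >= (EX)^2 held in all trials")
```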
15 Data Processing Inequality
If X → Y → Z forms a Markov chain, then I(X; Y) ≥ I(X; Z).
Proof: use the chain rule to expand the mutual information I(X; Y, Z) in two ways:
  I(X; Y, Z) = I(X; Y) + I(X; Z|Y) = I(X; Y) + 0 = I(X; Y)
  I(X; Y, Z) = I(X; Z) + I(X; Y|Z) ≥ I(X; Z)
(I(X; Z|Y) = 0 because X and Z are conditionally independent given Y.)
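The inequality can be checked on a concrete Markov chain. The sketch below pushes a fair bit through two cascaded binary symmetric channels (the crossover values are arbitrary) and compares I(X; Y) with I(X; Z):

```python
import math

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc(p_in1, eps):
    """Push P(bit = 1) through a binary symmetric channel with crossover eps."""
    return p_in1 * (1 - eps) + (1 - p_in1) * eps

# Markov chain X -> Y -> Z: X ~ Bernoulli(0.5), two cascaded BSCs.
pX1, e1, e2 = 0.5, 0.1, 0.2
pY1 = bsc(pX1, e1)
e_total = e1 * (1 - e2) + (1 - e1) * e2  # effective crossover X -> Z

# I(X;Y) = H(Y) - H(Y|X), and H(Y|X) = h2(eps) for a BSC.
I_XY = h2(pY1) - h2(e1)
I_XZ = h2(bsc(pX1, e_total)) - h2(e_total)
print(I_XY, I_XZ, I_XY >= I_XZ)  # data processing inequality: True
```

Each extra processing stage can only destroy information: the cascade behaves like a single noisier channel, so I(X; Z) comes out strictly smaller here.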
This note was uploaded on 10/12/2009 for the course EE 670, taught by Professor UfTureli, during the Spring '05 term at Stevens.