
# Information Theory


Sargur N. Srihari, Machine Learning

## Topics

1. Entropy as an information measure
   1. Discrete-variable definition; relationship to code length
   2. Continuous-variable (differential) entropy
2. Maximum entropy
3. Conditional entropy
4. Kullback-Leibler divergence (relative entropy)
5. Mutual information
## Information Measure

How much information is received when we observe a specific value of a discrete random variable x? The amount of information can be viewed as the degree of surprise:

- An event that is certain conveys no information.
- An unlikely event conveys more information.

The measure therefore depends on the probability distribution p(x); call it h(x). For two unrelated events x and y we require h(x, y) = h(x) + h(y), which leads to the choice

h(x) = -log2 p(x)

The negative sign ensures the information measure is non-negative. The average amount of information transmitted is the expectation with respect to p(x), referred to as the entropy:

H(x) = -Σ_x p(x) log2 p(x)
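The definitions above can be computed directly; this is a minimal sketch in plain Python (the helper names `surprise` and `entropy` are my own, not from the slides):

```python
import math

def surprise(p):
    """Information content h(x) = -log2 p(x) of an event with probability p."""
    return -math.log2(p)

def entropy(dist):
    """Entropy H(x) = -sum_x p(x) log2 p(x), skipping zero-probability states."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Additivity for unrelated events: h(x, y) = h(x) + h(y)
px, py = 0.5, 0.25
assert math.isclose(surprise(px * py), surprise(px) + surprise(py))
```

The `p > 0` guard reflects the usual convention that 0 log 0 = 0, so impossible states contribute nothing to the entropy.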

## Usefulness of Entropy

- Uniform distribution: a random variable x with 8 equally likely states requires 3 bits to transmit one value. Correspondingly, H(x) = -8 × (1/8) log2(1/8) = 3 bits.
- Non-uniform distribution: if x has 8 states with probabilities (1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64), then H(x) = 2 bits.

The non-uniform distribution has smaller entropy than the uniform one, which admits an interpretation in terms of disorder.
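Both values can be checked numerically; a quick sketch (the `entropy` helper is my own name for the formula in the text):

```python
import math

def entropy(dist):
    # H(x) = -sum p log2 p over states with p > 0
    return -sum(p * math.log2(p) for p in dist if p > 0)

uniform = [1/8] * 8
nonuniform = [1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64]

print(entropy(uniform))     # 3.0 bits
print(entropy(nonuniform))  # 2.0 bits
```

Since every probability here is a power of 2, the logarithms are exact and the results come out to exactly 3.0 and 2.0.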
## Relationship of Entropy to Code Length

We can take advantage of a non-uniform distribution by using shorter codes for more probable events. If x has 8 states (a, b, c, d, e, f, g, h) with probabilities (1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64), we can use the codewords 0, 10, 110, 1110, 111100, 111101, 111110, 111111. The average code length is

(1/2)(1) + (1/4)(2) + (1/8)(3) + (1/16)(4) + 4(1/64)(6) = 2 bits

the same as the entropy of the random variable. A shorter code string is not possible, because the receiver must be able to disambiguate the string into its component parts: 11001110 is uniquely decoded as the sequence cad.
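A small sketch of this prefix code in Python (the slide lists seven codewords for eight states; I assume 111101 as the codeword for f, completing the pattern, and the function names are my own):

```python
# Prefix code for the 8-state example (state -> codeword).
code = {'a': '0', 'b': '10', 'c': '110', 'd': '1110',
        'e': '111100', 'f': '111101', 'g': '111110', 'h': '111111'}

probs = {'a': 1/2, 'b': 1/4, 'c': 1/8, 'd': 1/16,
         'e': 1/64, 'f': 1/64, 'g': 1/64, 'h': 1/64}

# Average code length: sum over states of p(state) * len(codeword).
avg_len = sum(probs[s] * len(w) for s, w in code.items())
print(avg_len)  # 2.0 bits, matching the entropy

def decode(bits, code):
    """Greedy left-to-right decoding; works because no codeword
    is a prefix of another (prefix-free code)."""
    inverse = {w: s for s, w in code.items()}
    out, buf = [], ''
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ''
    return ''.join(out)

print(decode('11001110', code))  # 'cad'
```

The greedy decoder never needs to backtrack, which is exactly the disambiguation property the text relies on.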

## Relationship Between Entropy and Shortest Coding Length

- Shannon's noiseless coding theorem: the entropy is a lower bound on the number of bits needed to transmit the state of a random variable.
- Natural logarithms are often used instead of base 2, for consistency with other topics; entropy is then measured in nats instead of bits.
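To illustrate the two units, entropies in bits and in nats differ only by a factor of ln 2; a brief sketch (the `entropy` helper and its `log` parameter are my own):

```python
import math

def entropy(dist, log=math.log2):
    """Entropy in bits by default; pass log=math.log for nats."""
    return -sum(p * log(p) for p in dist if p > 0)

dist = [1/2, 1/4, 1/8, 1/8]
bits = entropy(dist)            # 1.75 bits
nats = entropy(dist, math.log)  # the same quantity in nats
assert math.isclose(nats, bits * math.log(2))
```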
## History of Entropy: Thermodynamics to Information Theory

- Entropy is the average amount of information needed to specify the state of a random variable.
- The concept had a much earlier origin in physics, in the context of equilibrium thermodynamics.
- It was later given a deeper interpretation as a measure of disorder, through developments in statistical mechanics.

