Harvard SEAS ES250 – Information Theory
Entropy, relative entropy, and mutual information*
(* Based on Cover & Thomas, Chapter 2)

1 Entropy

1.1 Entropy of a random variable

Definition  The entropy of a discrete random variable X with pmf p_X(x) is
    H(X) = -\sum_x p(x) \log p(x)
The entropy measures the expected uncertainty in X. It has the following properties:
  - H(X) \geq 0: entropy is always non-negative.
  - H(X) = 0 iff X is deterministic.
  - Since H_b(X) = \log_b(a) H_a(X), we do not need to specify the base of the logarithm.

1.2 Joint entropy and conditional entropy

Definition  The joint entropy of two random variables X and Y is
    H(X,Y) \triangleq -E_{p(x,y)}[\log p(X,Y)] = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x,y) \log p(x,y)

Definition  Given a random variable X, the conditional entropy of Y (averaged over X) is
    H(Y|X) \triangleq E_{p(x)}[H(Y|X=x)] = \sum_{x \in \mathcal{X}} p(x) H(Y|X=x)
            = -E_{p(x)} E_{p(y|x)}[\log p(Y|X)] = -E_{p(x,y)}[\log p(Y|X)]

Note: H(X|Y) \neq H(Y|X) in general.

1.3 Chain rule

Joint and conditional entropy provide a natural calculus:

Theorem (Chain rule)
    H(X,Y) = H(X) + H(Y|X)

Corollary
    H(X,Y|Z) = H(X|Z) + H(Y|X,Z)
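To make these definitions and the chain rule concrete, here is a small numerical check in Python. It is a minimal sketch, not part of the original handout: the 2x2 joint pmf p_xy and the helper name entropy are made up for illustration.

```python
import numpy as np

# Hypothetical 2x2 joint pmf p(x, y), chosen only for illustration;
# rows index x, columns index y.
p_xy = np.array([[0.5,   0.25],
                 [0.125, 0.125]])

def entropy(p):
    """H(p) = -sum_i p_i log2 p_i, with the convention 0 log 0 = 0."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p_x  = p_xy.sum(axis=1)          # marginal p(x)
H_X  = entropy(p_x)              # H(X)
H_XY = entropy(p_xy.ravel())     # H(X, Y)

# H(Y|X) = sum_x p(x) H(Y | X = x): average the per-row entropies over p(x)
H_Y_given_X = sum(p_x[i] * entropy(p_xy[i] / p_x[i])
                  for i in range(len(p_x)) if p_x[i] > 0)

print(f"H(X)   = {H_X:.4f} bits")
print(f"H(Y|X) = {H_Y_given_X:.4f} bits")
print(f"H(X,Y) = {H_XY:.4f} bits")

# Chain rule: H(X, Y) = H(X) + H(Y|X)
assert np.isclose(H_XY, H_X + H_Y_given_X)
```

For this pmf the check gives H(X) ≈ 0.811, H(Y|X) ≈ 0.939, and H(X,Y) = 1.75 bits, so the chain rule holds numerically.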
2 Relative Entropy and Mutual Information

2.1 Entropy and Mutual Information

Entropy H(X) is the uncertainty ("self-information") of a single random variable; conditional entropy H(X|Y) is the entropy of one random variable given knowledge of another. We call the reduction in uncertainty the mutual information:
    I(X;Y) = H(X) - H(X|Y)
Eventually we will show that the maximum rate of transmission over a given channel p(Y|X), such that the error probability goes to zero, is given by the channel capacity:
    C = \max_{p(X)} I(X;Y)

Theorem (Relationship between mutual information and entropy)
    I(X;Y) = H(X) - H(X|Y)
    I(X;Y) = H(Y) - H(Y|X)
    I(X;Y) = H(X) + H(Y) - H(X,Y)
    I(X;Y) = I(Y;X)            (symmetry)
    I(X;X) = H(X)              ("self-information")

2.2 Relative Entropy and Mutual Information

Definition  The relative entropy (information divergence, or Kullback-Leibler divergence) between two pmfs p(x) and q(x) is
    D(p \| q) \triangleq E_p\left[\log \frac{p(X)}{q(X)}\right] = \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{q(x)}
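The theorem and the definition above can also be checked numerically. The sketch below reuses the illustrative joint pmf from the previous example (none of these names come from the notes) and computes I(X;Y) two ways: as H(X) + H(Y) - H(X,Y), and as the relative entropy D(p(x,y) || p(x)p(y)), the standard identity from Cover & Thomas, Chapter 2; the two values agree.

```python
import numpy as np

def entropy(p):
    """H(p) = -sum_i p_i log2 p_i, with the convention 0 log 0 = 0."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def kl_divergence(p, q):
    """D(p || q) = sum_x p(x) log2(p(x)/q(x)); assumes q(x) > 0 wherever p(x) > 0."""
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

# Same illustrative joint pmf as in the previous sketch.
p_xy = np.array([[0.5,   0.25],
                 [0.125, 0.125]])
p_x = p_xy.sum(axis=1)   # marginal p(x)
p_y = p_xy.sum(axis=0)   # marginal p(y)

# I(X;Y) = H(X) + H(Y) - H(X,Y)
I_entropies = entropy(p_x) + entropy(p_y) - entropy(p_xy.ravel())

# I(X;Y) = D( p(x,y) || p(x) p(y) )
I_kl = kl_divergence(p_xy.ravel(), np.outer(p_x, p_y).ravel())

print(f"I(X;Y) from entropies     = {I_entropies:.4f} bits")
print(f"I(X;Y) as a KL divergence = {I_kl:.4f} bits")
assert np.isclose(I_entropies, I_kl)
```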