The main results of this section are two theorems.

The Kullback–Leibler (KL) divergence between probability distributions $q(y)$ and $p(y)$ is

$$D(q(\cdot)\|p(\cdot)) = \sum_y q(y) \log \frac{q(y)}{p(y)}.$$

The KL divergence is only defined if $p(y) > 0$ for every $y$ such that $q(y) > 0$. It is sometimes referred to as the KL distance; however, it is not a metric in the mathematical sense because in general it is asymmetric: $D(q(\cdot)\|p(\cdot)) \neq D(p(\cdot)\|q(\cdot))$. It is the average of the logarithmic difference between the probability distributions $q(y)$ and $p(y)$, where the average is taken with respect to $q(y)$. The following property of the KL divergence will be very important for us.

Theorem 2 (Non-negativity of KL divergence). $D(q(\cdot)\|p(\cdot)) \geq 0$, with equality if and only if $q(y) = p(y)$ for all $y$.

Proof. We will rely on Jensen's inequality, which states that for any convex function $f$ and random variable $X$, $\mathbb{E}[f(X)] \geq f(\mathbb{E}[X])$. When $f$ is strictly convex, Jensen's inequality holds with equality if and only if $X$ is constant, so that $X = \mathbb{E}[X]$ and $f(X) = \mathbb{E}[f(X)]$. Take $Y \sim q(y)$ and define the random variable $X = p(Y)/q(Y)$. Let $f(X) = -\log(X)$, a strictly convex function. Now we can write

$$D(q(\cdot)\|p(\cdot)) = \mathbb{E}\!\left[-\log \frac{p(Y)}{q(Y)}\right] = \mathbb{E}[f(X)] \;\geq\; f(\mathbb{E}[X]) = -\log \sum_{y:\,q(y)>0} q(y)\,\frac{p(y)}{q(y)} = -\log \sum_{y:\,q(y)>0} p(y) \;\geq\; -\log 1 = 0,$$

since the final sum is at most 1. Equality requires $X = p(Y)/q(Y)$ to be constant; because both distributions sum to 1, that constant must be 1, i.e., $q(y) = p(y)$ for all $y$. ∎
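The definition and Theorem 2 can be checked numerically. Below is a minimal sketch (not from the notes; the function name `kl_divergence` and the example distributions are illustrative assumptions):

```python
import math

def kl_divergence(q, p):
    """Compute D(q || p) = sum_y q(y) * log(q(y) / p(y)).

    Only defined when p(y) > 0 wherever q(y) > 0; terms with
    q(y) = 0 contribute 0, using the convention 0 log 0 = 0.
    """
    total = 0.0
    for qy, py in zip(q, p):
        if qy > 0:
            if py == 0:
                raise ValueError("undefined: p(y) = 0 where q(y) > 0")
            total += qy * math.log(qy / py)
    return total

# Two example distributions over the same three outcomes.
q = [0.5, 0.3, 0.2]
p = [0.4, 0.4, 0.2]

d_qp = kl_divergence(q, p)  # non-negative, per Theorem 2
d_pq = kl_divergence(p, q)  # non-negative, but generally != d_qp
```

Running this shows both divergences are strictly positive and unequal, illustrating non-negativity and asymmetry; `kl_divergence(q, q)` returns exactly 0, matching the equality condition of Theorem 2.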

This note was uploaded on 03/24/2014 for the course MIT 15.097 taught by Professor Cynthia Rudin during the Spring '12 term at MIT.
