MIT15_097S12_lec15

# The main results of this section are two theorems.


The Kullback–Leibler (KL) divergence between distributions $q$ and $p$ is

$$D(q(\cdot)\,\|\,p(\cdot)) = \sum_y q(y) \log \frac{q(y)}{p(y)}.$$

The KL divergence is only defined if $p(y) > 0$ for every $y$ such that $q(y) > 0$. It is sometimes referred to as the KL distance; however, it is not a metric in the mathematical sense because in general it is asymmetric: $D(q(\cdot)\,\|\,p(\cdot)) \neq D(p(\cdot)\,\|\,q(\cdot))$. It is the average of the logarithmic difference between the probability distributions $q(y)$ and $p(y)$, where the average is taken with respect to $q(y)$. The following property of the KL divergence will be very important for us.

**Theorem 2** (Non-negativity of KL Divergence). $D(q(\cdot)\,\|\,p(\cdot)) \geq 0$, with equality if and only if $q(y) = p(y)$ for all $y$.

*Proof.* We will rely on Jensen's inequality, which states that for any convex function $f$ and random variable $X$, $\mathbb{E}[f(X)] \geq f(\mathbb{E}[X])$. When $f$ is strictly convex, Jensen's inequality holds with equality if and only if $X$ is constant, so that $\mathbb{E}[X] = X$ and $\mathbb{E}[f(X)] = f(X)$. Take $y \sim q(y)$ and define the random variable $X = \frac{p(y)}{q(y)}$. Let $f(X) = -\log(X)$, a strictly convex function. Now we can write

$$D(q(\cdot)\,\|\,p(\cdot)) = \mathbb{E}_q\left[-\log \frac{p(y)}{q(y)}\right] = \mathbb{E}[f(X)] \geq f(\mathbb{E}[X]) = -\log\left(\sum_y q(y)\,\frac{p(y)}{q(y)}\right) = -\log\left(\sum_y p(y)\right) = -\log(1) = 0.$$

Since $f$ is strictly convex, equality holds if and only if $X = \frac{p(y)}{q(y)}$ is constant; summing over $y$ then forces that constant to equal 1, so $q(y) = p(y)$ for all $y$. $\square$
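As a quick numerical sanity check of the definition and of Theorem 2, here is a minimal sketch (the function name `kl_divergence` and the example distributions are illustrative, not from the notes) that computes $D(q\,\|\,p) = \sum_y q(y)\log\frac{q(y)}{p(y)}$ for discrete distributions given as probability vectors:

```python
import math

def kl_divergence(q, p):
    """D(q || p) = sum_y q(y) * log(q(y) / p(y)).

    Defined only when p(y) > 0 wherever q(y) > 0; terms with
    q(y) == 0 contribute 0 (by the convention 0 * log 0 = 0).
    """
    total = 0.0
    for qy, py in zip(q, p):
        if qy > 0:
            if py == 0:
                return float("inf")  # support condition violated
            total += qy * math.log(qy / py)
    return total

# Two example distributions on a 3-element outcome space.
q = [0.5, 0.25, 0.25]
p = [0.7, 0.2, 0.1]

print(kl_divergence(q, p))  # nonnegative, by Theorem 2
print(kl_divergence(p, q))  # generally different: KL is asymmetric
print(kl_divergence(q, q))  # exactly 0 when the distributions coincide
```

Running this shows both $D(q\,\|\,p)$ and $D(p\,\|\,q)$ are positive but unequal, illustrating why KL divergence is not a metric, and that the divergence of a distribution from itself is zero, matching the equality condition in Theorem 2.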

This note is from the course MIT 15.097, taught by Professor Cynthia Rudin during the Spring 2012 term at MIT.
