The Kullback–Leibler (KL) divergence between discrete distributions q(·) and p(·) is

D(q(·)||p(·)) = Σ_y q(y) log( q(y) / p(y) ).
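As a concrete sketch (with hypothetical distributions, not from the notes), the KL divergence of two small discrete distributions can be computed directly from the definition; evaluating it in both directions also makes the asymmetry visible:

```python
import math

def kl(q, p):
    """D(q||p) = sum over y of q(y) * log(q(y)/p(y)).

    Requires p(y) > 0 wherever q(y) > 0; terms with q(y) = 0 contribute 0.
    """
    return sum(qy * math.log(qy / py) for qy, py in zip(q, p) if qy > 0)

# Hypothetical distributions over a two-element outcome space.
q = [0.5, 0.5]
p = [0.9, 0.1]

print(kl(q, p))  # D(q(.)||p(.)) ~ 0.5108
print(kl(p, q))  # D(p(.)||q(.)) ~ 0.3681 -- not equal: KL is asymmetric
print(kl(q, q))  # 0.0 -- the divergence of a distribution from itself
```

Note that kl(q, p) and kl(p, q) differ, which is why the KL divergence is not a metric despite sometimes being called the "KL distance."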
The KL divergence is only defined if p(y) > 0 for every y such that q(y) > 0.
It is sometimes referred to as the KL distance; however, it is not a metric in
the mathematical sense because in general it is asymmetric: D(q(·)||p(·)) ≠
D(p(·)||q(·)). It is the average of the logarithmic difference between the
probability distributions q(y) and p(y), where the average is taken with respect
to q(y).

The following property of the KL divergence will be very important for us.

Theorem 2 (Non-negativity of KL Divergence). D(q(·)||p(·)) ≥ 0, with equality if and only if q(y) = p(y) ∀y.

Proof. We will rely on Jensen's inequality, which states that for any convex function f and random variable X, E[f(X)] ≥ f(E[X]).
When f is strictly convex, Jensen's inequality holds with equality if and only
if X is constant, so that E[X] = X and E[f(X)] = f(X). Take y ∼ q(y) and
define the random variable X = p(y)/q(y). Let f(X) = − log(X), a strictly convex
function. Now we can...
This note was uploaded on 03/24/2014 for the course MIT 15.097, taught by Professor Cynthia Rudin during the Spring '12 term at MIT.