CSE 6740 Lecture 20
How Do I Optimize Convex/Linear Functions? (Unconstrained Linear Optimization)
Alexander Gray
agray@cc.gatech.edu
Georgia Institute of Technology

Today

1. Unconstrained Optimization: Latent-Variable
2. Convex Optimization Problems
3. Unconstrained Optimization: Linear-Algebraic

Unconstrained Optimization: Latent-Variable

The EM algorithm, a form of bound optimization.

Mixture of Gaussians

Recall the mixture of Gaussians model, whose hidden variable is the class label:

    P(C = k) = \pi_k, \quad \sum_k \pi_k = 1    (1)

    f(X \mid C = k) = N(\mu_k, \sigma_k^2)    (2)

    f(X) = \sum_{k=1}^{K} f(X \mid C = k) P(C = k) = \sum_{k=1}^{K} \pi_k N(\mu_k, \sigma_k^2)    (3)

Recall Bayes' rule, which gives

    P(C = k \mid x) = \frac{f(x \mid C = k) P(C = k)}{f(x)}.    (4)

This value is the probability that a particular component k was responsible for generating the point x, and satisfies \sum_{k=1}^{K} P(C = k \mid x) = 1. We'll use as a shorthand

    w_{ik} \equiv P(C = k \mid x_i).    (5)

We'll consider a simplified case where the covariances are fixed to be diagonal with all dimensions equal, \Sigma_k = \sigma_k^2 I, so

    f(x \mid C = k) = N(\mu_k, \sigma_k^2 I) = \frac{1}{(2\pi\sigma_k^2)^{D/2}} \exp\left\{ -\frac{\| x - \mu_k \|^2}{2\sigma_k^2} \right\}    (6)

and

    f(x) = \sum_{k=1}^{K} \pi_k \frac{1}{(2\pi\sigma_k^2)^{D/2}} \exp\left\{ -\frac{\| x - \mu_k \|^2}{2\sigma_k^2} \right\}.    (7)
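The mixture density and the Bayes-rule responsibilities above can be sketched directly in NumPy. This is a minimal illustration, not part of the lecture: the function name `gaussian_mixture_density` and the array layout (rows are points, one isotropic variance per component) are my own choices.

```python
import numpy as np

def gaussian_mixture_density(X, pi, mu, sigma2):
    """Isotropic Gaussian mixture: f(x) = sum_k pi_k N(x; mu_k, sigma2_k I).

    X: (N, D) data points; pi: (K,) mixing weights; mu: (K, D) means;
    sigma2: (K,) per-component variances.
    Returns (f, W): marginal densities f of shape (N,) and
    responsibilities W of shape (N, K), with W[i, k] = P(C = k | x_i).
    """
    N, D = X.shape
    # Squared distances ||x_i - mu_k||^2, shape (N, K)
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    # Component densities N(x_i; mu_k, sigma2_k I), shape (N, K)
    comp = np.exp(-d2 / (2 * sigma2)) / (2 * np.pi * sigma2) ** (D / 2)
    joint = pi * comp                 # pi_k * f(x_i | C = k)
    f = joint.sum(axis=1)             # marginal f(x_i)
    W = joint / f[:, None]            # Bayes rule; each row sums to 1
    return f, W
```

Because each row of W is a posterior over components, the rows sum to one, matching the constraint \sum_k P(C = k | x) = 1 stated above.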
Minimizing the Negative Log-likelihood

It is equivalent to minimize the negative log-likelihood

    E \equiv -\log L(\theta) = -\sum_{i=1}^{N} \log f(X_i)    (8)

      = -\sum_{i=1}^{N} \log \left( \sum_{k=1}^{K} f(X_i \mid C = k) P(C = k) \right).    (9)

Since this error function is a smooth, differentiable function of the parameters, we can employ its derivatives to perform unconstrained optimization on it.

EM: Recurrence Idea

Now let's revisit the EM algorithm, to see how to derive it from a more fundamental principle, that of bound optimization. We can write the change in error on each iteration in the form

    E^{new} - E^{old} = -\sum_i \log \left( \frac{f^{new}(x_i)}{f^{old}(x_i)} \right)    (10)

      = -\sum_i \log \left( \sum_k \frac{f^{new}(x_i \mid C = k) P^{new}(C = k)}{f^{old}(x_i)} \cdot \frac{P^{old}(C = k \mid x_i)}{P^{old}(C = k \mid x_i)} \right)    (11)

where the last factor is simply the identity....
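The bound-optimization property means each EM iteration can only decrease the negative log-likelihood E. A minimal NumPy sketch of one EM step for this isotropic mixture illustrates that guarantee; the function names `neg_log_likelihood` and `em_step` are my own, and the M-step updates shown are the standard closed-form maximizers for this model, not something derived on these slides.

```python
import numpy as np

def neg_log_likelihood(X, pi, mu, sigma2):
    """E = -sum_i log f(X_i) for an isotropic Gaussian mixture."""
    D = X.shape[1]
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    comp = np.exp(-d2 / (2 * sigma2)) / (2 * np.pi * sigma2) ** (D / 2)
    return -np.log((pi * comp).sum(axis=1)).sum()

def em_step(X, pi, mu, sigma2):
    """One EM iteration; returns updated (pi, mu, sigma2)."""
    N, D = X.shape
    # E-step: responsibilities w_ik = P(C = k | x_i) under current parameters
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    comp = np.exp(-d2 / (2 * sigma2)) / (2 * np.pi * sigma2) ** (D / 2)
    joint = pi * comp
    W = joint / joint.sum(axis=1, keepdims=True)
    # M-step: maximize the bound -> weighted-average updates
    Nk = W.sum(axis=0)                       # effective count per component
    pi_new = Nk / N
    mu_new = (W.T @ X) / Nk[:, None]
    d2_new = ((X[:, None, :] - mu_new[None, :, :]) ** 2).sum(axis=2)
    sigma2_new = (W * d2_new).sum(axis=0) / (D * Nk)
    return pi_new, mu_new, sigma2_new
```

Running `em_step` repeatedly and evaluating `neg_log_likelihood` after each call should show E decreasing monotonically, which is exactly the behavior the bound-optimization derivation above explains.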
This note was uploaded on 04/03/2010 for the course CSE 6740 taught by Professor Staff during the Fall '08 term at Georgia Institute of Technology.