CSE 6740 Lecture 20
How Do I Optimize Convex/Linear Functions? (Unconstrained Linear Optimization)
Alexander Gray, agray@cc.gatech.edu
Georgia Institute of Technology

Today
1. Unconstrained Optimization: Latent-Variable
2. Convex Optimization Problems
3. Unconstrained Optimization: Linear Algebraic

Unconstrained Optimization: Latent-Variable

The EM algorithm, a form of bound optimization.

Mixture of Gaussians

Recall the mixture of Gaussians model, whose hidden variable is the class label:

    P(C = k) = \pi_k, \qquad \sum_k \pi_k = 1                                      (1)

    f(X \mid C = k) = \mathcal{N}(\mu_k, \sigma_k^2)                               (2)

    f(X) = \sum_{k=1}^K f(X \mid C = k)\, P(C = k)
         = \sum_{k=1}^K \pi_k\, \mathcal{N}(\mu_k, \sigma_k^2)                     (3)

Recall Bayes' rule, which gives

    P(C = k \mid x) = \frac{f(x \mid C = k)\, P(C = k)}{f(x)}.                     (4)

This value is the probability that a particular component k was responsible for generating the point x, and it satisfies \sum_{k=1}^K P(C = k \mid x) = 1. We'll use the shorthand

    w_{ik} \equiv P(C = k \mid x_i).                                               (5)

We'll consider a simplified case where the covariances are fixed to be diagonal with all dimensions equal, \Sigma_k = \sigma_k^2 I, so

    f(x \mid C = k) = \mathcal{N}(\mu_k, \Sigma_k)
                    = \frac{1}{(2\pi\sigma_k^2)^{D/2}}
                      \exp\left\{ -\frac{\|x - \mu_k\|^2}{2\sigma_k^2} \right\}    (6)

and

    f(x) = \sum_{k=1}^K \pi_k \frac{1}{(2\pi\sigma_k^2)^{D/2}}
           \exp\left\{ -\frac{\|x - \mu_k\|^2}{2\sigma_k^2} \right\}.              (7)

Minimizing the Negative Log-likelihood

It is equivalent to minimize the negative log-likelihood

    E \equiv -\log L(\theta) = -\sum_{i=1}^N \log f(X_i)                           (8)
      = -\sum_{i=1}^N \log\left( \sum_{k=1}^K f(X_i \mid C = k)\, P(C = k) \right). (9)

Since this error function is a smooth, differentiable function of the parameters, we can employ its derivatives to perform unconstrained optimization on it.

EM: Recurrence Idea

Now let's revisit the EM algorithm, to see how to derive it from a more fundamental principle, that of bound optimization. We can write the change in error on each iteration in the form

    E^{\mathrm{new}} - E^{\mathrm{old}}
      = -\sum_i \log\left( \frac{f^{\mathrm{new}}(x_i)}{f^{\mathrm{old}}(x_i)} \right)                        (10)
      = -\sum_i \log\left( \sum_k \frac{f^{\mathrm{new}}(x_i \mid C = k)\, P^{\mathrm{new}}(C = k)}{f^{\mathrm{old}}(x_i)}
        \cdot \frac{P^{\mathrm{old}}(C = k \mid x_i)}{P^{\mathrm{old}}(C = k \mid x_i)} \right)               (11)

where the last factor is simply the identity....
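The mixture density f(x), the responsibilities w_{ik}, and the negative log-likelihood E above translate directly into code. Below is a minimal NumPy sketch, not part of the lecture; the names (mixture_log_density, responsibilities, pi, mu, sigma2) are my own, and it assumes the spherical-covariance case \Sigma_k = \sigma_k^2 I used on the slides.

    import numpy as np

    def mixture_log_density(X, pi, mu, sigma2):
        """Log-densities for a spherical Gaussian mixture f(x) = sum_k pi_k N(mu_k, sigma2_k I).

        X: (N, D) data; pi: (K,) mixing weights; mu: (K, D) means; sigma2: (K,) variances.
        Returns (log_f, log_joint): log f(x_i), shape (N,), and
        log [pi_k f(x_i | C = k)], shape (N, K).
        """
        N, D = X.shape
        # Squared distances ||x_i - mu_k||^2, shape (N, K).
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        # log [ pi_k (2 pi sigma2_k)^(-D/2) exp(-sq / (2 sigma2_k)) ], i.e. pi_k times eq. (6).
        log_joint = np.log(pi) - 0.5 * D * np.log(2 * np.pi * sigma2) - sq / (2 * sigma2)
        # log f(x_i) = log sum_k exp(log_joint_ik), computed stably via log-sum-exp.
        m = log_joint.max(axis=1, keepdims=True)
        log_f = m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1))
        return log_f, log_joint

    def responsibilities(log_f, log_joint):
        """w_ik = P(C = k | x_i), Bayes' rule (4) evaluated in log space."""
        return np.exp(log_joint - log_f[:, None])

    def negative_log_likelihood(log_f):
        """E = -log L(theta) = -sum_i log f(x_i), eq. (8)."""
        return -log_f.sum()

For example, with log_f, log_joint = mixture_log_density(X, pi, mu, sigma2), each row of responsibilities(log_f, log_joint) sums to 1, matching \sum_{k=1}^K P(C = k \mid x) = 1; evaluating these responsibilities under the old parameters is exactly the E-step of the EM recurrence sketched above.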