CSE 6740 Lecture 20
How Do I Optimize Convex/Linear Functions? (Unconstrained Linear Optimization)
Alexander Gray [email protected]
Georgia Institute of Technology

Today
1. Unconstrained Optimization: Latent-Variable
2. Convex Optimization Problems
3. Unconstrained Optimization: Linear Algebraic

Unconstrained Optimization: Latent-Variable
The EM algorithm, a form of bound optimization.

Mixture of Gaussians
Recall the mixture of Gaussians model, whose "hidden" variable is the class label:

P(C = k) = \pi_k, \qquad \sum_k \pi_k = 1    (1)
f(X | C = k) = N(\mu_k, \Sigma_k)    (2)
f(X) = \sum_{k=1}^{K} f(X | C = k) P(C = k) = \sum_{k=1}^{K} \pi_k N(\mu_k, \Sigma_k)    (3)

Recall Bayes' rule, which gives

P(C = k | x) = \frac{f(x | C = k) P(C = k)}{f(x)}.    (4)

This value is the probability that a particular component k was responsible for generating the point x, and it satisfies \sum_{k=1}^{K} P(C = k | x) = 1. We'll use the shorthand

w_{ik} \equiv P(C = k | x_i).    (5)

We'll consider a simplified case where the covariances are fixed to be diagonal with all dimensions equal, \Sigma_k = \sigma_k^2 I, so

f(x | C = k) = N(\mu_k, \Sigma_k) = \frac{1}{(2\pi\sigma_k^2)^{D/2}} \exp\left\{ -\frac{\|x - \mu_k\|^2}{2\sigma_k^2} \right\}    (6)

and

f(x) = \sum_{k=1}^{K} \pi_k \frac{1}{(2\pi\sigma_k^2)^{D/2}} \exp\left\{ -\frac{\|x - \mu_k\|^2}{2\sigma_k^2} \right\}.    (7)

Minimizing the Negative Log-likelihood
It is equivalent to minimize the negative log-likelihood

E \equiv -\log L(\theta) = -\sum_{i=1}^{N} \log f_\theta(X_i)    (8)
  = -\sum_{i=1}^{N} \log \left( \sum_{k=1}^{K} f(X_i | C = k) P(C = k) \right).    (9)

Since this error function is a smooth, differentiable function of the parameters, we can use its derivatives to perform unconstrained optimization on it.

EM: Recurrence Idea
Now let's revisit the EM algorithm and see how to derive it from a more fundamental principle, that of bound optimization. We can write the change in error on each iteration as

E_{new} - E_{old} = -\sum_i \log\left( \frac{f_{new}(x_i)}{f_{old}(x_i)} \right)    (10)
  = -\sum_i \log\left( \sum_k \frac{f_{new}(x_i | C = k) P_{new}(C = k)}{f_{old}(x_i) P_{old}(C = k | x_i)} P_{old}(C = k | x_i) \right),    (11)

where the last factor is simply the identity.
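To make the responsibilities w_{ik}, the error E, and the one-iteration change E_new - E_old concrete, here is a minimal NumPy sketch of the spherical-covariance mixture described above. This is not course code: the function names (log_spherical_gaussian, responsibilities, m_step, neg_log_likelihood) and the synthetic two-cluster data are illustrative assumptions, and the M-step updates shown are the standard closed-form ones for the spherical case rather than anything taken from the lecture.

import numpy as np

def log_spherical_gaussian(X, mu, sigma2):
    # log N(x | mu, sigma2 * I) for each row of X (shape N x D), as in Eq. (6).
    D = X.shape[1]
    sq = np.sum((X - mu) ** 2, axis=1)
    return -0.5 * D * np.log(2.0 * np.pi * sigma2) - sq / (2.0 * sigma2)

def log_components(X, pi, mu, sigma2):
    # Matrix of log( pi_k * f(x_i | C = k) ), shape N x K.
    return np.stack([np.log(pi[k]) + log_spherical_gaussian(X, mu[k], sigma2[k])
                     for k in range(len(pi))], axis=1)

def responsibilities(X, pi, mu, sigma2):
    # E-step: w_ik = P(C = k | x_i) via Bayes' rule (Eq. 4); rows sum to 1.
    log_p = log_components(X, pi, mu, sigma2)
    log_p -= log_p.max(axis=1, keepdims=True)          # numerical stability
    w = np.exp(log_p)
    return w / w.sum(axis=1, keepdims=True)

def neg_log_likelihood(X, pi, mu, sigma2):
    # E = -sum_i log f(x_i), the error function of Eqs. (8)-(9) (log-sum-exp form).
    log_p = log_components(X, pi, mu, sigma2)
    m = log_p.max(axis=1, keepdims=True)
    return -np.sum(m[:, 0] + np.log(np.sum(np.exp(log_p - m), axis=1)))

def m_step(X, w):
    # M-step: closed-form updates for pi_k, mu_k, sigma_k^2 in the spherical case.
    N, D = X.shape
    Nk = w.sum(axis=0)                                  # effective counts per component
    pi = Nk / N
    mu = (w.T @ X) / Nk[:, None]
    sigma2 = np.array([np.sum(w[:, k] * np.sum((X - mu[k]) ** 2, axis=1)) / (D * Nk[k])
                       for k in range(len(Nk))])
    return pi, mu, sigma2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two well-separated spherical clusters in 2-D (synthetic, illustrative data).
    X = np.vstack([rng.normal(0.0, 1.0, (100, 2)),
                   rng.normal(5.0, 1.0, (100, 2))])
    pi = np.array([0.5, 0.5])
    mu = rng.normal(2.5, 1.0, (2, 2))                   # rough initial means
    sigma2 = np.array([1.0, 1.0])

    E_old = neg_log_likelihood(X, pi, mu, sigma2)
    w = responsibilities(X, pi, mu, sigma2)             # E-step
    pi, mu, sigma2 = m_step(X, w)                       # M-step
    E_new = neg_log_likelihood(X, pi, mu, sigma2)
    print(f"E_old = {E_old:.3f}, E_new = {E_new:.3f}")  # E_new <= E_old

Running the script prints E before and after one E-step/M-step pair; by the bound-optimization argument sketched in Eqs. (10)-(11), each EM iteration can only leave E unchanged or decrease it.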