CSE 6740 Lecture 20
How Do I Optimize Convex/Linear Functions? (Unconstrained Linear Optimization)
Alexander Gray
[email protected]
Georgia Institute of Technology

Today

1. Unconstrained Optimization: Latent-Variable
2. Convex Optimization Problems
3. Unconstrained Optimization: Linear Algebraic

Unconstrained Optimization: Latent-Variable

The EM algorithm, a form of bound optimization.

Mixture of Gaussians

Recall the mixture of Gaussians model, whose "hidden" variable is the class label:

$$P(C = k) = \pi_k, \qquad \sum_k \pi_k = 1 \tag{1}$$

$$f(X \mid C = k) = \mathcal{N}(\mu_k, \Sigma_k) \tag{2}$$

$$f(X) = \sum_{k=1}^{K} f(X \mid C = k)\, P(C = k) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(\mu_k, \Sigma_k) \tag{5}$$

Mixture of Gaussians

Recall Bayes' rule, which gives

$$P(C = k \mid x) = \frac{f(x \mid C = k)\, P(C = k)}{f(x)}. \tag{6}$$

This value is the probability that a particular component $k$ was responsible for generating the point $x$, and it satisfies $\sum_{k=1}^{K} P(C = k \mid x) = 1$. We'll use as a shorthand

$$w_{ik} \equiv P(C = k \mid x_i). \tag{7}$$

Mixture of Gaussians

We'll consider a simplified case where the covariances are fixed to be diagonal with all dimensions equal, $\Sigma_k = \sigma_k^2 I$, so

$$f(x \mid C = k) = \mathcal{N}(\mu_k, \Sigma_k) = \frac{1}{(2\pi\sigma_k^2)^{D/2}} \exp\left\{ -\frac{\| x - \mu_k \|^2}{2\sigma_k^2} \right\} \tag{8}$$

and

$$f(x) = \sum_{k=1}^{K} \pi_k \frac{1}{(2\pi\sigma_k^2)^{D/2}} \exp\left\{ -\frac{\| x - \mu_k \|^2}{2\sigma_k^2} \right\}. \tag{9}$$
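The spherical mixture density and the responsibilities $w_{ik}$ above can be sketched numerically. This is a minimal illustration, not code from the lecture; the function name `responsibilities` and the NumPy array conventions are my own choices.

```python
import numpy as np

def responsibilities(X, pi, mu, sigma2):
    """Posterior w_ik = P(C = k | x_i) for a spherical mixture of Gaussians.

    X      : (N, D) data points
    pi     : (K,)   mixing weights pi_k, summing to 1
    mu     : (K, D) component means
    sigma2 : (K,)   per-component variances (Sigma_k = sigma_k^2 I)
    Returns w of shape (N, K); each row sums to 1.
    """
    N, D = X.shape
    # Squared distances ||x_i - mu_k||^2, shape (N, K)
    sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    # Component densities f(x_i | C = k) = (2 pi sigma_k^2)^{-D/2} exp(-sq / 2 sigma_k^2)
    dens = (2 * np.pi * sigma2) ** (-D / 2) * np.exp(-sq / (2 * sigma2))
    joint = pi * dens                      # pi_k * f(x_i | C = k)
    fx = joint.sum(axis=1, keepdims=True)  # mixture density f(x_i)
    return joint / fx                      # Bayes' rule
```

For well-separated components, each point's responsibility concentrates on its nearest component, which matches the interpretation of $w_{ik}$ as the probability that component $k$ generated $x_i$.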
Minimizing the Negative Log-likelihood

It is equivalent to minimize the negative log-likelihood

$$E \equiv -\log L(\theta) = -\sum_{i=1}^{N} \log f_\theta(X_i) \tag{10}$$

$$= -\sum_{i=1}^{N} \log\left( \sum_{k=1}^{K} f(X_i \mid C = k)\, P(C = k) \right). \tag{11}$$

Since this error function is a smooth, differentiable function of the parameters, we can employ its derivatives to perform unconstrained optimization on it.

EM: Recurrence Idea

Now let's revisit the EM algorithm, to see how to derive it from a more fundamental principle, that of bound optimization. We can write the change in error on each iteration in the form

$$E_{\text{new}} - E_{\text{old}} = -\sum_i \log\left( \frac{f_{\text{new}}(x_i)}{f_{\text{old}}(x_i)} \right) \tag{12}$$

$$= -\sum_i \log\left( \sum_k \frac{f_{\text{new}}(x_i \mid C = k)\, P_{\text{new}}(C = k)}{f_{\text{old}}(x_i)} \cdot \frac{P_{\text{old}}(C = k \mid x_i)}{P_{\text{old}}(C = k \mid x_i)} \right) \tag{13}$$

where the last factor is simply the identity. ...
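The bound-optimization property can be checked empirically: each EM iteration can never increase $E$. Below is a minimal sketch for the spherical case $\Sigma_k = \sigma_k^2 I$; it is not the lecture's code, and the function names `neg_log_likelihood` and `em_step` are my own.

```python
import numpy as np

def _component_densities(X, mu, sigma2):
    """f(x_i | C = k) for spherical Gaussians, shape (N, K)."""
    D = X.shape[1]
    sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    return (2 * np.pi * sigma2) ** (-D / 2) * np.exp(-sq / (2 * sigma2))

def neg_log_likelihood(X, pi, mu, sigma2):
    """E = -sum_i log f(x_i), the error function minimized by EM."""
    fx = (pi * _component_densities(X, mu, sigma2)).sum(axis=1)
    return -np.log(fx).sum()

def em_step(X, pi, mu, sigma2):
    """One EM iteration: E-step computes w_ik, M-step re-estimates parameters."""
    N, D = X.shape
    joint = pi * _component_densities(X, mu, sigma2)
    w = joint / joint.sum(axis=1, keepdims=True)   # E-step: w_ik = P(C=k | x_i)
    Nk = w.sum(axis=0)                             # effective counts per component
    pi_new = Nk / N                                # M-step updates
    mu_new = (w.T @ X) / Nk[:, None]
    sq = ((X[:, None, :] - mu_new[None, :, :]) ** 2).sum(axis=2)
    sigma2_new = (w * sq).sum(axis=0) / (D * Nk)
    return pi_new, mu_new, sigma2_new
```

Running `em_step` repeatedly and evaluating `neg_log_likelihood` after each step shows a monotonically non-increasing sequence of $E$ values, which is exactly the guarantee that bound optimization provides.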