ps4_solution


CS229 Problem Set #4 Solutions

CS 229, Public Course
Problem Set #4 Solutions: Unsupervised Learning and Reinforcement Learning

1. EM for supervised learning

In class we applied EM to the unsupervised learning setting. In particular, we represented $p(x)$ by marginalizing over a latent random variable:

\[ p(x) = \sum_z p(x, z) = \sum_z p(x \mid z)\, p(z). \]

However, EM can also be applied to the supervised learning setting, and in this problem we discuss a "mixture of linear regressors" model; this is an instance of what is often called the Hierarchical Mixture of Experts model. We want to represent $p(y \mid x)$, with $x \in \mathbb{R}^n$ and $y \in \mathbb{R}$, and we do so by again introducing a discrete latent random variable:

\[ p(y \mid x) = \sum_z p(y, z \mid x) = \sum_z p(y \mid x, z)\, p(z \mid x). \]

For simplicity we'll assume that $z$ is binary valued, that $p(y \mid x, z)$ is a Gaussian density, and that $p(z \mid x)$ is given by a logistic regression model. More formally,

\[ p(z \mid x; \phi) = g(\phi^T x)^{z} \left(1 - g(\phi^T x)\right)^{1 - z} \]

\[ p(y \mid x, z = i; \theta_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y - \theta_i^T x)^2}{2\sigma^2} \right), \quad i = 0, 1 \]

where $\sigma$ is a known parameter and $\phi, \theta_0, \theta_1 \in \mathbb{R}^n$ are parameters of the model (here we use the subscript on $\theta$ to denote two different parameter vectors, not to index a particular entry in these vectors).

Intuitively, the process behind the model can be thought of as follows. Given a data point $x$, we first determine whether the data point belongs to one of two hidden classes, $z = 0$ or $z = 1$, using a logistic regression model. We then determine $y$ as a linear function of $x$ (different linear functions for different values of $z$) plus Gaussian noise, as in the standard linear regression model. For example, the following data set could be well-represented by the model, but not by standard linear regression.
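The two-step generative process described above can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library; the function names (`sigmoid`, `dot`, `sample_y`) and the parameter values in the usage lines are assumptions made for the example, not part of the problem set.

```python
import math
import random


def sigmoid(a):
    """Logistic function g(a) = 1 / (1 + exp(-a)), written to avoid overflow."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def sample_y(x, phi, theta0, theta1, sigma, rng):
    """Draw (z, y) for one input x (hypothetical helper for illustration).

    Step 1: z ~ Bernoulli(g(phi^T x)), the logistic-regression gate.
    Step 2: y ~ Normal(theta_z^T x, sigma^2), the linear regressor chosen by z.
    """
    p_z1 = sigmoid(dot(phi, x))               # p(z = 1 | x; phi)
    z = 1 if rng.random() < p_z1 else 0
    theta = theta1 if z == 1 else theta0
    y = rng.gauss(dot(theta, x), sigma)
    return z, y


# Usage with made-up parameters: x = [1, x_1] includes an intercept term.
rng = random.Random(0)
phi = [0.0, 4.0]        # gate favors z = 1 when x_1 > 0
theta0 = [1.0, -2.0]    # regressor used when z = 0
theta1 = [-1.0, 3.0]    # regressor used when z = 1
samples = [sample_y([1.0, x1], phi, theta0, theta1, 0.1, rng)
           for x1 in (-1.0, -0.5, 0.5, 1.0)]
```

Sampling $z$ first and then $y$ mirrors the factorization $p(y, z \mid x) = p(y \mid x, z)\, p(z \mid x)$; discarding $z$ afterwards gives draws from the marginal $p(y \mid x)$.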
(a) Suppose $x$, $y$, and $z$ are all observed, so that we obtain a training set $\{(x^{(1)}, y^{(1)}, z^{(1)}), \ldots, (x^{(m)}, y^{(m)}, z^{(m)})\}$. Write the log-likelihood of the parameters, and derive the maximum likelihood estimates for $\phi$, $\theta_0$, and $\theta_1$. Note that because $p(z \mid x)$ is a logistic regression model, there will not exist a closed form estimate of $\phi$. In this case, derive the gradient and the Hessian of the likelihood with respect to $\phi$; in practice, these quantities can be used to numerically compute the ML estimate.

Answer: The log-likelihood is given by

\[
\begin{aligned}
\ell(\phi, \theta_0, \theta_1)
&= \log \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}, z^{(i)}; \theta_0, \theta_1)\, p(z^{(i)} \mid x^{(i)}; \phi) \\
&= \sum_{i : z^{(i)} = 0} \log \left( \left(1 - g(\phi^T x^{(i)})\right) \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y^{(i)} - \theta_0^T x^{(i)})^2}{2\sigma^2} \right) \right) \\
&\quad + \sum_{i : z^{(i)} = 1} \log \left( g(\phi^T x^{(i)}) \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y^{(i)} - \theta_1^T x^{(i)})^2}{2\sigma^2} \right) \right)
\end{aligned}
\]

Differentiating with respect to...
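As a numerical companion to the log-likelihood in part (a), the sketch below evaluates it directly for fully observed $(x, y, z)$ triples. It also computes the gradient and Hessian with respect to $\phi$ using the standard logistic-regression expressions, $\sum_i (z^{(i)} - g(\phi^T x^{(i)}))\, x^{(i)}$ and $-\sum_i g(\phi^T x^{(i)})(1 - g(\phi^T x^{(i)}))\, x^{(i)} x^{(i)T}$. Those two expressions are assumed here from the usual logistic-regression derivation, not copied from the truncated solution text, and the function names are hypothetical.

```python
import math


def sigmoid(a):
    """Logistic function g(a), written to avoid overflow for large |a|."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def log_likelihood(X, y, z, phi, theta0, theta1, sigma):
    """Complete-data log-likelihood l(phi, theta_0, theta_1)."""
    total = 0.0
    for x_i, y_i, z_i in zip(X, y, z):
        h = sigmoid(dot(phi, x_i))                      # p(z = 1 | x; phi)
        log_gate = math.log(h) if z_i == 1 else math.log(1.0 - h)
        theta = theta1 if z_i == 1 else theta0
        resid = y_i - dot(theta, x_i)
        log_gauss = (-0.5 * math.log(2.0 * math.pi * sigma ** 2)
                     - resid ** 2 / (2.0 * sigma ** 2))
        total += log_gate + log_gauss
    return total


def grad_hess_phi(X, z, phi):
    """Gradient and Hessian of l with respect to phi (logistic-regression part only)."""
    n = len(phi)
    grad = [0.0] * n
    hess = [[0.0] * n for _ in range(n)]
    for x_i, z_i in zip(X, z):
        h = sigmoid(dot(phi, x_i))
        for j in range(n):
            grad[j] += (z_i - h) * x_i[j]
            for k in range(n):
                hess[j][k] -= h * (1.0 - h) * x_i[j] * x_i[k]
    return grad, hess


# Usage on a tiny made-up data set (two points, intercept plus one feature).
X = [[1.0, 0.0], [1.0, 1.0]]
y = [1.0, 2.0]
z = [0, 1]
ll = log_likelihood(X, y, z, [0.0, 0.0], [1.0, 0.0], [1.0, 1.0], 1.0)
grad, hess = grad_hess_phi(X, z, [0.0, 0.0])
```

Only the gating terms depend on $\phi$, so the gradient and Hessian ignore $y$, $\theta_0$, and $\theta_1$ entirely; a Newton step $\phi \leftarrow \phi - H^{-1} \nabla_\phi \ell$ could be built on top of `grad_hess_phi`.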