CS229 Problem Set #4 Solutions

CS 229, Public Course
Problem Set #4 Solutions: Unsupervised Learning and Reinforcement Learning

1. EM for supervised learning

In class we applied EM to the unsupervised learning setting. In particular, we represented $p(x)$ by marginalizing over a latent random variable

$$p(x) = \sum_z p(x, z) = \sum_z p(x \mid z)\, p(z).$$

However, EM can also be applied to the supervised learning setting, and in this problem we discuss a mixture of linear regressors model; this is an instance of what is often called the Hierarchical Mixture of Experts model. We want to represent $p(y \mid x)$, $x \in \mathbb{R}^n$ and $y \in \mathbb{R}$, and we do so by again introducing a discrete latent random variable

$$p(y \mid x) = \sum_z p(y, z \mid x) = \sum_z p(y \mid x, z)\, p(z \mid x).$$

For simplicity we'll assume that $z$ is binary valued, that $p(y \mid x, z)$ is a Gaussian density, and that $p(z \mid x)$ is given by a logistic regression model. More formally

$$p(z \mid x; \phi) = g(\phi^T x)^{z} \left(1 - g(\phi^T x)\right)^{1 - z}$$

$$p(y \mid x, z = i; \theta_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y - \theta_i^T x)^2}{2\sigma^2}\right), \qquad i = 0, 1$$

where $\sigma$ is a known parameter and $\phi, \theta_0, \theta_1 \in \mathbb{R}^n$ are parameters of the model (here we use the subscript on $\theta$ to denote two different parameter vectors, not to index a particular entry in these vectors).

Intuitively, the process behind the model can be thought of as follows. Given a data point $x$, we first determine whether the data point belongs to one of two hidden classes $z = 0$ or $z = 1$, using a logistic regression model. We then determine $y$ as a linear function of $x$ (different linear functions for different values of $z$) plus Gaussian noise, as in the standard linear regression model. For example, the following data set could be well-represented by the model, but not by standard linear regression.
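The generative process described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the problem set: the function names (`sigmoid`, `dot`, `sample_point`), the two-dimensional input with an intercept term, and every parameter value are hypothetical choices made for the example.

```python
import math
import random


def sigmoid(a):
    """Logistic function g(a) = 1 / (1 + exp(-a)), written to avoid overflow."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def sample_point(x, phi, theta0, theta1, sigma, rng):
    """Draw (z, y) for one input x under the mixture of linear regressors."""
    # z ~ Bernoulli(g(phi^T x)): the logistic model picks the hidden class.
    z = 1 if rng.random() < sigmoid(dot(phi, x)) else 0
    # y | x, z ~ N(theta_z^T x, sigma^2): linear function plus Gaussian noise.
    theta = theta1 if z == 1 else theta0
    y = rng.gauss(dot(theta, x), sigma)
    return z, y


# Hypothetical parameters (assumptions for illustration only).
rng = random.Random(0)
phi = [0.0, 4.0]       # class z = 1 becomes likely when t > 0
theta0 = [1.0, -2.0]   # regression line used when z = 0
theta1 = [-1.0, 3.0]   # regression line used when z = 1
sigma = 0.1

data = []
for _ in range(200):
    t = rng.uniform(-2.0, 2.0)
    x = [1.0, t]       # x = [1, t] includes an intercept term
    z, y = sample_point(x, phi, theta0, theta1, sigma, rng)
    data.append((x, y, z))
```

With these values the points follow one line for mostly negative `t` and a different line for mostly positive `t`, which is the kind of data a single linear regression cannot fit well.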
(a) Suppose $x$, $y$, and $z$ are all observed, so that we obtain a training set $\{(x^{(1)}, y^{(1)}, z^{(1)}), \ldots, (x^{(m)}, y^{(m)}, z^{(m)})\}$. Write the log-likelihood of the parameters, and derive the maximum likelihood estimates for $\phi$, $\theta_0$, and $\theta_1$. Note that because $p(z \mid x)$ is a logistic regression model, there will not exist a closed form estimate of $\phi$. In this case, derive the gradient and the Hessian of the likelihood with respect to $\phi$; in practice, these quantities can be used to numerically compute the ML estimate.

Answer: The log-likelihood is given by

$$\begin{aligned}
\ell(\phi, \theta_0, \theta_1) &= \log \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}, z^{(i)}; \theta_0, \theta_1)\, p(z^{(i)} \mid x^{(i)}; \phi) \\
&= \sum_{i : z^{(i)} = 0} \log\left( \left(1 - g(\phi^T x^{(i)})\right) \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(i)} - \theta_0^T x^{(i)})^2}{2\sigma^2}\right) \right) \\
&\quad + \sum_{i : z^{(i)} = 1} \log\left( g(\phi^T x^{(i)}) \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(i)} - \theta_1^T x^{(i)})^2}{2\sigma^2}\right) \right)
\end{aligned}$$

Differentiating with respect to...
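As a numerical companion to the log-likelihood written out in the answer, the following minimal sketch evaluates it for fully observed triples $(x, y, z)$. The helper names (`log_sigmoid`, `dot`, `complete_log_likelihood`) and the two-point toy data set are assumptions introduced for illustration; they do not come from the problem set, and the sketch does not reproduce the derivation that follows in the full solution.

```python
import math


def log_sigmoid(a):
    """Numerically stable log g(a), where g is the logistic function."""
    if a >= 0:
        return -math.log1p(math.exp(-a))
    return a - math.log1p(math.exp(a))


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def complete_log_likelihood(data, phi, theta0, theta1, sigma):
    """Sum over (x, y, z) of log p(y | x, z) + log p(z | x)."""
    log_norm = -math.log(math.sqrt(2.0 * math.pi) * sigma)
    total = 0.0
    for x, y, z in data:
        a = dot(phi, x)
        # log p(z | x; phi): log g(a) if z = 1, log(1 - g(a)) = log g(-a) if z = 0.
        log_pz = log_sigmoid(a) if z == 1 else log_sigmoid(-a)
        # log p(y | x, z; theta_z): Gaussian log-density around theta_z^T x.
        theta = theta1 if z == 1 else theta0
        resid = y - dot(theta, x)
        total += log_pz + log_norm - resid ** 2 / (2.0 * sigma ** 2)
    return total


# Hypothetical toy data: each y lies exactly on the line for its class.
toy = [([1.0, 0.0], 1.0, 0), ([1.0, 1.0], 2.0, 1)]
ll = complete_log_likelihood(toy, phi=[0.0, 0.0], theta0=[1.0, 0.0],
                             theta1=[-1.0, 3.0], sigma=1.0)
```

Because the sum splits into a term that depends only on $\phi$ and terms that depend only on $\theta_0$ or $\theta_1$, each parameter can be optimized separately, which is why the two sums over $z^{(i)} = 0$ and $z^{(i)} = 1$ are convenient.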
