problemset4

CS 229, Public Course
Problem Set #4: Unsupervised Learning and Reinforcement Learning

1. EM for supervised learning

In class we applied EM to the unsupervised learning setting. In particular, we represented p(x) by marginalizing over a latent random variable:

    p(x) = \sum_z p(x, z) = \sum_z p(x \mid z) \, p(z).

However, EM can also be applied to the supervised learning setting, and in this problem we discuss a mixture of linear regressors model; this is an instance of what is often called the Hierarchical Mixture of Experts model. We want to represent p(y \mid x), where x \in \mathbb{R}^n and y \in \mathbb{R}, and we do so by again introducing a discrete latent random variable:

    p(y \mid x) = \sum_z p(y, z \mid x) = \sum_z p(y \mid x, z) \, p(z \mid x).

For simplicity we'll assume that z is binary valued, that p(y \mid x, z) is a Gaussian density, and that p(z \mid x) is given by a logistic regression model. More formally,

    p(z \mid x; \phi) = g(\phi^T x)^z \, (1 - g(\phi^T x))^{1 - z}

    p(y \mid x, z = i; \theta_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(y - \theta_i^T x)^2}{2\sigma^2} \right), \qquad i = 0, 1,

where \sigma is a known parameter and \phi, \theta_0, \theta_1 \in \mathbb{R}^n are parameters of the model (here the subscript on \theta denotes one of two different parameter vectors, not a particular entry in these vectors).

Intuitively, the process behind the model can be thought of as follows. Given a data point x, we first determine whether the point belongs to one of two hidden classes, z = 0 or z = 1, using a logistic regression model. We then generate y as a linear function of x (a different linear function for each value of z) plus Gaussian noise, as in the standard linear regression model (a short simulation of this generative process is sketched after part (b) below). For example, the data set in the figure below could be well represented by this model, but not by standard linear regression.

[Figure omitted in this extract: a scatter plot of such a data set.]

(a) Suppose x, y, and z are all observed, so that we obtain a training set \{(x^{(1)}, y^{(1)}, z^{(1)}), \ldots, (x^{(m)}, y^{(m)}, z^{(m)})\}. Write the log-likelihood of the parameters, and derive the maximum likelihood estimates for \phi, \theta_0, and \theta_1. Note that because p(z \mid x) is a logistic regression model, there will not exist a closed-form estimate of \phi. In this case, derive the gradient and the Hessian of the likelihood with respect to \phi; in practice, these quantities can be used to compute the ML estimate numerically.

(b) Now suppose z is a latent (unobserved) random variable. Write the log-likelihood of the parameters, and derive an EM algorithm to maximize the log-likelihood. Clearly specify the E-step and the M-step (again, the M-step will require a numerical solution, so find the appropriate gradients and Hessians). ...
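The generative process described above can be made concrete with a short simulation. The following is a minimal NumPy sketch, not part of the original problem set; the specific values of phi, theta0, theta1, and sigma are illustrative assumptions, chosen only to produce data of the kind the problem describes.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    # Hypothetical parameter values, for illustration only.
    n, m = 2, 200
    phi = np.array([1.0, -2.0])        # logistic-gate parameters
    theta0 = np.array([0.5, 1.0])      # regression weights for class z = 0
    theta1 = np.array([-0.5, 3.0])     # regression weights for class z = 1
    sigma = 0.3                        # known noise standard deviation

    X = rng.normal(size=(m, n))                      # inputs x in R^n
    z = rng.binomial(1, sigmoid(X @ phi))            # z | x ~ Bernoulli(g(phi^T x))
    mean = np.where(z == 1, X @ theta1, X @ theta0)  # theta_z^T x
    y = mean + sigma * rng.normal(size=m)            # y | x, z ~ N(theta_z^T x, sigma^2)

Plotting y against a component of X for such a sample shows two interleaved linear trends, which is why a single linear regression fits it poorly.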
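Continuing the sketch above (and reusing its sigmoid, simulated data, and hypothetical parameters), the posterior p(z = 1 | x, y) that an E-step would compute follows from Bayes' rule. Part (b) asks you to derive this yourself; the function below is only one standard form of that computation, not the problem's official solution.

    def responsibilities(X, y, phi, theta0, theta1, sigma):
        """w_i = p(z = 1 | x_i, y_i) by Bayes' rule.

        The Gaussian normalizer 1/(sqrt(2*pi)*sigma) cancels between
        numerator and denominator because sigma is shared by both classes.
        """
        p1 = sigmoid(X @ phi)                                     # p(z = 1 | x)
        lik1 = np.exp(-(y - X @ theta1) ** 2 / (2 * sigma ** 2))  # ∝ p(y | x, z = 1)
        lik0 = np.exp(-(y - X @ theta0) ** 2 / (2 * sigma ** 2))  # ∝ p(y | x, z = 0)
        num = p1 * lik1
        return num / (num + (1.0 - p1) * lik0)

    w = responsibilities(X, y, phi, theta0, theta1, sigma)        # one weight per example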