# CS 229, Public Course — Problem Set #4: Unsupervised Learning and Reinforcement Learning
1. **EM for supervised learning**

In class we applied EM to the unsupervised learning setting. In particular, we represented $p(x)$ by marginalizing over a latent random variable:

$$p(x) = \sum_z p(x, z) = \sum_z p(x \mid z)\, p(z).$$

However, EM can also be applied to the supervised learning setting, and in this problem we discuss a "mixture of linear regressors" model; this is an instance of what is often called the Hierarchical Mixture of Experts model. We want to represent $p(y \mid x)$, with $x \in \mathbb{R}^n$ and $y \in \mathbb{R}$, and we do so by again introducing a discrete latent random variable:

$$p(y \mid x) = \sum_z p(y, z \mid x) = \sum_z p(y \mid x, z)\, p(z \mid x).$$

For simplicity we'll assume that $z$ is binary valued, that $p(y \mid x, z)$ is a Gaussian density, and that $p(z \mid x)$ is given by a logistic regression model. More formally,

$$p(z \mid x; \phi) = g(\phi^T x)^z \left(1 - g(\phi^T x)\right)^{1-z}$$

$$p(y \mid x, z = i; \theta_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(y - \theta_i^T x)^2}{2\sigma^2}\right) \qquad i = 0, 1$$

where $\sigma$ is a known parameter and $\phi, \theta_0, \theta_1 \in \mathbb{R}^n$ are parameters of the model (here we use the subscript on $\theta$ to denote two different parameter vectors, not to index a particular entry in these vectors).

Intuitively, the process behind the model can be thought of as follows. Given a data point $x$, we first determine whether the data point belongs to one of two hidden classes, $z = 0$ or $z = 1$, using a logistic regression model. We then determine $y$ as a linear function of $x$ (a different linear function for each value of $z$) plus Gaussian noise, as in the standard linear regression model. For example, the following data set could be well represented by the model, but not by standard linear regression.
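As a concrete illustration of the generative process just described (this is not part of the problem statement; the dimensions, seed, and parameter values below are arbitrary assumptions for the sake of the sketch):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and parameters -- not taken from the problem.
n, m, sigma = 2, 200, 0.3
phi = rng.normal(size=n)           # logistic-regression parameters for p(z | x)
theta = rng.normal(size=(2, n))    # theta_0 and theta_1: one regressor per class


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


# Generative process: draw x, choose a hidden class z via logistic regression,
# then emit y as that class's linear function of x plus Gaussian noise.
X = rng.normal(size=(m, n))
z = (rng.random(m) < sigmoid(X @ phi)).astype(int)   # z = 1 w.p. g(phi^T x)
y = np.einsum("ij,ij->i", X, theta[z]) + sigma * rng.normal(size=m)
```

Plotting `y` against `X` with such data typically shows two interleaved linear trends, which is exactly the regime where a single linear regression fits poorly but this mixture does not.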

(a) Suppose $x$, $y$, and $z$ are all observed, so that we obtain a training set $\{(x^{(1)}, y^{(1)}, z^{(1)}), \ldots, (x^{(m)}, y^{(m)}, z^{(m)})\}$. Write the log-likelihood of the parameters, and derive the maximum likelihood estimates for $\phi$, $\theta_0$, and $\theta_1$. Note that because $p(z \mid x)$ is a logistic regression model, there will not exist a closed-form estimate of $\phi$. In this case, derive the gradient and the Hessian of the likelihood with respect to $\phi$; in practice, these quantities can be used to numerically compute the ML estimate.

(b) Now suppose $z$ is a latent (unobserved) random variable. Write the log-likelihood of the parameters, and derive an EM algorithm to maximize the log-likelihood. Clearly specify the E-step and M-step (again, the M-step will require a numerical solution, so find the appropriate gradients and Hessians).

2. **Factor Analysis and PCA**

In this problem we look at the relationship between two unsupervised learning algorithms we discussed in class: Factor Analysis and Principal Component Analysis. Consider the following joint distribution over $(x, z)$, where $z \in \mathbb{R}^k$ is a latent random variable:

$$z \sim \mathcal{N}(0, I)$$

$$x \mid z \sim \mathcal{N}(Uz, \sigma^2 I).$$
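To make the factor analysis model concrete (again a sketch, not part of the problem; the sizes $k$, $n$ and the value of $\sigma$ are made up), one can sample from it and check empirically that the marginal covariance of $x$ is $UU^T + \sigma^2 I$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes: k-dimensional latent z, n-dimensional observation x.
k, n, sigma = 2, 5, 0.1
U = rng.normal(size=(n, k))   # factor loading matrix

# One draw from the model: z ~ N(0, I), then x | z ~ N(Uz, sigma^2 I).
z = rng.normal(size=k)
x = U @ z + sigma * rng.normal(size=n)

# Marginally x ~ N(0, U U^T + sigma^2 I); verify with many samples.
N = 10_000
Z = rng.normal(size=(N, k))
X = Z @ U.T + sigma * rng.normal(size=(N, n))
emp_cov = np.cov(X, rowvar=False)
theory_cov = U @ U.T + sigma**2 * np.eye(n)
```

The rank-$k$ term $UU^T$ carries the structure shared across coordinates of $x$, while $\sigma^2 I$ is isotropic noise; the PCA connection the problem explores emerges as $\sigma \to 0$.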