CS 229, Autumn 2009
Practice Midterm Solutions

Notes:

1. The midterm will have about 5-6 long questions, and about 8-10 short questions. Space will be provided on the actual midterm for you to write your answers.
2. The midterm is meant to be educational, and as such some questions could be quite challenging. Use your time wisely to answer as much as you can!
3. For additional practice, please see the CS 229 extra problem sets available at http://see.stanford.edu/see/materials/aimlcs229/assignments.aspx

1. [13 points] Generalized Linear Models

Recall that generalized linear models assume that the response variable $y$ (conditioned on $x$) is distributed according to a member of the exponential family:

$$p(y; \eta) = b(y) \exp(\eta T(y) - a(\eta)),$$

where $\eta = \theta^T x$. For this problem, we will assume $\eta \in \mathbb{R}$.

(a) [10 points] Given a training set $\{(x^{(i)}, y^{(i)})\}_{i=1}^m$, the log-likelihood is given by

$$\ell(\theta) = \sum_{i=1}^m \log p(y^{(i)} \mid x^{(i)}; \theta).$$

Give a set of conditions on $b(y)$, $T(y)$, and $a(\eta)$ which ensure that the log-likelihood is a concave function of $\theta$ (and thus has a unique maximum). Your conditions must be reasonable, and should be as weak as possible. (E.g., the answer "any $b(y)$, $T(y)$, and $a(\eta)$ so that $\ell(\theta)$ is concave" is not reasonable. Similarly, overly narrow conditions, including ones that apply only to specific GLMs, are also not reasonable.)

Answer: The log-likelihood is given by

$$\ell(\theta) = \sum_{k=1}^m \left[ \log b(y^{(k)}) + \eta^{(k)} T(y^{(k)}) - a(\eta^{(k)}) \right],$$

where $\eta^{(k)} = \theta^T x^{(k)}$. Find the Hessian by taking the partial derivatives with respect to $\theta_i$ and $\theta_j$:

$$\frac{\partial}{\partial \theta_i} \ell(\theta) = \sum_{k=1}^m \left[ T(y^{(k)})\, x_i^{(k)} - \frac{\partial a(\eta^{(k)})}{\partial \eta}\, x_i^{(k)} \right]$$

$$\frac{\partial^2}{\partial \theta_i \partial \theta_j} \ell(\theta) = \sum_{k=1}^m -\frac{\partial^2 a(\eta^{(k)})}{\partial \eta^2}\, x_i^{(k)} x_j^{(k)} = H_{i,j}$$

$$H = -\sum_{k=1}^m \frac{\partial^2 a(\eta^{(k)})}{\partial \eta^2}\, x^{(k)} (x^{(k)})^T$$

$$z^T H z = -\sum_{k=1}^m \frac{\partial^2 a(\eta^{(k)})}{\partial \eta^2}\, (z^T x^{(k)})^2$$

If $\frac{\partial^2 a(\eta)}{\partial \eta^2} \geq 0$ for all $\eta$, then $z^T H z \leq 0$ for all $z$.
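The condition can also be checked numerically: for any GLM whose $a(\eta)$ satisfies $\partial^2 a(\eta)/\partial \eta^2 \geq 0$, the Hessian $H = -\sum_k a''(\eta^{(k)})\, x^{(k)} (x^{(k)})^T$ should have no positive eigenvalues. A minimal sketch, using a Poisson GLM ($a(\eta) = e^\eta$, so $a''(\eta) = e^\eta > 0$) with made-up random data; the specific GLM, data, and sizes are illustrative choices, not part of the problem:

```python
import numpy as np

# Illustrative data (not from the exam): rows of X are the x^(k).
rng = np.random.default_rng(0)
m, n = 50, 3
X = rng.normal(size=(m, n))
theta = rng.normal(size=n)

# Poisson GLM: a(eta) = exp(eta), so a''(eta) = exp(eta) >= 0 for all eta.
eta = X @ theta                # eta^(k) = theta^T x^(k)
a_pp = np.exp(eta)             # a''(eta^(k))

# Hessian of the log-likelihood: H = -sum_k a''(eta^(k)) x^(k) (x^(k))^T
H = -(X.T * a_pp) @ X

# Negative semi-definite <=> all eigenvalues <= 0 (up to numerical tolerance)
eigvals = np.linalg.eigvalsh(H)
print(eigvals.max() <= 1e-8)   # True
```

Since the Hessian is negative semi-definite for any parameter value, gradient ascent or Newton's method on $\ell(\theta)$ converges to the global maximum for such GLMs.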
If $H$ is negative semi-definite, then the log-likelihood $\ell(\theta)$ is concave. So it suffices that $\frac{\partial^2 a(\eta)}{\partial \eta^2} \geq 0$ for all $\eta$, i.e., that $a(\eta)$ be convex; no conditions on $b(y)$ or $T(y)$ are needed.

(b) [3 points] When the response variable is distributed according to a Normal distribution (with unit variance), we have $b(y) = \frac{1}{\sqrt{2\pi}} e^{-\frac{y^2}{2}}$, $T(y) = y$, and $a(\eta) = \frac{\eta^2}{2}$. Verify that the condition(s) you gave in part (a) hold for this setting.

Answer: $\frac{\partial^2 a(\eta)}{\partial \eta^2} = 1 \geq 0$.

2. [15 points] Bayesian linear regression

Consider Bayesian linear regression using a Gaussian prior on the parameters $\theta \in \mathbb{R}^{n+1}$. Thus, in our prior, $\theta \sim \mathcal{N}(\vec{0}, \tau^2 I_{n+1})$, where $\tau^2 \in \mathbb{R}$, and $I_{n+1}$ is the $(n+1)$-by-$(n+1)$ identity matrix. Also let the conditional distribution of $y^{(i)}$ given $x^{(i)}$ and $\theta$ be $\mathcal{N}(\theta^T x^{(i)}, \sigma^2)$, as in our usual linear least-squares model....
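With this setup, maximizing the log-posterior over $\theta$ reduces to ridge regression: the MAP estimate is $\theta_{\text{MAP}} = (X^T X + \frac{\sigma^2}{\tau^2} I)^{-1} X^T y$, where the rows of $X$ are the $x^{(i)}$. A minimal sketch; the data and the values of $\sigma$ and $\tau$ below are made-up for illustration, not taken from the exam:

```python
import numpy as np

# Illustrative data: y = X theta + Gaussian noise with std sigma.
rng = np.random.default_rng(1)
m, n = 100, 4
X = rng.normal(size=(m, n))
true_theta = np.array([1.0, -2.0, 0.5, 3.0])
sigma, tau = 0.5, 10.0
y = X @ true_theta + sigma * rng.normal(size=m)

# MAP estimate under the prior theta ~ N(0, tau^2 I) and likelihood
# y^(i) | x^(i), theta ~ N(theta^T x^(i), sigma^2):
#   theta_MAP = (X^T X + (sigma^2 / tau^2) I)^{-1} X^T y
lam = sigma**2 / tau**2
theta_map = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)
print(theta_map)  # close to true_theta; shrunk slightly toward 0 by the prior
```

As $\tau \to \infty$ the prior becomes uninformative and $\theta_{\text{MAP}}$ approaches the ordinary least-squares solution; a smaller $\tau$ shrinks the estimate toward $\vec{0}$.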
