STANFORD UNIVERSITY
CS 229, Autumn 2011
Midterm Examination
Wednesday, November 9, 6:00pm-9:00pm

Question                                      Points
1  Generalized Linear Models                  /15
2  Gaussian Naive Bayes                       /15
3  Linear Invariance of Logistic Regression   /12
4  $\ell_2$-Regularized SVM                   /18
5  Uniform Convergence                        /16
6  Short Answers                              /38
Total                                         /114

Name of Student:
SUID:

The Stanford University Honor Code: I attest that I have not given or received aid in this examination, and that I have done my share and taken an active part in seeing to it that others as well as myself uphold the spirit and letter of the Honor Code.

Signed:

1. [15 points] Generalized Linear Models

In class, we showed that the Bernoulli and Gaussian distributions are exponential family distributions, which are of the form

$$p(y; \eta) = b(y) \exp\left(\eta^T T(y) - a(\eta)\right)$$

In this problem, we will consider a different exponential family distribution, specifically the Exponential distribution, which has a density given by

$$p(y; \lambda) = \lambda \exp(-\lambda y)$$

Here, $y \ge 0$ is a nonnegative real number, and the distribution is parameterized by $\lambda \in \mathbb{R}$.

(a) [5 points] Write the Exponential distribution in the exponential family form given above. You will need to come up with expressions for $\eta$, $b(y)$, $T(y)$, and $a(\eta)$.

Answer:

$$p(y; \lambda) = \exp(\log \lambda - \lambda y)$$

$$b(y) = 1, \qquad \eta = -\lambda, \qquad T(y) = y, \qquad a(\eta) = -\log(-\eta)$$

Note: an equally valid solution has $T(y) = -y$, $\eta = \lambda$, and $a(\eta) = -\log(\eta)$. This will give sign flips in parts b and c, but results in an identical Hessian in part d.

(b) [2 points] Derive the canonical response function $g(\eta)$, which gives the Exponential distribution's mean as a function of the natural parameter $\eta$. You may use the fact that an Exponential distribution (with parameter $\lambda$) has mean $\frac{1}{\lambda}$.
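Answer:

$$g(\eta) = \frac{1}{\lambda} = -\frac{1}{\eta}$$

(c) [2 points] Assuming that we have a training set $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$ of $m$ independently and identically distributed (IID) examples, write down the log-likelihood $\ell(\theta)$ of the parameters.

Answer:

$$\begin{aligned}
L(\theta) &= \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}; \theta) \\
&= \prod_{i=1}^{m} \exp\left(\eta^T T(y^{(i)}) - a(\eta)\right) \\
&= \prod_{i=1}^{m} \exp\left((\theta^T x^{(i)})^T y^{(i)} + \log(-\theta^T x^{(i)})\right)
\end{aligned}$$

$$\ell(\theta) = \sum_{i=1}^{m} \theta^T x^{(i)} y^{(i)} + \log(-\theta^T x^{(i)})$$

(d) [6 points] Find the Hessian $H$ of the log-likelihood $\ell(\theta)$, and show that it is negative semidefinite.

Answer: First, the gradient:

$$\begin{aligned}
\frac{\partial \ell(\theta)}{\partial \theta_j}
&= \sum_{i=1}^{m} \frac{\partial}{\partial \theta_j} \left(\theta^T x^{(i)} y^{(i)} + \log(-\theta^T x^{(i)})\right) \\
&= \sum_{i=1}^{m} x_j^{(i)} y^{(i)} + \frac{\partial}{\partial \theta_j} \log(-\theta^T x^{(i)}) \\
&= \sum_{i=1}^{m} x_j^{(i)} y^{(i)} + \frac{1}{-\theta^T x^{(i)}} \frac{\partial}{\partial \theta_j} \left(-\theta^T x^{(i)}\right) \\
&= \sum_{i=1}^{m} x_j^{(i)} y^{(i)} + \frac{x_j^{(i)}}{\theta^T x^{(i)}} \\
&= \sum_{i=1}^{m} \left(y^{(i)} + \frac{1}{\theta^T x^{(i)}}\right) x_j^{(i)}
\end{aligned}$$

Now, the Hessian:

$$\begin{aligned}
\frac{\partial \ell(\theta)}{\partial \theta_j}
&= \sum_{i=1}^{m} \left(y^{(i)} + \frac{1}{\theta^T x^{(i)}}\right) x_j^{(i)} \\
\frac{\partial^2 \ell(\theta)}{\partial \theta_j \partial \theta_k}
&= \sum_{i=1}^{m} x_j^{(i)} \frac{\partial}{\partial \theta_k} \frac{1}{\theta^T x^{(i)}} \\
&= \dots
\end{aligned}$$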
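The answers to parts (a), (c), and (d) can be spot-checked numerically. The sketch below is not part of the exam solution: it assumes a small made-up data set (`xs`, `ys`) and a parameter vector `theta` chosen so that $\theta^T x^{(i)} < 0$ for every example (required for the logarithm to be defined), and it approximates the Hessian by finite differences rather than using a closed form. The helper names are hypothetical.

```python
import math
import random


def exp_density(y, lam):
    """Exponential density p(y; lambda) = lambda * exp(-lambda * y)."""
    return lam * math.exp(-lam * y)


def exp_family_density(y, eta):
    """Same density in exponential family form, using the part (a) answer:
    b(y) = 1, T(y) = y, eta = -lambda, a(eta) = -log(-eta)."""
    a = -math.log(-eta)
    return 1.0 * math.exp(eta * y - a)


def log_likelihood(theta, xs, ys):
    """l(theta) = sum_i [ theta^T x_i * y_i + log(-theta^T x_i) ]."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = sum(t * xi for t, xi in zip(theta, x))
        total += eta * y + math.log(-eta)
    return total


def numerical_hessian(f, theta, h=1e-4):
    """Central finite-difference approximation of the Hessian of f at theta."""
    n = len(theta)
    H = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            def shifted(dj, dk):
                t = list(theta)
                t[j] += dj * h
                t[k] += dk * h
                return f(t)
            H[j][k] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return H


def quadratic_form(H, v):
    """Compute v^T H v for a square matrix H given as nested lists."""
    n = len(v)
    return sum(v[j] * H[j][k] * v[k] for j in range(n) for k in range(n))


# Toy data (an assumption for illustration only); theta keeps theta^T x < 0.
xs = [[1.0, 0.5], [1.0, 2.0], [1.0, 1.5]]
ys = [0.7, 0.2, 1.1]
theta = [-1.0, -0.5]

H = numerical_hessian(lambda t: log_likelihood(t, xs, ys), theta)

# Spot check of negative semidefiniteness: v^T H v should never be positive.
rng = random.Random(0)
worst = max(
    quadratic_form(H, [rng.uniform(-1.0, 1.0) for _ in theta])
    for _ in range(1000)
)
print("largest v^T H v over random v:", worst)
```

Sampling random directions is only a spot check at one point, not a proof; the analytic argument asked for in part (d) is what establishes negative semidefiniteness for all valid $\theta$.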
This document was uploaded on 01/06/2012.
 Fall '09
