CS229 Problem Set #1

CS 229, Autumn 2011
Problem Set #1 Solutions: Supervised Learning

Due in class (9:30am) on Wednesday, October 19.

Notes:
(1) These questions require thought, but do not require long answers. Please be as concise as possible.
(2) When sending questions to [email protected], please make sure to write the homework number and the question number in the subject line, such as Hwk1 Q4, and send a separate email per question.
(3) If you missed the first lecture or are unfamiliar with the class’ collaboration or honor code policy, please read the policy on Handout #1 (available from the course website) before starting work.
(4) For problems that require programming, please include in your submission a printout of your code (with comments) and any figures that you are asked to plot.
(5) Please indicate the submission time and number of late dates clearly in your submission.

SCPD students: Please email your solutions to [email protected], and write “Problem Set 1 Submission” on the Subject of the email. If you are writing your solutions out by hand, please write clearly and in a reasonably large font using a dark pen to improve legibility.

1. [25 points] Logistic regression

(a) [10 points] Consider the log-likelihood function for logistic regression:

\[
\ell(\theta) = \sum_{i=1}^{m} y^{(i)} \log h(x^{(i)}) + (1 - y^{(i)}) \log\bigl(1 - h(x^{(i)})\bigr)
\]

Find the Hessian $H$ of this function, and show that for any vector $z$, it holds true that

\[
z^T H z \le 0.
\]

[Hint: You might want to start by showing the fact that $\sum_i \sum_j z_i x_i x_j z_j = (x^T z)^2 \ge 0$.]

Remark: This is one of the standard ways of showing that the matrix $H$ is negative semi-definite, written “$H \le 0$.” This implies that $\ell$ is concave, and has no local maxima other than the global one.[1] If you have some other way of showing $H \le 0$, you’re also welcome to use your method instead of the one above.
[1] If you haven’t seen this result before, please feel encouraged to ask us about it during office hours.

Answer: (Note we do things in a slightly shorter way here; this solution does not use the hint.) Recall that we have $g'(z) = g(z)(1 - g(z))$, and thus for $h(x) = g(\theta^T x)$, we have

\[
\frac{\partial h(x)}{\partial \theta_k} = h(x)(1 - h(x))\, x_k .
\]

This latter fact is very useful for the derivations that follow.

Remember we have shown in class:

\[
\frac{\partial \ell(\theta)}{\partial \theta_k} = \sum_{i=1}^{m} \bigl(y^{(i)} - h(x^{(i)})\bigr)\, x_k^{(i)} \tag{1}
\]

Differentiating once more with respect to $\theta_l$ gives the entries of the Hessian:

\begin{align}
H_{kl} &= \frac{\partial^2 \ell(\theta)}{\partial \theta_k \, \partial \theta_l} \tag{2} \\
&= \sum_{i=1}^{m} -\frac{\partial h(x^{(i)})}{\partial \theta_l}\, x_k^{(i)} \tag{3} \\
&= \sum_{i=1}^{m} -h(x^{(i)})\bigl(1 - h(x^{(i)})\bigr)\, x_l^{(i)} x_k^{(i)} \tag{4}
\end{align}

So we have for the Hessian matrix $H$ (using that $X = x x^T$ if and only if $X_{ij} = x_i x_j$):

\[
H = -\sum_{i=1}^{m} h(x^{(i)})\bigl(1 - h(x^{(i)})\bigr)\, x^{(i)} x^{(i)T} \tag{6}
\]

And to prove $H$ is negative semidefinite, we show $z^T H z \le 0$ for all $z$ ....
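As a sanity check on equations (1) and (6), the sketch below compares the closed-form Hessian against central finite differences of the gradient, and evaluates $z^T H z$ for many random $z$. It is only an illustration and not part of the original solution: the synthetic data, the seed, the step size `eps`, and all function names (`sigmoid`, `gradient`, `hessian`, `quad_form`) are assumptions made for this example. A numerical check of this kind supports the claim but does not replace the proof.

```python
import math
import random


def sigmoid(t):
    """Logistic function g(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def gradient(theta, X, y):
    """Equation (1): component k is sum_i (y_i - h(x_i)) * x_i[k]."""
    n = len(theta)
    g = [0.0] * n
    for xi, yi in zip(X, y):
        residual = yi - sigmoid(dot(theta, xi))
        for k in range(n):
            g[k] += residual * xi[k]
    return g


def hessian(theta, X):
    """Equation (6): H = -sum_i h(x_i)(1 - h(x_i)) x_i x_i^T."""
    n = len(theta)
    H = [[0.0] * n for _ in range(n)]
    for xi in X:
        h = sigmoid(dot(theta, xi))
        w = h * (1.0 - h)
        for k in range(n):
            for l in range(n):
                H[k][l] -= w * xi[k] * xi[l]
    return H


def quad_form(H, z):
    """Compute z^T H z."""
    n = len(z)
    return sum(z[k] * H[k][l] * z[l] for k in range(n) for l in range(n))


# Synthetic data (hypothetical, for illustration only).
random.seed(0)
n, m = 3, 40
X = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]
y = [float(random.random() < 0.5) for _ in range(m)]
theta = [random.gauss(0, 1) for _ in range(n)]

H = hessian(theta, X)

# Central differences of the gradient approximate each column of H.
eps = 1e-5
max_fd_error = 0.0
for l in range(n):
    up, down = theta[:], theta[:]
    up[l] += eps
    down[l] -= eps
    g_up, g_down = gradient(up, X, y), gradient(down, X, y)
    for k in range(n):
        fd = (g_up[k] - g_down[k]) / (2 * eps)
        max_fd_error = max(max_fd_error, abs(fd - H[k][l]))

# The largest z^T H z over random directions should not be positive.
max_quad = max(
    quad_form(H, [random.gauss(0, 1) for _ in range(n)]) for _ in range(1000)
)

print("max finite-difference error:", max_fd_error)
print("largest z^T H z over random z:", max_quad)
```

The labels $y^{(i)}$ enter the gradient but drop out of the Hessian, which is why `hessian` takes only `theta` and `X`.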
This document was uploaded on 01/06/2012.
- Fall '09