ps1-sol - CS229 Problem Set #1 1 CS 229, Autumn 2011...


CS229 Problem Set #1

CS 229, Autumn 2011
Problem Set #1 Solutions: Supervised Learning

Due in class (9:30am) on Wednesday, October 19.

Notes:
(1) These questions require thought, but do not require long answers. Please be as concise as possible.
(2) When sending questions to cs229-qa@cs.stanford.edu, please make sure to write the homework number and the question number in the subject line, such as "Hwk1 Q4", and send a separate email per question.
(3) If you missed the first lecture or are unfamiliar with the class collaboration or honor code policy, please read the policy on Handout #1 (available from the course website) before starting work.
(4) For problems that require programming, please include in your submission a printout of your code (with comments) and any figures that you are asked to plot.
(5) Please indicate the submission time and number of late days clearly in your submission.

SCPD students: Please email your solutions to cs229-qa@cs.stanford.edu, and write "Problem Set 1 Submission" in the subject line of the email. If you are writing your solutions out by hand, please write clearly and in a reasonably large font using a dark pen to improve legibility.

1. [25 points] Logistic regression

(a) [10 points] Consider the log-likelihood function for logistic regression:

$$\ell(\theta) = \sum_{i=1}^{m} y^{(i)} \log h(x^{(i)}) + (1 - y^{(i)}) \log(1 - h(x^{(i)}))$$

Find the Hessian $H$ of this function, and show that for any vector $z$, it holds true that

$$z^T H z \le 0.$$

[Hint: You might want to start by showing the fact that $\sum_i \sum_j z_i x_i x_j z_j = (x^T z)^2 \ge 0$.]

Remark: This is one of the standard ways of showing that the matrix $H$ is negative semi-definite, written "$H \le 0$." This implies that $\ell$ is concave, and has no local maxima other than the global one. [1] If you have some other way of showing $H \le 0$, you're also welcome to use your method instead of the one above.
Answer: (Note that we do things in a slightly shorter way here; this solution does not use the hint.) Recall that $g'(z) = g(z)(1 - g(z))$, and thus for $h(x) = g(\theta^T x)$ we have

$$\frac{\partial h(x)}{\partial \theta_k} = h(x)(1 - h(x))\, x_k.$$

This latter fact is very useful in the following derivations. Remember that we have shown in class:

$$\frac{\partial \ell(\theta)}{\partial \theta_k} = \sum_{i=1}^{m} \left(y^{(i)} - h(x^{(i)})\right) x_k^{(i)} \tag{1}$$

Differentiating once more with respect to $\theta_l$ gives the entries of the Hessian:

$$H_{kl} = \frac{\partial^2 \ell(\theta)}{\partial \theta_k\, \partial \theta_l} \tag{2}$$

$$= -\sum_{i=1}^{m} \frac{\partial h(x^{(i)})}{\partial \theta_l}\, x_k^{(i)} \tag{3}$$

$$= -\sum_{i=1}^{m} h(x^{(i)})\left(1 - h(x^{(i)})\right) x_l^{(i)} x_k^{(i)} \tag{4}$$

So we have for the Hessian matrix $H$ (using the fact that $X = xx^T$ if and only if $X_{ij} = x_i x_j$):

$$H = -\sum_{i=1}^{m} h(x^{(i)})\left(1 - h(x^{(i)})\right) x^{(i)} x^{(i)T} \tag{6}$$

And to prove that $H$ is negative semidefinite, we show $z^T H z \le 0$ for all $z$. ...

[1] If you haven't seen this result before, please feel encouraged to ask us about it during office hours.
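As a numerical sanity check on the Hessian formula derived above (equation (6)), the following is a minimal sketch, not part of the original solution. It builds $H$ for randomly drawn parameters and inputs and evaluates $z^T H z$ for random $z$. The function names (`sigmoid`, `hessian`, `quadratic_form`, `max_quadratic_form`) and the random test setup are assumptions made purely for illustration. A check like this does not prove negative semidefiniteness; it can only fail to find a counterexample.

```python
import math
import random


def sigmoid(t):
    """Logistic function g(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


def hessian(theta, xs):
    """Hessian of the logistic log-likelihood, following equation (6):
    H = -sum_i h(x_i) (1 - h(x_i)) x_i x_i^T.
    The labels y do not appear in the Hessian."""
    n = len(theta)
    H = [[0.0] * n for _ in range(n)]
    for x in xs:
        h = sigmoid(sum(t * xk for t, xk in zip(theta, x)))
        w = h * (1.0 - h)  # always in (0, 0.25]
        for k in range(n):
            for l in range(n):
                H[k][l] -= w * x[k] * x[l]
    return H


def quadratic_form(H, z):
    """Compute z^T H z for a square matrix H given as nested lists."""
    n = len(z)
    return sum(z[k] * H[k][l] * z[l] for k in range(n) for l in range(n))


def max_quadratic_form(num_trials=200, n=3, m=20, seed=0):
    """Largest z^T H z seen over random (theta, data, z) draws.
    The sizes, ranges, and seed are illustrative assumptions."""
    rng = random.Random(seed)
    worst = -math.inf
    for _ in range(num_trials):
        theta = [rng.uniform(-2, 2) for _ in range(n)]
        xs = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(m)]
        z = [rng.gauss(0, 1) for _ in range(n)]
        worst = max(worst, quadratic_form(hessian(theta, xs), z))
    return worst


if __name__ == "__main__":
    # Expected to be non-positive (up to floating-point rounding).
    print(max_quadratic_form())
```

For a single example $x = (1, 2)$ with $\theta = 0$, we have $h = 0.5$, so the sketch gives $H = -0.25\, xx^T$, and $z = (1, 1)$ yields $z^T H z = -0.25 \cdot (x^T z)^2 = -2.25$.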