CS 229, Autumn 2011
Problem Set #1: Supervised Learning
Due in class (9:30am) on Wednesday, October 19.

Notes: (1) These questions require thought, but do not require long answers. Please be as concise as possible. (2) When sending questions to [email protected], please make sure to write the homework number and the question number in the subject line, such as Hwk1 Q4, and send a separate email per question. (3) If you missed the first lecture or are unfamiliar with the class' collaboration or honor code policy, please read the policy on Handout #1 (available from the course website) before starting work. (4) For problems that require programming, please include in your submission a printout of your code (with comments) and any figures that you are asked to plot. (5) Please indicate the submission time and number of late days clearly in your submission.

SCPD students: Please email your solutions to [email protected], and write "Problem Set 1 Submission" in the Subject of the email. If you are writing your solutions out by hand, please write clearly and in a reasonably large font using a dark pen to improve legibility.

1. [25 points] Logistic regression

(a) [10 points] Consider the log-likelihood function for logistic regression:

    ℓ(θ) = Σ_{i=1}^m [ y^(i) log h(x^(i)) + (1 − y^(i)) log(1 − h(x^(i))) ]

Find the Hessian H of this function, and show that for any vector z, it holds true that z^T H z ≤ 0.

[Hint: You might want to start by showing the fact that Σ_i Σ_j z_i x_i x_j z_j = (x^T z)^2 ≥ 0.]

Remark: This is one of the standard ways of showing that the matrix H is negative semidefinite, written "H ≤ 0." This implies that ℓ is concave, and has no local maxima other than the global one.[1] If you have some other way of showing H ≤ 0, you're also welcome to use your method instead of the one above.
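For part (a), the standard derivation can be sketched as follows. This is an outline only, not an official solution; it assumes h(x) = g(θ^T x) with g the sigmoid function, as defined in lecture:

```latex
% Gradient: using g'(z) = g(z)(1 - g(z)),
\frac{\partial \ell}{\partial \theta_j}
  = \sum_{i=1}^{m} \left( y^{(i)} - h(x^{(i)}) \right) x_j^{(i)}

% Hessian entry (j, k):
H_{jk} = \frac{\partial^2 \ell}{\partial \theta_j \, \partial \theta_k}
  = -\sum_{i=1}^{m} h(x^{(i)}) \left( 1 - h(x^{(i)}) \right) x_j^{(i)} x_k^{(i)}

% Quadratic form: applying the hint
% \sum_j \sum_k z_j x_j x_k z_k = (x^T z)^2 \ge 0 for each example,
z^T H z
  = -\sum_{i=1}^{m} h(x^{(i)}) \left( 1 - h(x^{(i)}) \right)
    \left( x^{(i)\,T} z \right)^2 \;\le\; 0
% since h(x^{(i)}) \in (0, 1) makes every coefficient nonnegative.
```

Each term of the sum is a nonnegative weight times a squared quantity, negated, which is exactly the structure the hint points at.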
(b) [10 points] On the Leland system, the files /afs/ir/class/cs229/ps/ps1/q1x.dat and /afs/ir/class/cs229/ps/ps1/q1y.dat contain the inputs (x^(i) ∈ R^2) and outputs (y^(i) ∈ {0, 1}) respectively for a binary classification problem, with one training example per row. Implement[2] Newton's method for optimizing ℓ(θ), and apply it to fit a logistic regression model to the data. Initialize Newton's method with θ = 0 (the vector of all zeros). What are the coefficients θ resulting from your fit? (Remember to include the intercept term.)

(c) [5 points] Plot the training data (your axes should be x_1 and x_2, corresponding to the two coordinates of the inputs, and you should use a different symbol for each ...

[1] If you haven't seen this result before, please feel encouraged to ask us about it during office hours.
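For part (b), a minimal sketch of the Newton's-method fit in Python with NumPy. This is one possible implementation, not the official solution; the function name `logistic_newton` and the fixed iteration count are illustrative choices, and loading the q1x.dat/q1y.dat files (e.g. via `np.loadtxt`) is left to the reader:

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_newton(X, y, n_iter=20):
    """Fit logistic regression by Newton's method, maximizing ell(theta).

    X : (m, n) design matrix WITHOUT an intercept column; one is prepended.
    y : (m,) labels in {0, 1}.
    Returns theta of length n + 1, intercept coefficient first.
    """
    m = X.shape[0]
    X = np.hstack([np.ones((m, 1)), X])   # prepend the intercept term
    theta = np.zeros(X.shape[1])          # initialize at the zero vector
    for _ in range(n_iter):
        h = sigmoid(X @ theta)
        grad = X.T @ (y - h)                      # gradient of ell(theta)
        H = -(X * (h * (1 - h))[:, None]).T @ X   # Hessian (negative semidefinite)
        theta = theta - np.linalg.solve(H, grad)  # Newton step: theta - H^{-1} grad
    return theta
```

Because ℓ is concave (part (a)), Newton's method on this problem typically converges in only a handful of iterations, which is why a small fixed `n_iter` suffices here.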