CS 221 Problem Set #2: Machine Learning

Due by 9:30am on Tuesday, October 27. Please see the course information page on the class website for late homework submission instructions. SCPD students can also fax their solutions to (650) 725-1449. We will not accept solutions by email or courier.

NOTE: These questions require thought, but do not require long answers. Please try to be as concise as possible.

Written part (65 points)

1. [20 points] Least Squares

In lecture, we discussed a method for least-squares linear regression using gradient descent applied to a cost function J(\theta). It turns out it is actually possible to solve for the linear regression parameters analytically. In this problem, you will derive analytic solutions for some important special cases.

(a) [2 points] First consider the simplest case, where we have no features, and we simply try to approximate the target variable y with a constant function h(x) = \theta_0. Find the closed-form solution for \theta_0 \in \mathbb{R} which minimizes the least-squares cost function:

    J(\theta_0) = \frac{1}{2} \sum_{i=1}^{m} \left( y^{(i)} - \theta_0 \right)^2

(b) [7 points] Now consider the case of n = 1, so that there is only a single feature and each x^{(i)} \in \mathbb{R} is a real number. (For example, trying to predict housing prices from a single feature, the size of the house.) Find the closed-form solutions for \theta_0 and \theta_1 which minimize the least-squares cost function:

    J(\theta_0, \theta_1) = \frac{1}{2} \sum_{i=1}^{m} \left( y^{(i)} - \theta_0 - \theta_1 x^{(i)} \right)^2.

Express your answer in terms of the empirical means \bar{x} and \bar{y} and the empirical moments S_{xx}, S_{xy}, S_{yy}, defined as:

    \bar{x} = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}
    \bar{y} = \frac{1}{m} \sum_{i=1}^{m} y^{(i)}
    S_{xx} = \frac{1}{m} \sum_{i=1}^{m} \left( x^{(i)} \right)^2
    S_{xy} = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} y^{(i)}
    S_{yy} = \frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \right)^2

Hint: Take the partial derivatives of J with respect to \theta_0 and \theta_1, and set them to zero.
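Whatever closed form you derive for part (b) can be sanity-checked numerically. The sketch below (using made-up data; the variable names mirror the problem's notation) computes the empirical means and moments as defined above, and obtains the minimizing \theta_0 and \theta_1 from NumPy's least-squares solver for comparison:

```python
import numpy as np

# Hypothetical data: any real-valued x, y of equal length m will do.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
m = len(x)

# Empirical means and moments, exactly as defined in the problem.
x_bar = x.mean()
y_bar = y.mean()
S_xx = np.mean(x ** 2)
S_xy = np.mean(x * y)

# Reference minimizer from NumPy's least-squares solver: the columns
# [1, x] of the design matrix correspond to (theta_0, theta_1).
A = np.column_stack([np.ones(m), x])
theta0, theta1 = np.linalg.lstsq(A, y, rcond=None)[0]

# A candidate closed form (in terms of x_bar, y_bar, S_xx, S_xy)
# can be compared against theta0 and theta1 here.
print(theta0, theta1)
```

Note that the hint's "set the partial derivatives to zero" condition is checkable directly: at the minimum, \partial J / \partial \theta_0 = 0 forces \bar{y} = \theta_0 + \theta_1 \bar{x}, and \partial J / \partial \theta_1 = 0 forces S_{xy} = \theta_0 \bar{x} + \theta_1 S_{xx}.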
(c) [11 points] We've seen in class that attempting to fit functions with many parameters (e.g., a high-order polynomial) using too little data can result in overfitting. One solution to this problem is to use a simpler hypothesis with fewer parameters. In this problem, we will look at a different solution to the problem of overfitting, called regularization.

i. [3 points] Suppose we are fitting a hypothesis h_\theta(x) with parameters \theta \in \mathbb{R}^{n+1} to a training set \{ (x^{(i)}, y^{(i)}); i = 1, \ldots, m \} as usual. Consider the cost function:

    J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \frac{\lambda}{2} \|\theta\|_2^2

where \|\theta\|_2 is defined as:

    \|\theta\|_2 = \sqrt{ \sum_{j=0}^{n} \theta_j^2 }

Note that this is the same cost function used in class, but with the addition of the regularization term \frac{\lambda}{2} \|\theta\|_2^2 at the end. This term serves to keep the norm of the parameters small, and minimizing J(\theta) now involves making a tradeoff...
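For intuition, the regularized cost can be evaluated concretely. The sketch below assumes a linear hypothesis h_\theta(x) = \theta^T x and made-up data (both are illustrative assumptions, not part of the problem), and solves the condition "gradient equals zero", which for this cost is X^T (X\theta - y) + \lambda \theta = 0 — the standard ridge-regression normal equations, stated here as background rather than the requested derivation:

```python
import numpy as np

# Hypothetical training set: m examples, n features, plus an intercept
# column, so theta lives in R^{n+1} as in the problem statement.
rng = np.random.default_rng(0)
m, n = 20, 3
X = np.column_stack([np.ones(m), rng.normal(size=(m, n))])
y = rng.normal(size=m)
lam = 0.5  # the regularization strength, lambda in the cost function

def J(theta):
    """Regularized least-squares cost from the problem statement,
    assuming the linear hypothesis h_theta(x) = theta^T x."""
    residuals = X @ theta - y
    return 0.5 * residuals @ residuals + 0.5 * lam * theta @ theta

# Setting the gradient X^T (X theta - y) + lam * theta to zero yields
# (X^T X + lam I) theta = X^T y. Note the penalty covers theta_0 too,
# since the norm in the problem sums from j = 0.
theta_star = np.linalg.solve(X.T @ X + lam * np.eye(n + 1), X.T @ y)
print(J(theta_star))
```

Because the penalty makes the cost strictly convex, any perturbation of `theta_star` should increase J, which gives a quick numerical check of the solution.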
This note was uploaded on 12/15/2009 for the course CS 221, taught by Professors Koller and Ng during the Fall '09 term at Stanford.