cs221-ps2

CS 221 Problem Set #2: Machine Learning

Due by 9:30am on Tuesday, October 27. Please see the course information page on the class website for late homework submission instructions. SCPD students can also fax their solutions to (650) 725-1449. We will not accept solutions by email or courier.

NOTE: These questions require thought, but do not require long answers. Please try to be as concise as possible.

Written part (65 points)

1. [20 points] Least Squares

In lecture, we discussed a method for least squares linear regression using gradient descent applied to a cost function $J(\theta)$. It turns out it is actually possible to solve for the linear regression parameters analytically. In this problem, you will derive analytic solutions for some important special cases.

(a) [2 points] First consider the simplest case, where we have no features, and we simply try to approximate the target variable $y$ with a constant function $h_\theta = \theta_0$. Find the closed-form solution for $\theta_0 \in \mathbb{R}$ which minimizes the least-squares cost function:

$$J(\theta_0) = \frac{1}{2} \sum_{i=1}^{m} \left( y^{(i)} - \theta_0 \right)^2$$

(b) [7 points] Now consider the case of $n = 1$, so that there is only a single feature and each $x^{(i)} \in \mathbb{R}$ is a real number. (For example, trying to predict housing prices from a single feature, the size of the house.) Find the closed-form solutions for $\theta_0$ and $\theta_1$ which minimize the least-squares cost function:

$$J(\theta_0, \theta_1) = \frac{1}{2} \sum_{i=1}^{m} \left( y^{(i)} - \theta_0 - \theta_1 x^{(i)} \right)^2 .$$

Express your answer in terms of the empirical means $\bar{x}$ and $\bar{y}$ and the empirical moments $S_{xx}$, $S_{xy}$, $S_{yy}$, defined as:

$$\bar{x} = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} \qquad \bar{y} = \frac{1}{m} \sum_{i=1}^{m} y^{(i)}$$

$$S_{xx} = \frac{1}{m} \sum_{i=1}^{m} \left( x^{(i)} \right)^2 \qquad S_{xy} = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} y^{(i)} \qquad S_{yy} = \frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \right)^2$$

Hint: Take the partial derivatives of $J$ with respect to $\theta_0$ and $\theta_1$, and set them to zero.

(c) [11 points] We've seen in class that attempting to fit functions with many parameters (e.g., a high-order polynomial) using too little data can result in overfitting. One solution to this problem is to use a simpler hypothesis with fewer parameters. In this problem, we will look at a different solution to the problem of overfitting, called regularization.

i. [3 points] Suppose we are fitting a hypothesis $h_\theta(x)$ with parameters $\theta \in \mathbb{R}^{n+1}$ to a training set $\{ (x^{(i)}, y^{(i)});\ i = 1, \ldots, m \}$ as usual. Consider the cost function:

$$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \frac{\lambda}{2} \|\theta\|_2^2$$

where $\|\theta\|_2$ is defined as

$$\|\theta\|_2 = \sqrt{ \sum_{j=0}^{n} \theta_j^2 }$$

Note that this is the same cost function used in class, but with the addition of the regularization term $\frac{\lambda}{2} \|\theta\|_2^2$ at the end. This term serves to keep the norm of the parameters small, and minimizing $J(\theta)$ now involves making a tradeoff...
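As a concrete illustration of the quantities above, the following Python sketch (not part of the original handout) builds a small synthetic dataset, computes the empirical means and moments defined in part (b), evaluates both the plain and the regularized cost functions, and checks the standard simple-linear-regression closed form $\theta_1 = (S_{xy} - \bar{x}\bar{y}) / (S_{xx} - \bar{x}^2)$, $\theta_0 = \bar{y} - \theta_1 \bar{x}$, which is the result part (b) asks you to derive. The toy data, the variable names, and the choice $\lambda = 1.0$ are assumptions made purely for this example.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data (an assumption for this sketch): y is roughly linear in the
    # single feature x, e.g. price vs. house size, plus noise.
    m = 50
    x = rng.uniform(500, 3000, size=m)              # x^(i), i = 1..m
    y = 50.0 + 0.3 * x + rng.normal(0, 40, size=m)  # y^(i)

    # Empirical means and moments exactly as defined in part (b).
    x_bar, y_bar = x.mean(), y.mean()
    S_xx = np.mean(x ** 2)
    S_xy = np.mean(x * y)
    S_yy = np.mean(y ** 2)

    # Standard simple-linear-regression closed form (the result part (b)
    # asks you to derive by setting the partial derivatives of J to zero).
    theta1 = (S_xy - x_bar * y_bar) / (S_xx - x_bar ** 2)
    theta0 = y_bar - theta1 * x_bar

    def J(t0, t1):
        # Least-squares cost from part (b).
        return 0.5 * np.sum((y - t0 - t1 * x) ** 2)

    def J_reg(theta, X, lam):
        # Regularized cost from part (c): adds (lambda/2) * ||theta||_2^2.
        r = X @ theta - y
        return 0.5 * r @ r + 0.5 * lam * theta @ theta

    # Design matrix with an intercept column, so theta = (theta0, theta1).
    X = np.column_stack([np.ones(m), x])
    theta = np.array([theta0, theta1])

    print("closed-form theta0, theta1:", theta0, theta1)
    print("J at the closed-form minimizer:", J(theta0, theta1))
    print("regularized J with lambda = 1.0:", J_reg(theta, X, 1.0))

Gradient descent on $J(\theta_0, \theta_1)$ should converge to the same parameter values; increasing the assumed $\lambda$ in the regularized cost penalizes larger parameter norms, which is the tradeoff the problem statement refers to.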