CS 229, Public Course
Problem Set #2: Kernels, SVMs, and Theory

1. Kernel ridge regression

In contrast to ordinary least squares, which has the cost function

    J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( \theta^T x^{(i)} - y^{(i)} \right)^2,

we can also add a term that penalizes large weights in \theta. In ridge regression, our least squares cost is regularized by adding a term \lambda \|\theta\|^2, where \lambda > 0 is a fixed (known) constant (regularization will be discussed at greater length in an upcoming course lecture). The ridge regression cost function is then

    J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( \theta^T x^{(i)} - y^{(i)} \right)^2 + \frac{\lambda}{2} \|\theta\|^2.

(a) Use the vector notation described in class to find a closed-form expression for the value of \theta which minimizes the ridge regression cost function.

(b) Suppose that we want to use kernels to implicitly represent our feature vectors in a high-dimensional (possibly infinite-dimensional) space. Using a feature mapping \phi, the ridge regression cost function becomes

    J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( \theta^T \phi(x^{(i)}) - y^{(i)} \right)^2 + \frac{\lambda}{2} \|\theta\|^2.

Making a prediction on a new input x_{new} would now be done by computing \theta^T \phi(x_{new}). Show how we can use the "kernel trick" to obtain a closed form for the prediction on the new input without ever explicitly computing \phi(x_{new}). You may assume that the parameter vector \theta can be expressed as a linear combination of the input feature vectors, i.e., \theta = \sum_{i=1}^{m} \alpha_i \phi(x^{(i)}) for some set of parameters \alpha_i.

[Hint: You may find the following identity useful: (\lambda I + BA)^{-1} B = B (\lambda I + AB)^{-1}. If you want, you can try to prove this as well, though this is not required for the problem.]

2. \ell_2 norm soft margin SVMs

In class, we saw that if our data is not linearly separable, then we need to modify our support vector machine algorithm by introducing an error margin that must be minimized. Specifically, the formulation we have looked at is known as the \ell_1 norm soft margin SVM. ...
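As a sanity check on the kernel trick asked about in problem 1(b), the sketch below implements kernelized ridge regression numerically. It is not the derivation the problem asks for; it assumes the standard result that, with \theta = \sum_i \alpha_i \phi(x^{(i)}), the coefficients satisfy \alpha = (K + \lambda I)^{-1} y and the prediction is \theta^T \phi(x_{new}) = \sum_i \alpha_i K(x^{(i)}, x_{new}). The Gaussian (RBF) kernel, the synthetic data, and all function names are illustrative choices, since the problem leaves the feature map \phi unspecified.

import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-gamma * ||x1_i - x2_j||^2).
    Stands in for the unspecified feature map phi (illustrative choice)."""
    sq_dists = (
        np.sum(X1 ** 2, axis=1)[:, None]
        + np.sum(X2 ** 2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    return np.exp(-gamma * sq_dists)

def kernel_ridge_fit(X, y, lam, kernel=rbf_kernel):
    """Solve (K + lam * I) alpha = y, so that theta = sum_i alpha_i phi(x^(i))."""
    K = kernel(X, X)
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def kernel_ridge_predict(X_train, alpha, x_new, kernel=rbf_kernel):
    """Prediction theta^T phi(x_new) = sum_i alpha_i K(x^(i), x_new),
    computed without ever forming phi(x_new) explicitly."""
    k_new = kernel(X_train, np.atleast_2d(x_new)).ravel()
    return k_new @ alpha

# Tiny usage example on synthetic data (purely illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=20)
alpha = kernel_ridge_fit(X, y, lam=0.5)
print(kernel_ridge_predict(X, alpha, X[0]))

Note that only kernel evaluations K(x^{(i)}, x^{(j)}) appear anywhere above, which is exactly the point of the kernel trick: the same code would work for an infinite-dimensional feature map as long as its kernel can be computed.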