problemset2

# problemset2 - CS229 Problem Set#2 1 CS 229 Public Course...


CS 229, Public Course

Problem Set #2: Kernels, SVMs, and Theory

1. Kernel ridge regression

In contrast to ordinary least squares, which has the cost function

$$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( \theta^T x^{(i)} - y^{(i)} \right)^2,$$

we can also add a term that penalizes large weights in $\theta$. In ridge regression, our least squares cost is regularized by adding a term $\lambda \|\theta\|^2$, where $\lambda > 0$ is a fixed (known) constant (regularization will be discussed at greater length in an upcoming course lecture). The ridge regression cost function is then

$$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( \theta^T x^{(i)} - y^{(i)} \right)^2 + \frac{\lambda}{2} \|\theta\|^2.$$

(a) Use the vector notation described in class to find a closed-form expression for the value of $\theta$ which minimizes the ridge regression cost function.

(b) Suppose that we want to use kernels to implicitly represent our feature vectors in a high-dimensional (possibly infinite-dimensional) space. Using a feature mapping $\phi$, the ridge regression cost function becomes

$$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( \theta^T \phi(x^{(i)}) - y^{(i)} \right)^2 + \frac{\lambda}{2} \|\theta\|^2.$$

Making a prediction on a new input $x_{\text{new}}$ would now be done by computing $\theta^T \phi(x_{\text{new}})$. Show how we can use the “kernel trick” to obtain a closed form for the prediction on the new input without ever explicitly computing $\phi(x_{\text{new}})$. You may assume that the parameter vector $\theta$ can be expressed as a linear combination of the input feature vectors; i.e., $\theta = \sum_{i=1}^{m} \alpha_i \phi(x^{(i)})$ for some set of parameters $\alpha_i$.

[Hint: You may find the following identity useful:

$$(\lambda I + BA)^{-1} B = B (\lambda I + AB)^{-1}.$$

If you want, you can try to prove this as well, though this is not required for the problem.]

2. $\ell_2$ norm soft margin SVMs

In class, we saw that if our data is not linearly separable, then we need to modify our support vector machine algorithm by introducing an error margin that must be minimized. Specifically, the formulation we have looked at is known as the $\ell_1$ norm soft margin SVM....
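
The hint identity in Problem 1(b) can be sanity-checked numerically before attempting a proof. The sketch below is not part of the problem set: it assumes NumPy, uses synthetic random data, and specializes to a linear kernel (so $\phi(x) = x$, $A = X$, $B = X^T$, where `X` is a hypothetical design matrix with one example per row). It checks the identity for that instance and confirms that a prediction computed from inner products alone agrees with the one computed from an explicit $\theta$.

```python
import numpy as np

# Synthetic data; all names and values here are illustrative assumptions.
rng = np.random.default_rng(0)
m, n, lam = 6, 3, 0.5              # examples, features, regularization constant
X = rng.standard_normal((m, n))    # design matrix, one example per row
y = rng.standard_normal(m)         # targets
x_new = rng.standard_normal(n)     # new input to predict on

# Hint identity with A = X and B = X^T:
# (lam*I + X^T X)^{-1} X^T  ==  X^T (lam*I + X X^T)^{-1}
lhs = np.linalg.solve(lam * np.eye(n) + X.T @ X, X.T)
rhs = X.T @ np.linalg.inv(lam * np.eye(m) + X @ X.T)
identity_holds = np.allclose(lhs, rhs)

# Explicit parameter vector: setting the gradient of the ridge cost to zero
# gives (X^T X + lam*I) theta = X^T y.
theta = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)
pred_primal = theta @ x_new

# Kernelized form: only inner products between inputs are used.
K = X @ X.T                        # Gram matrix, K[i, j] = <x_i, x_j>
k_new = X @ x_new                  # k_new[i] = <x_i, x_new>
alpha = np.linalg.solve(K + lam * np.eye(m), y)
pred_dual = alpha @ k_new

print(identity_holds, np.isclose(pred_primal, pred_dual))
```

Replacing the two inner-product computations (`K` and `k_new`) with another valid kernel function would give predictions in the corresponding feature space without ever forming $\phi$ explicitly; the explicit `theta` would then no longer be available for comparison.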

## This note was uploaded on 01/24/2010 for the course CS 229 at Stanford.
