CS 478 Machine Learning: Homework 3 Suggested Solutions

1 SVM inside-out, the Primal (15 points)

(a) It is easy to check that for the soft-margin SVM, $\vec{w} = \vec{0}$, $\xi_i = 1$ for all $i$ is always a feasible solution: with this choice $y_i(\vec{w} \cdot \vec{x}_i) = 0 \geq 1 - \xi_i = 0$ and $\xi_i = 1 \geq 0$, so all the constraints are satisfied.

(b) Consider the $i$-th example $(\vec{x}_i, y_i)$. If $\vec{w}$ makes a mistake on this example, then $y_i(\vec{w} \cdot \vec{x}_i) \leq 0$. Since $(\vec{w}, \xi_1, \ldots, \xi_n)$ is a feasible solution, we must have $y_i(\vec{w} \cdot \vec{x}_i) \geq 1 - \xi_i$. Therefore

$$\xi_i \geq 1 - y_i(\vec{w} \cdot \vec{x}_i) \geq 1 \qquad \text{(since } y_i(\vec{w} \cdot \vec{x}_i) \leq 0\text{)}.$$

Thus we have proved that if $\vec{w}$ makes a mistake on an example $(\vec{x}_i, y_i)$, the corresponding slack variable $\xi_i$ must be at least 1. Moreover, if $\vec{w}$ does not make a mistake on $(\vec{x}_i, y_i)$, the corresponding $\xi_i$ is still at least 0 because of the $\xi_i \geq 0$ constraint in the optimization problem. Therefore $\xi_i$ is at least the number of mistakes made (either 0 or 1) on each example $(\vec{x}_i, y_i)$. Summing from $i = 1$ to $n$, we conclude that $\sum_{i=1}^n \xi_i$ upper-bounds the total number of mistakes on the training set.

Comment: Most of you did well on this question, except that in the proof of part (b) many forgot to mention that $\xi_i \geq 0$ for the correctly classified examples. Note that it is also possible that your values of the dual variables $\alpha_i$ differ from the ones listed here, since they are usually not unique. However, whatever values of $\alpha_i$ you have, they should give rise to the same weight vector $\vec{w}$ and slack variables $\xi_i$.

2 SVM inside-out, the Dual

(a) Below is the output file for alpha for $C = 0.1$ (note that each listed value is $\alpha_i y_i$, so the sign encodes the label; the $\alpha_i$ themselves satisfy $0 \leq \alpha_i \leq C$):

0.048258912192304493
0.10000000000000001
0.051704987537661617
-0.10000000000000001
-0.081529671724244634
-0.018434228005720588

To compute $\vec{w}$, we can use the relation $\vec{w} = \sum_{i=1}^n \alpha_i y_i \vec{x}_i$. In this case we have:

$$\vec{w} = 0.048258\,[-3\ \ 2] + 0.1\,[-1\ \ 1] + 0.051704\,[0\ \ 2] - 0.1\,[-2\ \ 0] - 0.081529\,[-1\ \ {-2}] - 0.018434\,[2\ \ {-2}] = [-0.0001\ \ 0.4999] \quad (1)$$

To compute the bias $b$, we can pick any example $(\vec{x}_i, y_i)$ such that $0 < \alpha_i < C$. For such examples we have $y_i(\vec{w} \cdot \vec{x}_i + b) = 1$, and therefore $b = 1/y_i - \vec{w} \cdot \vec{x}_i = y_i - \vec{w} \cdot \vec{x}_i$ (since $y_i \in \{\pm 1\}$). In this case we can pick $(\vec{x}_1, y_1)$, and solving for $b$ gives $b = -5.7760 \times 10^{-5}$.

Using exactly the same procedure, we can solve for $\vec{w}$ and $b$ for $C = 10$. In this case we have $\vec{w} = [0.6660\ \ 1.3325]$ and $b = 0.3330$. ...
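The recovery of $\vec{w}$ and $b$ from the dual solution can be reproduced in a few lines. Below is a minimal sketch in Python/NumPy; it assumes the six training points read off from the expansion in equation (1) and the signed $\alpha_i y_i$ values from the output file above, and all variable names are ours rather than part of the assignment.

```python
import numpy as np

# Training points, taken from the expansion in equation (1).
X = np.array([[-3, 2], [-1, 1], [0, 2], [-2, 0], [-1, -2], [2, -2]], dtype=float)

# Signed dual values alpha_i * y_i from the output file for C = 0.1.
alpha_y = np.array([0.048258912192304493, 0.10000000000000001,
                    0.051704987537661617, -0.10000000000000001,
                    -0.081529671724244634, -0.018434228005720588])
C = 0.1

# Labels are the signs of alpha_i * y_i (since alpha_i >= 0),
# and alpha_i itself is the absolute value.
y = np.sign(alpha_y)
alpha = np.abs(alpha_y)

# Weight vector: w = sum_i alpha_i * y_i * x_i.
w = alpha_y @ X
print("w =", w)  # approximately [-0.0001, 0.4999]

# Bias: pick any unbounded support vector (0 < alpha_i < C) and solve
# y_i * (w . x_i + b) = 1  =>  b = y_i - w . x_i.
eps = 1e-8
i = np.where((alpha > eps) & (alpha < C - eps))[0][0]
b = y[i] - w @ X[i]
print("b =", b)  # approximately -5.78e-05
```

Running this prints $\vec{w} \approx [-0.0001\ \ 0.4999]$ and $b \approx -5.78 \times 10^{-5}$, matching the values derived above.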
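As a closing sanity check of the bound proved in part 1(b), the claim that $\sum_i \xi_i$ upper-bounds the number of training mistakes can also be verified numerically. The sketch below uses hypothetical toy points and a hypothetical weight vector (not the assignment's data) and sets each slack to its smallest feasible value $\xi_i = \max(0,\ 1 - y_i(\vec{w} \cdot \vec{x}_i))$.

```python
import numpy as np

# Hypothetical toy data and weight vector (not from the assignment).
X = np.array([[1.0, 2.0], [-2.0, 1.0], [0.5, -1.0], [3.0, 0.0]])
y = np.array([1, 1, -1, -1])
w = np.array([0.2, 0.6])

margins = y * (X @ w)                # y_i (w . x_i)
xi = np.maximum(0.0, 1.0 - margins)  # smallest feasible slack per example
mistakes = (margins <= 0).sum()      # number of training mistakes

assert xi.sum() >= mistakes          # the bound from part 1(b)
print(f"sum(xi) = {xi.sum():.3f} >= mistakes = {mistakes}")
```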