CS 478 Machine Learning: Homework 3 Suggested Solutions

1 SVM inside-out, the Primal (15 points)

(a) It is easy to check that ~w = ~0, ξ_i = 1 for all i is always a feasible solution for the soft-margin SVM: every constraint y_i(~w · ~x_i) ≥ 1 − ξ_i reduces to 0 ≥ 0, and the constraints ξ_i ≥ 0 clearly hold.

(b) Consider the i-th example (~x_i, y_i). If ~w makes a mistake on this example, then y_i(~w · ~x_i) ≤ 0. Since (~w, ξ_1, . . . , ξ_n) is a feasible solution, we must have y_i(~w · ~x_i) ≥ 1 − ξ_i. Therefore ξ_i ≥ 1 − y_i(~w · ~x_i) ≥ 1, since y_i(~w · ~x_i) ≤ 0. Thus, if ~w makes a mistake on (~x_i, y_i), the corresponding slack variable satisfies ξ_i ≥ 1. Moreover, if ~w does not make a mistake on (~x_i, y_i), the corresponding ξ_i is still at least 0 due to the ξ_i ≥ 0 constraint in the optimization problem. Hence ξ_i is at least the number of mistakes made (either 0 or 1) on example (~x_i, y_i). Summing from i = 1 to n, we conclude that ∑_{i=1}^n ξ_i upper bounds the total number of mistakes on the training set. (A numerical check of this bound follows the code sketch at the end of these solutions.)

Comment: Most of you did well on this question, except that in the proof of part (b) many forgot to mention that ξ_i ≥ 0 for the correctly classified examples. Note also that your values of the dual variables α_i may differ from the ones listed here, since the dual solution is usually not unique. Whatever values of α_i you have, however, they should give rise to the same weight vector ~w and slack variables ξ_i.

2 SVM inside-out, the Dual

(a) Below is the output file of alpha values (each entry is α_i y_i) for C = 0.1:

0.048258912192304493
0.10000000000000001
0.051704987537661617
-0.10000000000000001
-0.081529671724244634
-0.018434228005720588

To compute ~w, we use the relation ~w = ∑_{i=1}^n α_i y_i ~x_i. In this case:

~w = 0.048258 · [−3, 2] + 0.1 · [−1, 1] + 0.051704 · [0, 2] − 0.1 · [−2, 0] − 0.081529 · [−1, −2] − 0.018434 · [2, −2] = [−0.0001, 0.4999]    (1)

To compute the bias b, we can pick any example (~x_i, y_i) such that 0 < α_i < C. For such examples we have y_i(~w · ~x_i + b) = 1, and therefore b = y_i − ~w · ~x_i. In this case we can pick (~x_1, y_1), and solving for b gives b = −5.7760e−005.

Using exactly the same procedure, we can solve for ~w and b for C = 10. In this case we have ~w = [0.6660, 1.3325] and b = 0.3330. ...
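For concreteness, the computation of ~w and b above can be reproduced with a short Python sketch. The signed values α_i y_i are copied verbatim from the alpha output file for C = 0.1; the six training points in X are inferred from equation (1) and are an assumption, since the homework's data file is not reproduced in these solutions.

import numpy as np

# Signed dual values alpha_i * y_i, copied from the alpha output file for C = 0.1.
signed_alpha = np.array([
    0.048258912192304493,  0.10000000000000001,  0.051704987537661617,
    -0.10000000000000001, -0.081529671724244634, -0.018434228005720588,
])

# Training inputs inferred from equation (1); an assumption, since the
# homework's data file is not reproduced here.
X = np.array([[-3, 2], [-1, 1], [0, 2], [-2, 0], [-1, -2], [2, -2]], dtype=float)

y = np.sign(signed_alpha)   # the sign of alpha_i * y_i recovers the label y_i
alpha = signed_alpha * y    # unsigned dual variables, alpha_i >= 0

# Weight vector: w = sum_i alpha_i * y_i * x_i
w = (signed_alpha[:, None] * X).sum(axis=0)
print(w)                    # approx [-0.0001, 0.4999]

# Bias: pick a support vector with 0 < alpha_i < C, then b = y_i - w . x_i
C = 0.1
i = int(np.argmax((alpha > 1e-9) & (alpha < C - 1e-9)))  # picks index 0, i.e. (~x_1, y_1)
b = y[i] - w @ X[i]
print(b)                    # approx -5.7760e-05

The same steps apply unchanged to the alpha output file for C = 10 to reproduce ~w = [0.6660, 1.3325] and b = 0.3330.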
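Continuing from the variables above, we can also check the bound from part 1(b) numerically: with the minimal feasible slacks ξ_i = max(0, 1 − y_i(~w · ~x_i + b)), the sum of the slacks upper bounds the number of training mistakes (the argument in 1(b) goes through unchanged with a bias term).

# Sanity check for part 1(b): the slacks xi_i = max(0, 1 - y_i * (w . x_i + b))
# are feasible, and their sum upper-bounds the number of training mistakes.
margins = y * (X @ w + b)
xi = np.maximum(0.0, 1.0 - margins)
mistakes = int((margins <= 0).sum())
print(mistakes, xi.sum())           # number of mistakes vs. sum of slacks
assert mistakes <= xi.sum() + 1e-9  # the bound proved in part 1(b)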