# Lecture 8 (annotated): Advanced Topics in Max-Margin Learning, Machine Learning 10-701/15-781, Fall 2008


Eric Xing, Lecture 8, October 1, 2008
Reading: class handouts
© Eric Xing @ CMU, 2006-2008

## Recap: the SVM problem

- We solve the following constrained optimization problem:

$$
\max_{\alpha}\; J(\alpha) = \sum_{i=1}^{m} \alpha_i \;-\; \frac{1}{2}\sum_{i,j=1}^{m} y_i y_j \alpha_i \alpha_j\,(\mathbf{x}_i^{T}\mathbf{x}_j)
$$

$$
\text{s.t.}\quad 0 \le \alpha_i \le C,\ i = 1,\dots,m, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0.
$$

- This is a quadratic programming problem.
- A global maximum over the $\alpha_i$ can always be found.
- The solution: $\mathbf{w} = \sum_{i=1}^{m} \alpha_i y_i \mathbf{x}_i$
- How to predict: $y = \operatorname{sign}(\mathbf{w}^{T}\mathbf{x} + b)$
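As a concrete illustration of these formulas (my example, not the lecture's code), the sketch below takes a dual solution worked out by hand for a tiny made-up separable 2-D dataset and recovers $\mathbf{w}$ and $b$ from it:

```python
import numpy as np

# Hypothetical 2-D toy data (not from the lecture), linearly separable.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Dual solution worked out by hand for this data: only the two points
# closest to the boundary get nonzero alpha (the support vectors).
alpha = np.array([0.08, 0.0, 0.08, 0.0])
assert np.isclose(alpha @ y, 0.0)       # equality constraint holds

w = (alpha * y) @ X                     # w = sum_i alpha_i y_i x_i
sv = int(np.argmax(alpha > 0))          # index of a support vector
b = y[sv] - w @ X[sv]                   # y_i (w . x_i + b) = 1 at an SV

def predict(x):
    return np.sign(w @ x + b)
```

Note how sparsity of the dual solution makes both $\mathbf{w}$ and the prediction depend only on the support vectors.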

## The SMO algorithm

- Consider solving the unconstrained optimization problem.
- We have already seen three optimization algorithms:
  - Coordinate ascent
  - Gradient ascent
  - Newton-Raphson

## Coordinate ascent
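To make the coordinate-ascent idea concrete, here is a minimal sketch (my own example, not the lecture's) that maximizes the concave quadratic $f(x, y) = -(x^2 + y^2 + xy) + 3x + 3y$ by exactly maximizing over one variable at a time:

```python
# Coordinate ascent on f(x, y) = -(x^2 + y^2 + x*y) + 3x + 3y.
# Setting each partial derivative to zero gives the exact 1-D update:
#   df/dx = -2x - y + 3 = 0  ->  x = (3 - y) / 2, and symmetrically for y.
def coordinate_ascent(steps=50):
    x = y = 0.0
    for _ in range(steps):
        x = (3.0 - y) / 2.0   # argmax over x with y held fixed
        y = (3.0 - x) / 2.0   # argmax over y with x held fixed
    return x, y

x_star, y_star = coordinate_ascent()   # converges to the maximizer (1, 1)
```

Each full sweep shrinks the distance to the maximizer by a factor of 4, so a few dozen sweeps suffice here.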
## Sequential minimal optimization

- Constrained optimization:

$$
\max_{\alpha}\; J(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2}\sum_{i,j=1}^{m} y_i y_j \alpha_i \alpha_j\,(\mathbf{x}_i^{T}\mathbf{x}_j)
\qquad \text{s.t.}\quad 0 \le \alpha_i \le C,\ i = 1,\dots,m, \quad \sum_{i=1}^{m} \alpha_i y_i = 0.
$$

- Question: can we do coordinate ascent along one direction at a time (i.e., hold all $\alpha_{[-i]}$ fixed and update $\alpha_i$)? No: the equality constraint $\sum_i \alpha_i y_i = 0$ determines $\alpha_i$ once the other multipliers are fixed, so at least two coordinates must move together.

## The SMO algorithm

Repeat till convergence:

1. Select some pair $\alpha_i$ and $\alpha_j$ to update next (using a heuristic that tries to pick the two that will allow us to make the biggest progress towards the global maximum).
2. Re-optimize $J(\alpha)$ with respect to $\alpha_i$ and $\alpha_j$, while holding all the other $\alpha_k$'s ($k \ne i, j$) fixed.

Will this procedure converge?
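The two-step loop above can be sketched in code. What follows is a minimal "simplified SMO" in the style of Platt's algorithm, with a random second index rather than the lecture's progress heuristic; the dataset, constants, and function names are my own, not the lecture's:

```python
import numpy as np

def smo_simplified(X, y, C=1.0, tol=1e-4, max_passes=10, seed=0):
    """Simplified SMO: find a KKT-violating alpha_i, pair it with a random
    alpha_j, and solve the two-variable subproblem analytically."""
    m = len(y)
    alpha, b = np.zeros(m), 0.0
    K = X @ X.T                                  # linear kernel matrix
    f = lambda k: (alpha * y) @ K[:, k] + b      # current decision value
    rng = np.random.default_rng(seed)
    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(m):
            E_i = f(i) - y[i]
            if (y[i] * E_i < -tol and alpha[i] < C) or (y[i] * E_i > tol and alpha[i] > 0):
                j = int(rng.integers(m - 1))
                j += (j >= i)                    # random j != i
                E_j = f(j) - y[j]
                a_i, a_j = alpha[i], alpha[j]
                if y[i] != y[j]:                 # box bounds on the new alpha_j
                    L, H = max(0.0, a_j - a_i), min(C, C + a_j - a_i)
                else:
                    L, H = max(0.0, a_i + a_j - C), min(C, a_i + a_j)
                eta = 2 * K[i, j] - K[i, i] - K[j, j]
                if L == H or eta >= 0:
                    continue
                alpha[j] = np.clip(a_j - y[j] * (E_i - E_j) / eta, L, H)
                if abs(alpha[j] - a_j) < 1e-6:
                    alpha[j] = a_j
                    continue
                alpha[i] = a_i + y[i] * y[j] * (a_j - alpha[j])
                # Update b so KKT holds for the changed multipliers.
                b1 = b - E_i - y[i]*(alpha[i]-a_i)*K[i, i] - y[j]*(alpha[j]-a_j)*K[i, j]
                b2 = b - E_j - y[i]*(alpha[i]-a_i)*K[i, j] - y[j]*(alpha[j]-a_j)*K[j, j]
                b = b1 if 0 < alpha[i] < C else (b2 if 0 < alpha[j] < C else (b1 + b2) / 2)
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    w = (alpha * y) @ X
    return w, b, alpha

# Hypothetical separable toy data for a smoke test.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b, alpha = smo_simplified(X, y)
```

The clipped update and the `b` recalculation are exactly the two-variable re-optimization of step 2; a production solver would replace the random pairing with the violation-based heuristic of step 1.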

## Convergence of SMO

- Let's hold $\alpha_3, \dots, \alpha_m$ fixed and re-optimize $J$ w.r.t. $\alpha_1$ and $\alpha_2$:

$$
\max_{\alpha}\; J(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2}\sum_{i,j=1}^{m} y_i y_j \alpha_i \alpha_j\,(\mathbf{x}_i^{T}\mathbf{x}_j)
\qquad \text{s.t.}\quad 0 \le \alpha_i \le C,\ i = 1,\dots,m, \quad \sum_{i=1}^{m} \alpha_i y_i = 0.
$$

- KKT conditions apply at the optimum.
- The constraints: with the other multipliers fixed, $\alpha_1 y_1 + \alpha_2 y_2 = -\sum_{i=3}^{m} \alpha_i y_i$ is a constant, and $0 \le \alpha_1, \alpha_2 \le C$.
- The objective: substituting out $\alpha_1$ leaves a one-dimensional quadratic in $\alpha_2$.
- Constrained opt: maximize that quadratic, then clip $\alpha_2$ to its feasible interval.
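Because $\alpha_1 y_1 + \alpha_2 y_2$ is pinned to a constant, the pair lives on a line segment inside the box $[0, C]^2$. The helper below (my notation, not the lecture's) computes the resulting feasible interval $[L, H]$ for the new $\alpha_2$:

```python
def clip_bounds(a1, a2, y1, y2, C):
    """Feasible interval [L, H] for the new alpha2 when
    alpha1*y1 + alpha2*y2 must stay constant and both alphas lie in [0, C]."""
    if y1 != y2:   # constraint line has slope +1: alpha2 - alpha1 is fixed
        return max(0.0, a2 - a1), min(C, C + a2 - a1)
    else:          # slope -1: alpha1 + alpha2 is fixed
        return max(0.0, a1 + a2 - C), min(C, a1 + a2)

# Example: opposite labels, alpha1 = 0.3, alpha2 = 0.5, C = 1.0
L, H = clip_bounds(0.3, 0.5, 1.0, -1.0, 1.0)   # L is about 0.2, H = 1.0
```

Clipping the unconstrained one-variable maximizer into $[L, H]$ is what guarantees each SMO step stays feasible while never decreasing $J$.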
## Cross-validation error of SVM

- The leave-one-out cross-validation error does not depend on the dimensionality of the feature space, but only on the number of support vectors:

$$
\text{Leave-one-out CV error} = \frac{\#\,\text{support vectors}}{\#\,\text{training examples}}
$$

## Advanced topics in Max-Margin Learning

- Kernel
- Point rule or average rule
- Can we predict vec(y)?
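This quantity is trivial to evaluate once a dual solution is in hand. A small sketch, using hypothetical alphas:

```python
# Count support vectors (alpha_i > 0) and form the leave-one-out ratio.
alpha = [0.08, 0.0, 0.08, 0.0]          # hypothetical dual solution
n_sv = sum(a > 1e-8 for a in alpha)     # 2 support vectors
loo_error_bound = n_sv / len(alpha)     # 2 / 4 = 0.5
```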

## Outline

- The Kernel trick
- Maximum entropy discrimination
- Structured SVM, a.k.a. Maximum Margin Markov Networks

## (1) Non-linear Decision Boundary

- So far, we have only considered large-margin classifiers with a linear decision boundary.
- How do we generalize them to be nonlinear?
- Key idea: transform $\mathbf{x}_i$ to a higher-dimensional space to "make life easier"
  - Input space: the space where the points $\mathbf{x}_i$ are located
  - Feature space: the space of $\phi(\mathbf{x}_i)$ after transformation
- Why transform?
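The payoff of the kernel trick developed in this part of the lecture is that a kernel evaluates the feature-space inner product without ever forming $\phi$ explicitly. A standard check (my example, not the lecture's) with the degree-2 polynomial kernel $k(\mathbf{x}, \mathbf{z}) = (\mathbf{x}^T\mathbf{z})^2$ in 2-D, whose explicit feature map is $\phi(\mathbf{x}) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$:

```python
import math

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel in 2-D."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def poly_kernel(x, z):
    """k(x, z) = (x . z)^2, computed entirely in the input space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

x, z = (1.0, 2.0), (3.0, -1.0)
explicit = sum(a * b for a, b in zip(phi(x), phi(z)))   # inner product in feature space
implicit = poly_kernel(x, z)                            # same value, no phi needed
```

For higher degrees and dimensions $\phi$ grows combinatorially while the kernel stays a single dot product and a power, which is exactly why the transformation "makes life easier".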