Kernel Methods

2  Simple Idea of Data Fitting
- Given (x_i, y_i), i = 1, ..., n
- Each x_i is of dimension d
- Find the best linear function w (hyperplane) that fits the data
- Two scenarios:
  - y real: regression
  - y in {-1, +1}: classification
- Two cases:
  - n > d: regression, least squares
  - n < d: ridge regression
- New sample x: $\langle x, w \rangle$ gives the best fit (regression) or the best decision (classification)

3  Primal and Dual
- There are two ways to formulate the problem: primal and dual
- Both provide deep insight into the problem
- The primal is the more traditional formulation
- The dual leads to newer techniques in SVMs and kernel methods

4  Regression

The model predicts $\hat{y}(x) = w_o + \sum_j x_j w_j = \langle x, w \rangle$, where the leading 1 in $x = [1, x_1, \ldots, x_d]^T$ absorbs the intercept $w_o$ in $w = [w_o, w_1, \ldots, w_d]^T$. Stacking the targets as $y = [y_1, \ldots, y_n]^T$ and the samples as the rows of $X = [x_1, x_2, \ldots, x_n]^T$, least squares solves

$$w^o = \arg\min_w \sum_i \Big(y_i - \sum_j x_{ij} w_j\Big)^2 = \arg\min_w (y - Xw)^T (y - Xw)$$

Setting the derivative with respect to $w$ to zero:

$$X^T(y - Xw) = 0 \;\Rightarrow\; X^T X w = X^T y \;\Rightarrow\; w = (X^T X)^{-1} X^T y$$

For a new sample $x$, the prediction is $\hat{y} = \langle x, w \rangle = x^T (X^T X)^{-1} X^T y$.

5
- X is an n (sample size) by d (dimension of data) matrix
- w combines the columns of X to best approximate y
  - Combine features (FICA, income, etc.)
    to decisions (loan)
- H projects y onto the space spanned by the columns of X
  - Simplify the decisions to fit the features

$$\hat{y} = Xw = X(X^T X)^{-1} X^T y = Hy, \qquad H = X(X^T X)^{-1} X^T$$

Graphical interpretation: X is an n-by-d matrix whose columns are features such as FICA and Income.

6  Problem #1
- n = d: exact solution
- n > d: least squares (the most likely scenario)
- n < d: there are not enough constraints to determine the coefficients w uniquely

7  Problem #2
- If different attributes are highly correlated (e.g., income and FICA), the columns of X become dependent
- The coefficients are then poorly determined, with high variance
  - E.g., a large positive coefficient on one column can be canceled by a similarly large negative coefficient on its correlated cousin
  - A size constraint on the coefficients is helpful
  - Caveat: the constraint is problem dependent

8  Ridge Regression
- Similar to regularization: penalize the size of the coefficients

$$w_{ridge} = \arg\min_w \sum_i \Big(y_i - w_o - \sum_j x_{ij} w_j\Big)^2 + \lambda \sum_j w_j^2 = \arg\min_w (y - Xw)^T (y - Xw) + \lambda\, w^T w$$

Setting the derivative to zero gives $(X^T X + \lambda I)\, w = X^T y$, so

$$w_{ridge} = (X^T X + \lambda I)^{-1} X^T y$$

For a new sample $x$: $\hat{y} = \langle x, w_{ridge} \rangle = x^T (X^T X + \lambda I)^{-1} X^T y$.

9  Ugly Math

Substitute the SVD $X = U \Sigma V^T$ into the fitted values:

$$
\begin{aligned}
X w_{ridge} &= X (X^T X + \lambda I)^{-1} X^T y \\
&= U \Sigma V^T \big(V \Sigma^T U^T U \Sigma V^T + \lambda I\big)^{-1} V \Sigma^T U^T y \\
&= U \Sigma V^T \big(V (\Sigma^T \Sigma + \lambda I) V^T\big)^{-1} V \Sigma^T U^T y \\
&= U \Sigma V^T V (\Sigma^T \Sigma + \lambda I)^{-1} V^T V \Sigma^T U^T y \\
&= U \Sigma (\Sigma^T \Sigma + \lambda I)^{-1} \Sigma^T U^T y \\
&= \sum_i u_i \,\frac{\sigma_i^2}{\sigma_i^2 + \lambda}\, u_i^T y
\end{aligned}
$$

10  How to Decipher This
- Red: the best estimate $\hat{y}$ is composed of the columns of U (basis features; recall U and X have the same column space)
- Green: how these basis columns are weighed, i.e., the shrinkage factors $\sigma_i^2 / (\sigma_i^2 + \lambda)$
- Blue: the projection of the target y onto these columns, $u_i^T y$
- Together: representing y in a body-fitted coordinate system (the $u_i$)

$$\hat{y} = \sum_i u_i \,\frac{\sigma_i^2}{\sigma_i^2 + \lambda}\, u_i^T y$$

11 ...
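The normal-equation solution and the hat-matrix projection can be checked numerically. This is a minimal NumPy sketch on synthetic data (the data, dimensions, and noise level are illustrative assumptions, not from the slides):

```python
import numpy as np

# Synthetic data: n = 50 samples, d = 3 features
# (feature names like FICA/income are just illustrative).
rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)

# Normal equations: w = (X^T X)^{-1} X^T y
# (solve is preferred over explicitly inverting X^T X)
w = np.linalg.solve(X.T @ X, X.T @ y)

# Hat matrix H = X (X^T X)^{-1} X^T projects y onto the
# column space of X, so H @ y equals the fitted values X @ w.
H = X @ np.linalg.solve(X.T @ X, X.T)
assert np.allclose(H @ y, X @ w)
```

Using `np.linalg.solve` on the d-by-d system is both cheaper and numerically safer than forming the inverse, which is why the sketch avoids `np.linalg.inv`.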
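The ridge closed form $(X^T X + \lambda I)^{-1} X^T y$ also ties back to the primal/dual distinction: an algebraically equivalent dual form works in the n-by-n sample space instead of the d-by-d feature space. A sketch in the underdetermined n < d setting (sizes and $\lambda$ are arbitrary choices for illustration):

```python
import numpy as np

# Underdetermined case: n < d, so X^T X is singular and plain
# least squares has no unique solution; ridge fixes this.
rng = np.random.default_rng(1)
n, d = 5, 20
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

lam = 0.5  # regularization strength; problem dependent, as the slides caution

# Primal form: w = (X^T X + lambda I)^{-1} X^T y  (d-by-d system)
w_primal = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Dual form: w = X^T (X X^T + lambda I)^{-1} y  (n-by-n system,
# cheaper when d >> n; this identity underlies kernel ridge regression)
w_dual = X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

assert np.allclose(w_primal, w_dual)
```

The dual form only ever touches X through the Gram matrix $X X^T$ of inner products between samples, which is exactly the opening kernel methods need.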
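The SVD identity $\hat{y} = \sum_i u_i \frac{\sigma_i^2}{\sigma_i^2 + \lambda} u_i^T y$ can likewise be verified directly, making the per-direction shrinkage factors explicit. A sketch with arbitrary synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 30, 4
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
lam = 1.0

# Closed-form ridge fit
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
y_hat = X @ w_ridge

# SVD view: X = U S V^T (thin SVD), and the fitted values are
# y_hat = sum_i u_i * (s_i^2 / (s_i^2 + lam)) * (u_i^T y)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
shrink = s**2 / (s**2 + lam)          # per-direction shrinkage factors
y_hat_svd = U @ (shrink * (U.T @ y))  # weigh each projection u_i^T y

assert np.allclose(y_hat, y_hat_svd)
```

At $\lambda = 0$ every factor is 1 and this reduces to the hat-matrix projection onto the column space of U; as $\lambda$ grows, directions with small singular values are shrunk toward zero first.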
This note was uploaded on 08/06/2008 for the course CS 290I taught by Professor Wang during the Spring '07 term at UCSB.