Regularized Least Squares
9.520 Class 04, 21 February 2006
Ryan Rifkin

Plan
- Introduction to Regularized Least Squares
- Computation: General RLS
- Large Data Sets: Subset of Regressors
- Computation: Linear RLS

Regression
We have a training set $S = \{(x_1, y_1), \ldots, (x_\ell, y_\ell)\}$. The $y_i$ are real-valued. The goal is to learn a function $f$ to predict the $y$ values associated with new observed $x$ values.

Our Friend Tikhonov Regularization
We pose our regression task as the Tikhonov minimization problem:
$$ f = \arg\min_{f \in \mathcal{H}} \frac{1}{2} \sum_{i=1}^{\ell} V(f(x_i), y_i) + \frac{\lambda}{2} \|f\|_K^2 $$
To fully specify the problem, we need to choose a loss function $V$ and a kernel function $K$.

The Square Loss
For regression, a natural choice of loss function is the square loss
$$ V(f(x), y) = (f(x) - y)^2. $$
[Figure: the L2 (square) loss plotted as a function of $y - f(x)$.]

Substituting In The Square Loss
Using the square loss, our problem becomes
$$ f = \arg\min_{f \in \mathcal{H}} \frac{1}{2} \sum_{i=1}^{\ell} (f(x_i) - y_i)^2 + \frac{\lambda}{2} \|f\|_K^2. $$

The Return of the Representer Theorem
Theorem. The solution to the Tikhonov regularization problem
$$ \min_{f \in \mathcal{H}} \frac{1}{2} \sum_{i=1}^{\ell} V(y_i, f(x_i)) + \frac{\lambda}{2} \|f\|_K^2 $$
can be written in the form
$$ f = \sum_{i=1}^{\ell} c_i K(x_i, \cdot). $$
This theorem is exceedingly useful: it says that to solve the Tikhonov regularization problem, we need only find the best function of the form $f = \sum_{i=1}^{\ell} c_i K(x_i, \cdot)$. Put differently, all we have to do is find the $c_i$.

Applying the Representer Theorem, I
NOTATION ALERT!!! We use the symbol $K$ for the kernel function, and boldface $\mathbf{K}$ for the $\ell$-by-$\ell$ matrix with entries
$$ \mathbf{K}_{ij} \equiv K(x_i, x_j). $$
Using this definition, consider the output of our function
$$ f = \sum_{i=1}^{\ell} c_i K(x_i, \cdot) $$
at the training point $x_j$:
$$ f(x_j) = \sum_{i=1}^{\ell} K(x_i, x_j)\, c_i = (\mathbf{K} c)_j. $$

Using the Norm of a Represented Function
A function in the RKHS with a finite representation
$$ f = \sum_{i=1}^{\ell} c_i K(x_i, \cdot) $$
satisfies
$$ \|f\|_K^2 = \left\langle \sum_{i=1}^{\ell} c_i K(x_i, \cdot), \; \sum_{j=1}^{\ell} c_j K(x_j, \cdot) \right\rangle
= \sum_{i=1}^{\ell} \sum_{j=1}^{\ell} c_i c_j \left\langle K(x_i, \cdot), K(x_j, \cdot) \right\rangle
= \sum_{i=1}^{\ell} \sum_{j=1}^{\ell} c_i c_j K(x_i, x_j)
= c^T \mathbf{K} c. $$
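The two identities above, $f(x_j) = (\mathbf{K} c)_j$ and $\|f\|_K^2 = c^T \mathbf{K} c$, can be checked numerically. The sketch below assumes a Gaussian kernel purely for concreteness (the slides have not fixed a specific kernel at this point); the function and variable names are illustrative.

```python
import numpy as np

def gaussian_kernel(X, Z, sigma=1.0):
    # K[i, j] = exp(-||x_i - z_j||^2 / (2 sigma^2)); an assumed example kernel
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma**2))

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2))   # ell = 5 training points in R^2
c = rng.standard_normal(5)        # arbitrary expansion coefficients c_i

K = gaussian_kernel(X, X)         # the ell-by-ell kernel matrix

# f(x_j) = sum_i c_i K(x_i, x_j), computed entry by entry ...
f_at_train = np.array([sum(c[i] * K[i, j] for i in range(5)) for j in range(5)])
# ... agrees with the j-th entry of K c
assert np.allclose(f_at_train, K @ c)

# ||f||_K^2 = c^T K c; non-negative because K is positive semidefinite
norm_sq = c @ K @ c
```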
The RLS Problem
Substituting, our Tikhonov minimization problem becomes:
$$ \min_{c \in \mathbb{R}^{\ell}} \frac{1}{2} \|\mathbf{K} c - y\|_2^2 + \frac{\lambda}{2} c^T \mathbf{K} c. $$

Solving the Least Squares Problem, I
We are trying to minimize
$$ g(c) = \frac{1}{2} \|\mathbf{K} c - y\|_2^2 + \frac{\lambda}{2} c^T \mathbf{K} c. $$
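Setting the gradient of $g$ to zero gives $\mathbf{K}(\mathbf{K} c - y) + \lambda \mathbf{K} c = \mathbf{K}\big((\mathbf{K} + \lambda I)c - y\big) = 0$, which is satisfied when $(\mathbf{K} + \lambda I)c = y$. A minimal sketch of solving this linear system, again assuming a Gaussian kernel and synthetic data for illustration:

```python
import numpy as np

def gaussian_kernel(X, Z, sigma=1.0):
    # Assumed example kernel; K[i, j] = exp(-||x_i - z_j||^2 / (2 sigma^2))
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma**2))

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 1))                      # ell = 20 training inputs
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(20)   # noisy real-valued targets

lam = 0.1                                 # the regularization parameter lambda
K = gaussian_kernel(X, X)

# Solve the stationarity condition (K + lam I) c = y for the coefficients c
c = np.linalg.solve(K + lam * np.eye(len(X)), y)

# Predictions at the training points are f(x_j) = (K c)_j
f_train = K @ c
```

Solving one dense $\ell \times \ell$ linear system costs $O(\ell^3)$ time and $O(\ell^2)$ memory, which is why the later slides on large data sets consider cheaper approximations such as a subset of regressors.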