Regularized Least Squares
9.520 Class 04, 21 February 2006
Ryan Rifkin

Plan
• Introduction to Regularized Least Squares
• Computation: General RLS
• Large Data Sets: Subset of Regressors
• Computation: Linear RLS

Regression
We have a training set S = {(x_1, y_1), ..., (x_ℓ, y_ℓ)}. The y_i are real-valued. The goal is to learn a function f to predict the y values associated with new observed x values.

Our Friend Tikhonov Regularization
We pose our regression task as the Tikhonov minimization problem:

    f = \arg\min_{f \in \mathcal{H}} \frac{1}{2} \sum_{i=1}^{\ell} V(f(x_i), y_i) + \frac{\lambda}{2} \|f\|_K^2

To fully specify the problem, we need to choose a loss function V and a kernel function K.

The Square Loss
For regression, a natural choice of loss function is the square loss V(f(x), y) = (f(x) - y)^2.

[Figure: plot of the L2 loss (f(x) - y)^2 against y - f(x).]

Substituting In The Square Loss
Using the square loss, our problem becomes

    f = \arg\min_{f \in \mathcal{H}} \frac{1}{2} \sum_{i=1}^{\ell} (f(x_i) - y_i)^2 + \frac{\lambda}{2} \|f\|_K^2.

The Return of the Representer Theorem
Theorem. The solution to the Tikhonov regularization problem

    \min_{f \in \mathcal{H}} \frac{1}{2} \sum_{i=1}^{\ell} V(y_i, f(x_i)) + \frac{\lambda}{2} \|f\|_K^2

can be written in the form

    f = \sum_{i=1}^{\ell} c_i K(x_i, \cdot).

This theorem is exceedingly useful: it says that to solve the Tikhonov regularization problem, we need only find the best function of the form f = \sum_{i=1}^{\ell} c_i K(x_i, \cdot). Put differently, all we have to do is find the c_i.

Applying the Representer Theorem, I
NOTATION ALERT!!! We use the symbol K for the kernel function, and boldface \mathbf{K} for the ℓ × ℓ matrix with entries K_{ij} \equiv K(x_i, x_j). Using this definition, consider the output of our function

    f = \sum_{i=1}^{\ell} c_i K(x_i, \cdot)

at the training point x_j:

    f(x_j) = \sum_{i=1}^{\ell} K(x_i, x_j) c_i = (\mathbf{K} c)_j.

Using the Norm of a "Represented" Function
A function in the RKHS with a finite representation

    f = \sum_{i=1}^{\ell} c_i K(x_i, \cdot)

satisfies

    \|f\|_K^2 = \left\langle \sum_{i=1}^{\ell} c_i K(x_i, \cdot), \; \sum_{j=1}^{\ell} c_j K(x_j, \cdot) \right\rangle
              = \sum_{i=1}^{\ell} \sum_{j=1}^{\ell} c_i c_j \langle K(x_i, \cdot), K(x_j, \cdot) \rangle
              = \sum_{i=1}^{\ell} \sum_{j=1}^{\ell} c_i c_j K(x_i, x_j)
              = c^T \mathbf{K} c.

The RLS Problem
Substituting, our Tikhonov minimization problem becomes:

    \min_{c \in \mathbb{R}^{\ell}} \frac{1}{2} \|\mathbf{K} c - y\|_2^2 + \frac{\lambda}{2} c^T \mathbf{K} c.

Solving the Least Squares Problem, I
We are trying to minimize

    g(c) = \frac{1}{2} \|\mathbf{K} c - y\|_2^2 + \frac{\lambda}{2} c^T \mathbf{K} c ...
This note was uploaded on 11/11/2011 for the course BIO 9.07, taught by Professor Ruth Rosenholtz during the Spring '04 term at MIT.