CSE 6740 Lecture 14
How Can I Learn Fancier Models? II (Kernelization)
Alexander Gray
agray@cc.gatech.edu
Georgia Institute of Technology
CSE 6740 Lecture 14 p. 1/26

Today
1. How to make more complex models using kernels
2. Theory motivating kernels

More Complex Models, Using Kernels
Why kernels, part I. Because we can get richer models, yet leave the methods the same.

Generalized Linear Models
Suppose we have data $\{(x, y)\}_i$ where each $x \in X = \mathbb{R}^2$ is a vector $(x_1, x_2)$ like the following. Then the classes cannot be separated by a linear decision boundary.

Generalized Linear Models
Now let's make a transformed dataset $\{(z, y)\}_i$ where
$z = \Phi(x) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)$. (1)

Generalized Linear Models
Thus $\Phi$ is a map from $X = \mathbb{R}^2$ to $Z = \mathbb{R}^3$. In the new space, the data are linearly separable. So here a linear classifier in a higher-dimensional space corresponds to a nonlinear classifier in the original space. Thus we get to leave our learning method exactly as it was.

Hit Video of the Year...
By the band Hilbert and the Kernelizers: http://www.youtube.com/watch?v=3liCbRZPrZA

Generalized Linear Models
We can do this with any linear model, including for example linear regression, where this is called generalized linear regression, to effectively get a more powerful model class.
However, there are drawbacks to this. We don't know in advance which features need to be constructed. Thus we might want to consider all possible products of the features, for example. But even considering only all possible quadruplets of features yields 183,181,376 features in the transformed space when $D = 256$.

Kernel Trick
Now suppose we have a model that can be represented in terms of only dot products between points, $\langle x, \tilde{x} \rangle$.
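For contrast with the dot-product view, the explicit-feature route described on the preceding slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy data (one class inside the unit circle, one outside), the weights `w` and offset `b`, and the helper names `phi`, `linear_score`, and `num_monomials` are hypothetical and not from the lecture. The count assumes "quadruplets" means degree-4 products with repeated features allowed.

```python
import math

def phi(x):
    """Feature map from equation (1): R^2 -> R^3."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def linear_score(z, w=(1.0, 0.0, 1.0), b=-1.0):
    """Linear decision function in Z; w and b are illustrative choices."""
    return sum(wi * zi for wi, zi in zip(w, z)) + b

def num_monomials(D, degree):
    """Number of distinct products of `degree` features out of D, repeats allowed."""
    return math.comb(D + degree - 1, degree)

# Hypothetical toy data: one class at radius 0.5, the other at radius 2.0.
angles = (0.0, 1.0, 2.0, 3.0, 4.0, 5.0)
inner = [(0.5 * math.cos(t), 0.5 * math.sin(t)) for t in angles]
outer = [(2.0 * math.cos(t), 2.0 * math.sin(t)) for t in angles]

# No line in R^2 separates the two rings, but in Z the plane z1 + z3 = 1 does,
# because z1 + z3 = x1^2 + x2^2 is the squared radius.
inner_scores = [linear_score(phi(x)) for x in inner]
outer_scores = [linear_score(phi(x)) for x in outer]
print(all(s < 0 for s in inner_scores), all(s > 0 for s in outer_scores))

# Size of the explicit feature space for quadruplets of D = 256 features.
feature_count = num_monomials(256, 4)
print(feature_count)
```

Under that counting assumption, `num_monomials(256, 4)` reproduces the 183,181,376 figure quoted above, which is the cost that motivates looking for a formulation that never builds the features explicitly.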
Now notice that the inner product in $Z$ can be written
$\langle z, \tilde{z} \rangle = \langle \Phi(x), \Phi(\tilde{x}) \rangle$ (2)
$= x_1^2 \tilde{x}_1^2 + 2 x_1 \tilde{x}_1 x_2 \tilde{x}_2 + x_2^2 \tilde{x}_2^2$ (3)
...
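The three-term sum in (3) is, by ordinary algebra, the expansion of $(x_1 \tilde{x}_1 + x_2 \tilde{x}_2)^2$, so the inner product in $Z$ can be obtained from a dot product in $X$ alone. The following minimal Python sketch checks this numerically for one pair of points. The function names and the sample points are illustrative assumptions, not part of the lecture.

```python
import math

def phi(x):
    """Feature map from equation (1): R^2 -> R^3."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def dot(u, v):
    """Ordinary dot product of two equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))

def quadratic_kernel(x, x_tilde):
    """Squared dot product, computed entirely in the original space X = R^2."""
    return dot(x, x_tilde) ** 2

# Arbitrary sample points, chosen only for illustration.
x = (1.0, 2.0)
x_tilde = (3.0, -1.0)

explicit = dot(phi(x), phi(x_tilde))    # inner product computed in Z
implicit = quadratic_kernel(x, x_tilde)  # same quantity without forming phi
print(explicit, implicit)
print(math.isclose(explicit, implicit))
```

The two values agree up to floating-point rounding, which is why the comparison uses `math.isclose` rather than exact equality.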
 Fall '08
 Staff
