CSE 6740 Lecture 10
How Can I Reduce/Relate the Features? (Dimension Reduction)
Alexander Gray, agray@cc.gatech.edu
Georgia Institute of Technology

Today
1. Dimensionality Reduction: Linear
2. Dimensionality Reduction: Nonlinear

Dimensionality Reduction: Linear
One of the major unsupervised learning tasks.

Dimensionality Reduction
Represent essentially the same data, but with fewer columns. Why?
Sometimes: computational reasons (reduce $D$); statistical reasons (e.g. remove noise).
Mainly: visualization/understanding reasons (see the data in 2-D or 3-D, or identify fundamental underlying variables).

Principal Components Analysis
As usual, let's start with the simplest case, which is linear. We want to map vectors $x \in \mathbb{R}^D$ to vectors $z \in \mathbb{R}^{\widetilde{D}}$ where $\widetilde{D} < D$.

We can always represent $x$ as a linear combination of a set of $D$ orthonormal vectors $u_d$,
$$x = \sum_{d=1}^{D} z_d u_d \qquad (1)$$
where the vectors $u_d$ satisfy the orthonormality relation
$$u_d^T u_{\tilde d} = \delta_{d\tilde d} \qquad (2)$$
where the Kronecker delta $\delta_{d\tilde d} = 1$ if $d = \tilde d$ and $0$ otherwise.

The coefficients $z_d$ have the form
$$z_d = u_d^T x \qquad (3)$$
which can be regarded as a simple rotation of the coordinate system from the original $x$'s to a new set of coordinates given by the $z$'s. Now suppose we retain only $\widetilde{D} < D$ of the basis vectors $u_d$, so that each vector $x$ is approximated by
$$\hat{x} = \sum_{d=1}^{\widetilde{D}} z_d u_d + \sum_{d=\widetilde{D}+1}^{D} b_d u_d. \qquad (4)$$

The error in the approximation is
$$x - \hat{x} = \sum_{d=\widetilde{D}+1}^{D} (z_d - b_d) u_d. \qquad (5)$$
We can minimize the sum of squared errors over the whole dataset:
$$E_{\widetilde{D}} = \frac{1}{2} \sum_{i=1}^{N} \|x_i - \hat{x}_i\|^2 \qquad (6)$$
$$= \frac{1}{2} \sum_{i=1}^{N} \sum_{d=\widetilde{D}+1}^{D} (z_{id} - b_d)^2. \qquad (7)$$

Setting the derivative of $E_{\widetilde{D}}$ with respect to $b_d$ to 0, we get
$$b_d = \frac{1}{N} \sum_{i=1}^{N} z_{id} = u_d^T \bar{x} \qquad (8)$$
where $\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$. For the optimal coefficients $b_d$, then,
$$E_{\widetilde{D}} = \frac{1}{2} \sum_{d=\widetilde{D}+1}^{D} \sum_{i=1}^{N} \left[ u_d^T (x_i - \bar{x}) \right]^2 \qquad (9)$$
$$= \frac{1}{2} \sum_{d=\widetilde{D}+1}^{D} u_d^T \Sigma u_d \qquad (10)$$
where $\Sigma = \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T$ is the covariance matrix.

We now need to minimize $E_{\widetilde{D}}$ with respect to the choice of basis vectors $u_d$. Some linear algebra shows that the minimum occurs when the basis vectors satisfy
$$\Sigma u_d = \lambda_d u_d \qquad (11)$$
so that they are the eigenvectors of the covariance matrix. ...
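To make the derivation concrete, here is a minimal NumPy sketch (not part of the lecture; the synthetic data and names such as D_tilde are illustrative assumptions). It forms the covariance matrix of (10), takes its eigenvectors as the basis $u_d$ per (11), builds the truncated reconstruction (4) with the optimal constants $b_d$ from (8), and checks that the resulting $E_{\widetilde{D}}$ equals half the sum of the discarded eigenvalues, which (10) and (11) together imply.

# A minimal sketch of the PCA derivation above (assumed names and data, not from the lecture).
import numpy as np

rng = np.random.default_rng(0)
N, D, D_tilde = 500, 5, 2                      # N points in R^D, keep D_tilde components

# Synthetic data with low-dimensional structure plus noise (illustrative only).
X = rng.normal(size=(N, D_tilde)) @ rng.normal(size=(D_tilde, D)) \
    + 0.1 * rng.normal(size=(N, D))

x_bar = X.mean(axis=0)                         # sample mean \bar{x}
Sigma = (X - x_bar).T @ (X - x_bar)            # covariance matrix as in (10)

# Eigenvectors of Sigma give the optimal basis u_d (equation (11)).
eigvals, U = np.linalg.eigh(Sigma)             # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]              # reorder to descending
eigvals, U = eigvals[order], U[:, order]

# Coefficients z_d = u_d^T x (3); keep the top D_tilde and replace the rest
# by the constants b_d = u_d^T x_bar (8), as in the reconstruction (4).
Z = X @ U                                      # all coefficients z_{id}
B = x_bar @ U                                  # constants b_d
X_hat = Z[:, :D_tilde] @ U[:, :D_tilde].T + B[D_tilde:] @ U[:, D_tilde:].T

# Check: E = (1/2) sum_i ||x_i - x_hat_i||^2 should equal half the sum of
# the discarded eigenvalues, by (10) and (11).
E = 0.5 * np.sum((X - X_hat) ** 2)
print(E, 0.5 * eigvals[D_tilde:].sum())        # the two numbers should agree

In practice one would typically use a library routine (e.g. an SVD of the centered data matrix) rather than an explicit eigendecomposition, but the version above mirrors the derivation step by step.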