CSE 6740 Lecture 10
How Can I Reduce/Relate the Features? (Dimension Reduction)
Alexander Gray
Georgia Institute of Technology
CSE 6740 Lecture 10 – p. 1/34

Today
1. Dimensionality Reduction: Linear
2. Dimensionality Reduction: Nonlinear

Dimensionality Reduction: Linear
One of the major unsupervised learning tasks.

Dimensionality Reduction
“Make the data kind of the same, except with fewer columns.”
Why? Sometimes for:
- computational reasons (reduce $D$);
- statistical reasons (e.g. to remove noise).
Mainly for:
- visualization/understanding reasons (to see the data in 2D or 3D, or to identify fundamental underlying variables).

Principal Components Analysis
As usual, let’s start with the simplest case, which is linear. We want to map vectors $x \in \mathbb{R}^D$ to vectors $z \in \mathbb{R}^{D'}$, where $D' < D$. We can always represent $x$ as a linear combination of a set of $D$ orthonormal vectors $u_d$,

$$x = \sum_{d=1}^{D} z_d u_d, \tag{1}$$

where the vectors $u_d$ satisfy the orthonormality relation

$$u_d^T u_{\tilde{d}} = \delta_{d\tilde{d}}, \tag{2}$$

and the Kronecker delta $\delta_{d\tilde{d}}$ is $1$ if $d = \tilde{d}$ and $0$ otherwise.

The coefficients $z_d$ have the form

$$z_d = u_d^T x, \tag{3}$$

which can be regarded as a simple rotation of the coordinate system from the original $x$'s to a new set of coordinates given by the $z$'s. Now suppose we retain only $D' < D$ of the basis vectors $u_d$ and replace each of the remaining coefficients $z_d$ by a constant $b_d$ that is the same for every data point, so that each vector $x$ is approximated by

$$\hat{x} = \sum_{d=1}^{D'} z_d u_d + \sum_{d=D'+1}^{D} b_d u_d. \tag{4}$$

The error in the approximation is

$$x - \hat{x} = \sum_{d=D'+1}^{D} (z_d - b_d) u_d. \tag{5}$$
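A minimal numerical check of equations (1) to (5), assuming a hypothetical 2-D example: an orthonormal basis obtained by rotating the standard basis by $\pi/6$, a single point $x$, $D' = 1$, and an arbitrary constant $b_2 = 0.5$. The helper names (`rotated_basis`, `coefficients`, `approximate`) are illustrative and not from the notes. Keeping all $D$ coefficients should reconstruct $x$ exactly, as in (1); dropping one and substituting $b_2$ should leave an error whose squared length is $(z_2 - b_2)^2$, as (5) together with the orthonormality relation (2) predicts.

```python
import math


def dot(a, b):
    """Inner product u^T v of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def rotated_basis(theta):
    """Hypothetical orthonormal basis of R^2: the standard basis rotated by theta."""
    return [[math.cos(theta), math.sin(theta)],
            [-math.sin(theta), math.cos(theta)]]


def coefficients(x, basis):
    """Eq. (3): z_d = u_d^T x for every basis vector u_d."""
    return [dot(u, x) for u in basis]


def approximate(z, basis, d_keep, b):
    """Eq. (4): keep z_1..z_{D'}, use the constants b for the remaining D - D' terms."""
    coeffs = list(z[:d_keep]) + list(b)
    return [sum(c * u[j] for c, u in zip(coeffs, basis)) for j in range(len(basis[0]))]


x = [3.0, 1.0]
U = rotated_basis(math.pi / 6)
z = coefficients(x, U)

x_full = approximate(z, U, d_keep=2, b=[])     # all D coefficients: eq. (1), exact
x_hat = approximate(z, U, d_keep=1, b=[0.5])   # D' = 1 with an arbitrary b_2 = 0.5
err = [xi - hi for xi, hi in zip(x, x_hat)]

print("x reconstructed from all coefficients:", x_full)
print("|x - x_hat|^2 =", dot(err, err))
print("(z_2 - b_2)^2 =", (z[1] - 0.5) ** 2)   # equal, by eq. (5) and orthonormality
```

The last two printed values agree up to rounding for any rotation angle or choice of $b_2$, because only the discarded coordinate contributes to the error.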
We can minimize the sum of squared errors over the whole dataset. Writing $z_{id} = u_d^T x_i$ for the $d$-th coefficient of data point $x_i$ and using the orthonormality relation (2),

$$E_{D'} = \frac{1}{2} \sum_{i=1}^{N} \lVert x_i - \hat{x}_i \rVert^2 \tag{6}$$

$$= \frac{1}{2} \sum_{i=1}^{N} \sum_{d=D'+1}^{D} (z_{id} - b_d)^2. \tag{7}$$

Setting the derivative of $E_{D'}$ with respect to $b_d$ to zero, we get

$$b_d = \frac{1}{N} \sum_{i=1}^{N} z_{id} = u_d^T \bar{x}, \tag{8}$$

where $\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$ is the sample mean. For the optimal coefficients $b_d$, then,

$$E_{D'} = \frac{1}{2} \sum_{d=D'+1}^{D} \sum_{i=1}^{N} \left[ u_d^T (x_i - \bar{x}) \right]^2 \tag{9}$$

$$= \frac{1}{2} \sum_{d=D'+1}^{D} u_d^T \Sigma u_d, \tag{10}$$

where $\Sigma = \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T$ is the covariance matrix (unnormalized here, i.e. $N$ times the sample covariance; the scale factor does not change its eigenvectors).

We now need to minimize $E_{D'}$ with respect to the choice of basis vectors $u_d$. Minimizing (10) subject to the orthonormality constraints (2), for example with Lagrange multipliers, shows that the minimum occurs when the basis vectors satisfy

$$\Sigma u_d = \lambda_d u_d, \tag{11}$$

so that they are the eigenvectors of the covariance matrix. …
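A sketch of equations (8) to (11) in pure Python (no NumPy, so it stays self-contained): it centers a small toy dataset, builds $\Sigma$ without the $1/N$ factor, and finds its eigenvectors with the cyclic Jacobi method. The Jacobi method is a standard eigenvalue algorithm swapped in here for illustration; the notes do not say how the eigenvectors are computed, and in practice one would call a library routine such as `numpy.linalg.eigh`. The sketch checks one consequence of (10) and (11): since each $u_d$ is a unit eigenvector, $u_d^T \Sigma u_d = \lambda_d$, so $E_{D'} = \frac{1}{2}\sum_{d=D'+1}^{D} \lambda_d$, which is smallest when the discarded directions are those with the smallest eigenvalues. The data, the helper names (`scatter_matrix`, `jacobi_eigh`), and the choice $D' = 1$ are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def scatter_matrix(xs):
    """Return (xbar, Sigma), Sigma = sum_i (x_i - xbar)(x_i - xbar)^T with no 1/N factor."""
    n, dim = len(xs), len(xs[0])
    xbar = [sum(x[j] for x in xs) / n for j in range(dim)]
    S = [[0.0] * dim for _ in range(dim)]
    for x in xs:
        c = [xj - mj for xj, mj in zip(x, xbar)]
        for a in range(dim):
            for b in range(dim):
                S[a][b] += c[a] * c[b]
    return xbar, S


def jacobi_eigh(S, max_sweeps=50, tol=1e-20):
    """Eigenpairs of a symmetric matrix by cyclic Jacobi rotations.

    Returns [(lambda_d, u_d), ...] with unit eigenvectors, largest eigenvalue first.
    """
    n = len(S)
    A = [row[:] for row in S]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes A[p][q]: tan(2 theta) = 2 A_pq / (A_qq - A_pp)
                theta = 0.5 * math.atan2(2.0 * A[p][q], A[q][q] - A[p][p])
                if abs(theta) > math.pi / 4:   # use the equivalent angle with |theta| <= pi/4
                    theta -= math.copysign(math.pi / 2, theta)
                c, s = math.cos(theta), math.sin(theta)
                J = [[float(i == j) for j in range(n)] for i in range(n)]
                J[p][p], J[q][q], J[p][q], J[q][p] = c, c, s, -s
                A = matmul(transpose(J), matmul(A, J))   # A <- J^T A J
                V = matmul(V, J)                         # accumulate eigenvectors
    # Diagonal of A holds the eigenvalues; the columns of V are the eigenvectors.
    pairs = [(A[d][d], [V[i][d] for i in range(n)]) for d in range(n)]
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)


# Illustrative toy data: N = 6 points in D = 3 dimensions.
xs = [[2.5, 2.4, 0.5], [0.5, 0.7, 1.1], [2.2, 2.9, 0.4],
      [1.9, 2.2, 0.9], [3.1, 3.0, 0.2], [2.3, 2.7, 0.6]]
d_prime = 1                                   # D' in the notes

xbar, S = scatter_matrix(xs)
pairs = jacobi_eigh(S)
lams = [lam for lam, _ in pairs]              # lambda_1 >= ... >= lambda_D
U = [u for _, u in pairs]                     # eigenvectors u_d, eq. (11)
b = [dot(u, xbar) for u in U[d_prime:]]       # optimal constants, eq. (8)

E = 0.0
for x in xs:
    z = [dot(u, x) for u in U]                # eq. (3)
    coeffs = z[:d_prime] + b                  # eq. (4)
    x_hat = [sum(cf * u[j] for cf, u in zip(coeffs, U)) for j in range(len(x))]
    E += 0.5 * sum((xj - hj) ** 2 for xj, hj in zip(x, x_hat))   # eq. (6)

print("E_D' from reconstructions:     ", E)
print("half the discarded eigenvalues:", 0.5 * sum(lams[d_prime:]))
```

The two printed values should agree up to rounding. Changing `d_prime` changes which eigenvalues are discarded, and the identity still holds.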
 Fall '08
 Staff