SPR_LectureHandouts_Chapter_03_Part4_PCA


Pattern Recognition, ECE-8443
Chapter 3, Part 4: Principal Component Analysis
Saurabh Prasad
Electrical and Computer Engineering Department, Mississippi State University

Outline of the next couple of lectures

• Transform-based feature reduction methods
  – Find a mapping of the feature space onto a lower-dimensional subspace.
• Feature selection methods
  – Select a subset of features that you believe are "useful."
• Combination of feature selection and mapping
  – First create a subset of features, then apply the mapping.

Transform-based dimensionality reduction

• Broad task: find a mapping $W: \mathbb{R}^N \rightarrow \mathbb{R}^M$, $M < N$, such that either
  (1) the mean squared error (MSE) in representing the data in the $M$-dimensional subspace is minimized (e.g., PCA), or
  (2) some alternate metric that quantifies class separation is maximized (e.g., Fisher's ratio, as in LDA).
• Principal Component Analysis (PCA): the projection that best represents the data in a least-squares sense.
• Fisher's Discriminant Analysis (F-LDA, or simply LDA): the projection that best separates the data in a least-squares sense.
• Independent Component Analysis (ICA): the projection that minimizes the mutual information of the components.
• We will not cover ICA in this course, but you are encouraged to review the concept in online tutorials and resources (see the course website).

Principal Component Analysis

• Consider representing a set of $n$ $d$-dimensional samples $x_1, \ldots, x_n$ by a single vector $x_0$.
• Define a squared-error criterion:

$$J_0(x_0) = \sum_{k=1}^{n} \lVert x_0 - x_k \rVert^2$$

• It is easy to show (page 115 of the text) that the solution is the sample mean:

$$x_0 = m = \frac{1}{n} \sum_{k=1}^{n} x_k$$

• The sample mean is a zero-dimensional representation of the data set.
• Next, consider a one-dimensional representation in which we project the data onto a line running through the sample mean:

$$x = m + a e,$$

where $e$ is a unit vector in the direction of this line and $a$ is a scalar representing the (signed) distance of a point from the mean along the line.
• We can write the squared-error criterion as:

$$J_1(a_1, a_2, \ldots, a_n, e) = \sum_{k=1}^{n} \lVert (m + a_k e) - x_k \rVert^2$$

Minimizing the squared error

• Expanding the criterion:

$$J_1(a_1, \ldots, a_n, e) = \sum_{k=1}^{n} \lVert a_k e - (x_k - m) \rVert^2 = \sum_{k=1}^{n} a_k^2 \lVert e \rVert^2 - 2 \sum_{k=1}^{n} a_k\, e^t (x_k - m) + \sum_{k=1}^{n} \lVert x_k - m \rVert^2$$

• Note that $\lVert e \rVert = 1$ (the norm of the unit vector is 1).
• Differentiating with respect to $a_k$ and setting the result to zero, we obtain:

$$a_k = e^t (x_k - m)$$

• The geometric interpretation is that we obtain the least-squares solution by orthogonally projecting each vector $x_k$ onto the line through the sample mean in the direction of $e$.
• But what is the best direction for $e$?

Finding the best direction e from the covariance structure

• Define a scatter matrix, $S$:

$$S = \sum_{k=1}^{n} (x_k - m)(x_k - m)^t$$

This should look familiar: it is $(n-1)$ times the sample covariance matrix (a quick numerical check of this identity follows below).
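A minimal sketch of that identity in NumPy, assuming the data are stored as an $n \times d$ array; the variable names here are illustrative, not from the handout:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # n = 100 samples, d = 3 features

m = X.mean(axis=0)             # sample mean
Xc = X - m                     # centered data
S = Xc.T @ Xc                  # scatter matrix: sum of outer products

# S should equal (n - 1) times the sample covariance matrix.
assert np.allclose(S, (X.shape[0] - 1) * np.cov(X, rowvar=False))
```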
• Substituting our solution $a_k = e^t (x_k - m)$ into the expression for the squared error and simplifying:

$$J_1(e) = \sum_{k=1}^{n} a_k^2 \lVert e \rVert^2 - 2 \sum_{k=1}^{n} a_k\, e^t (x_k - m) + \sum_{k=1}^{n} \lVert x_k - m \rVert^2$$

$$= \sum_{k=1}^{n} a_k^2 - 2 \sum_{k=1}^{n} a_k^2 + \sum_{k=1}^{n} \lVert x_k - m \rVert^2 = -\sum_{k=1}^{n} a_k^2 + \sum_{k=1}^{n} \lVert x_k - m \rVert^2$$

$$= -\sum_{k=1}^{n} \left( e^t (x_k - m) \right)^2 + \sum_{k=1}^{n} \lVert x_k - m \rVert^2 = -\sum_{k=1}^{n} e^t (x_k - m)(x_k - m)^t e + \sum_{k=1}^{n} \lVert x_k - m \rVert^2$$

$$= -e^t \left[ \sum_{k=1}^{n} (x_k - m)(x_k - m)^t \right] e + \sum_{k=1}^{n} \lVert x_k - m \rVert^2 = -e^t S e + \sum_{k=1}^{n} \lVert x_k - m \rVert^2$$

• The vector $e$ that minimizes $J_1$ therefore also maximizes $e^t S e$, since the second term does not depend on $e$.
• Use Lagrange multipliers to maximize $e^t S e$ subject to the constraint $\lVert e \rVert = 1$. (See Appendix A.3 of the text if you need to review the concept of Lagrange multipliers.)
• Let $\lambda$ be the undetermined multiplier, and differentiate

$$u = e^t S e - \lambda \left( e^t e - 1 \right)$$

with respect to $e$:

$$\frac{\partial u}{\partial e} = 2 S e - 2 \lambda e$$

• Setting this to zero yields $S e = \lambda e$, so $e$ must be an eigenvector of $S$. Since $e^t S e = \lambda$ for such an $e$, it follows that to maximize $e^t S e$ we select the eigenvector corresponding to the largest eigenvalue of the scatter matrix.
• In other words, the best one-dimensional projection of the data (in the least mean-squared-error sense) is the projection of the data onto a line through the sample mean in the direction of the eigenvector of the scatter matrix having the largest eigenvalue (hence the name Principal Component).
• For the Gaussian case, the eigenvectors are the principal axes of the hyperellipsoidally shaped support region!

Extension to a d'-dimensional projection

• The previous result extends readily from a one-dimensional projection to a $d'$-dimensional projection, where $d' < d$:

$$x = m + \sum_{i=1}^{d'} a_i e_i$$

• The criterion function

$$J_{d'}(e_1, \ldots, e_{d'}) = \sum_{k=1}^{n} \left\lVert \left( m + \sum_{i=1}^{d'} a_{ki} e_i \right) - x_k \right\rVert^2$$

is minimized when the vectors $e_1, e_2, \ldots, e_{d'}$ are the $d'$ eigenvectors of the scatter matrix having the largest eigenvalues. (A complete numerical sketch of this procedure follows below.)
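To make the whole procedure concrete, here is a minimal end-to-end sketch in NumPy under the same assumptions as above; the function name pca_project and parameter d_prime are illustrative, not from the handout. np.linalg.eigh is used because the scatter matrix is symmetric; it returns eigenvalues in ascending order, so the last $d'$ columns correspond to the largest eigenvalues.

```python
import numpy as np

def pca_project(X, d_prime):
    """Project the n x d data matrix X onto its top d_prime principal
    components, returning coefficients a (n x d_prime), basis e
    (d x d_prime), and mean m, so each sample x_k is approximated by
    m + sum_i a[k, i] * e[:, i]."""
    m = X.mean(axis=0)                  # sample mean (zero-dim representation)
    Xc = X - m                          # center the data
    S = Xc.T @ Xc                       # scatter matrix

    # eigh returns eigenvalues of the symmetric matrix S in ascending
    # order; keep the eigenvectors of the d_prime largest, largest first.
    eigvals, eigvecs = np.linalg.eigh(S)
    e = eigvecs[:, -d_prime:][:, ::-1]

    a = Xc @ e                          # a_ki = e_i^t (x_k - m)
    return a, e, m

# Example: reduce correlated 5-dimensional data to d' = 2 dimensions.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
a, e, m = pca_project(X, d_prime=2)

X_hat = m + a @ e.T                     # reconstruction in the 2-D subspace
print("residual squared error:", np.sum((X - X_hat) ** 2))
```

The printed residual is the criterion $J_{d'}$ evaluated at the optimum; by the result above, projecting onto any other pair of unit vectors would give a larger value.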