Pattern Recognition
ECE8443 Chapter 3, Part 4
Principal Component Analysis
Saurabh Prasad
Electrical and Computer Engineering Department, Mississippi State University

Outline of the next couple of lectures
• Transform-based feature reduction methods
  – Find a mapping of the feature space onto a lower-dimensional subspace
• Feature selection methods
  – Select a subset of features that you think are “useful”
• Combination of feature selection and mapping
  – First create a subset of features, then do the mapping

Transform-based dimensionality reduction
• Broad task: find a mapping $W: \mathbb{R}^N \to \mathbb{R}^M$, $M < N$, such that either
  (1) the MSE in representing the data in the $M$-dimensional subspace is minimized (e.g., PCA), or
  (2) some alternate metric that quantifies class separation is maximized (e.g., Fisher’s ratio, as in LDA).
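As a concrete illustration (not from the slides), here is a minimal NumPy sketch of such a mapping. The matrix W below is a random placeholder; PCA and LDA differ precisely in how they choose W to optimize their respective criteria:

```python
# A minimal sketch of a transform-based reduction: a linear map W
# taking R^N to R^M with M < N, applied to a batch of samples.
import numpy as np

N, M = 10, 3                       # original and reduced dimensionality
rng = np.random.default_rng(0)

W = rng.standard_normal((M, N))    # placeholder mapping; PCA/LDA choose W purposefully
X = rng.standard_normal((100, N))  # 100 hypothetical samples, each in R^N

Y = X @ W.T                        # each row is now a point in R^M
print(Y.shape)                     # (100, 3)
```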
• Principal Components Analysis (PCA): the projection that best represents the data in a least-squares sense.
• Fisher’s Discriminant Analysis (FLDA, or LDA): the projection that best separates the data in a least-squares sense.
• Independent Component Analysis (ICA): the projection that minimizes the mutual information of the components.
• We will not cover ICA in this course, but you are encouraged to review the concept from online tutorials and resources (see the course website).

Principal Component Analysis
• Consider representing a set of $n$ $d$-dimensional samples $\mathbf{x}_1, \ldots, \mathbf{x}_n$ by a single vector, $\mathbf{x}_0$.
• Define a squared-error criterion:
$$J_0(\mathbf{x}_0) = \sum_{k=1}^{n} \|\mathbf{x}_0 - \mathbf{x}_k\|^2$$
• It is easy to show (page 115 of the text) that the solution to this problem is given by:
$$\mathbf{x}_0 = \mathbf{m} = \frac{1}{n}\sum_{k=1}^{n} \mathbf{x}_k$$
• The sample mean is a zero-dimensional representation of the data set.
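A quick numerical check of this result (a minimal NumPy sketch with made-up data; the function name J0 is just for illustration): perturbing the candidate vector away from the sample mean increases the criterion.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 4))    # n = 500 hypothetical samples in R^4

def J0(x0, X):
    """Squared-error criterion: sum_k ||x0 - x_k||^2."""
    return np.sum(np.linalg.norm(X - x0, axis=1) ** 2)

m = X.mean(axis=0)                   # sample mean
# Any vector other than the mean yields a larger squared error:
print(J0(m, X) < J0(m + 0.1, X))     # True
```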
• Consider a one-dimensional solution in which we project the data onto a line running through the sample mean:
$$\mathbf{x} = \mathbf{m} + a\,\mathbf{e}$$
where $\mathbf{e}$ is a unit vector in the direction of this line, and $a$ is a scalar representing the (signed) distance of any point from the mean.
• We can write the squared-error criterion as:
$$J_1(a_1, a_2, \ldots, a_n, \mathbf{e}) = \sum_{k=1}^{n} \|(\mathbf{m} + a_k\mathbf{e}) - \mathbf{x}_k\|^2$$

Minimizing Squared Error
$$J_1(a_1, a_2, \ldots, a_n, \mathbf{e}) = \sum_{k=1}^{n} \|(\mathbf{m} + a_k\mathbf{e}) - \mathbf{x}_k\|^2 = \sum_{k=1}^{n} \|a_k\mathbf{e} - (\mathbf{x}_k - \mathbf{m})\|^2$$
$$= \sum_{k=1}^{n} a_k^2\,\|\mathbf{e}\|^2 - 2\sum_{k=1}^{n} a_k\,\mathbf{e}^t(\mathbf{x}_k - \mathbf{m}) + \sum_{k=1}^{n} \|\mathbf{x}_k - \mathbf{m}\|^2$$
• Note that $\|\mathbf{e}\| = 1$ (the norm of the unit vector is 1).
• Differentiate with respect to $a_k$, set to zero, and obtain:
$$a_k = \mathbf{e}^t(\mathbf{x}_k - \mathbf{m})$$
• The geometric interpretation is that we obtain a least-squares solution by projecting each vector $\mathbf{x}_k$ onto the line in the direction of $\mathbf{e}$ that passes through the sample mean.
• But what is the best direction for $\mathbf{e}$?
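Before answering that question, here is a small NumPy sketch (with arbitrary synthetic data and an arbitrary unit vector e) checking the coefficient formula above: the residuals of the projection are orthogonal to e, as a least-squares solution requires.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 3))    # hypothetical samples
m = X.mean(axis=0)

e = np.array([1.0, 0.0, 0.0])        # any unit vector; the best one comes next
a = (X - m) @ e                      # a_k = e^t (x_k - m), one scalar per sample

X_hat = m + np.outer(a, e)           # points projected onto the line x = m + a e
# The residuals are orthogonal to e, as expected for a least-squares projection:
print(np.allclose((X - X_hat) @ e, 0.0))   # True
```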
Finding the best direction e from the covariance structure
• Define a scatter matrix, $S$:
$$S = \sum_{k=1}^{n} (\mathbf{x}_k - \mathbf{m})(\mathbf{x}_k - \mathbf{m})^t$$
  This should look familiar: it is $(n-1)$ times the sample covariance matrix.
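A minimal NumPy check of this relationship, using made-up data: the scatter matrix built from centered samples equals $(n-1)$ times np.cov.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 3))    # n = 200 hypothetical samples in R^3
n = X.shape[0]
m = X.mean(axis=0)

Xc = X - m                           # center the data
S = Xc.T @ Xc                        # S = sum_k (x_k - m)(x_k - m)^t

# S is (n - 1) times the sample covariance matrix:
print(np.allclose(S, (n - 1) * np.cov(X, rowvar=False)))   # True
```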
• If we substitute our solution for $a_k$ into our expression for the squared error, we obtain:
$$J_1(\mathbf{e}) = \sum_{k=1}^{n} a_k^2\,\|\mathbf{e}\|^2 - 2\sum_{k=1}^{n} a_k\,\mathbf{e}^t(\mathbf{x}_k - \mathbf{m}) + \sum_{k=1}^{n} \|\mathbf{x}_k - \mathbf{m}\|^2$$
$$= \sum_{k=1}^{n} a_k^2 - 2\sum_{k=1}^{n} a_k^2 + \sum_{k=1}^{n} \|\mathbf{x}_k - \mathbf{m}\|^2 = -\sum_{k=1}^{n} a_k^2 + \sum_{k=1}^{n} \|\mathbf{x}_k - \mathbf{m}\|^2$$
$$= -\sum_{k=1}^{n} \left[\mathbf{e}^t(\mathbf{x}_k - \mathbf{m})\right]^2 + \sum_{k=1}^{n} \|\mathbf{x}_k - \mathbf{m}\|^2$$
$$= -\mathbf{e}^t \left[\sum_{k=1}^{n} (\mathbf{x}_k - \mathbf{m})(\mathbf{x}_k - \mathbf{m})^t\right] \mathbf{e} + \sum_{k=1}^{n} \|\mathbf{x}_k - \mathbf{m}\|^2 = -\mathbf{e}^t S\,\mathbf{e} + \sum_{k=1}^{n} \|\mathbf{x}_k - \mathbf{m}\|^2$$
• The vector $\mathbf{e}$ that minimizes $J_1$ also maximizes $\mathbf{e}^t S\,\mathbf{e}$.
• Use Lagrange multipliers to maximize $\mathbf{e}^t S\,\mathbf{e}$ subject to the constraint $\|\mathbf{e}\| = 1$. (See Appendix A.3 of the text if you need to review the concept of Lagrange multipliers.)
• Let $\lambda$ be the undetermined multiplier, and differentiate
$$u = \mathbf{e}^t S\,\mathbf{e} - \lambda\,(\mathbf{e}^t\mathbf{e} - 1)$$
with respect to $\mathbf{e}$, to obtain:
$$\frac{\partial u}{\partial \mathbf{e}} = 2S\mathbf{e} - 2\lambda\mathbf{e}$$
• Set to zero and solve: $S\mathbf{e} = \lambda\mathbf{e}$
• It follows that to maximize $\mathbf{e}^t S\,\mathbf{e}$ we want to select an eigenvector corresponding to the largest eigenvalue of the scatter matrix.
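A short NumPy sketch of this eigenvector result, on synthetic data: np.linalg.eigh solves $S\mathbf{e} = \lambda\mathbf{e}$ for the symmetric scatter matrix, and the leading eigenvector attains a larger value of $\mathbf{e}^t S\,\mathbf{e}$ than an arbitrary unit direction.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((300, 5))    # hypothetical samples
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc                        # scatter matrix

# S is symmetric, so eigh applies; eigenvalues are returned in ascending order.
eigvals, eigvecs = np.linalg.eigh(S)
e = eigvecs[:, -1]                   # eigenvector of the largest eigenvalue

print(np.allclose(S @ e, eigvals[-1] * e))   # Se = lambda e holds

# e^t S e beats a random unit direction, consistent with the Lagrange result:
v = rng.standard_normal(5)
v /= np.linalg.norm(v)
print(e @ S @ e >= v @ S @ v)        # True (up to ties)
```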
• In other words, the best one-dimensional projection of the data (in the least mean-squared error sense) is the projection of the data onto a line through the sample mean in the direction of the eigenvector of the scatter matrix having the largest eigenvalue (hence the name Principal Component).
• For the Gaussian case, the eigenvectors are the principal axes of the hyperellipsoidally shaped support region!
• The previous result can be easily extended from a one-dimensional projection to a $d'$-dimensional projection, where $d' < d$:
$$\mathbf{x} = \mathbf{m} + \sum_{i=1}^{d'} a_i \mathbf{e}_i$$
• The criterion function
$$J_{d'}(\mathbf{e}_1, \ldots, \mathbf{e}_{d'}) = \sum_{k=1}^{n} \left\| \left(\mathbf{m} + \sum_{i=1}^{d'} a_{ki}\mathbf{e}_i\right) - \mathbf{x}_k \right\|^2$$
is minimized when the vectors $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3, \ldots, \mathbf{e}_{d'}$ are the $d'$ eigenvectors of the scatter matrix having the largest eigenvalues.
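Putting the pieces together, here is an end-to-end sketch in NumPy; the helper name pca_project and its data are hypothetical, but the steps follow the recipe above: center the data, take the $d'$ leading eigenvectors of the scatter matrix, and form the coefficients and reconstruction.

```python
import numpy as np

def pca_project(X, d_prime):
    """Project rows of X onto the d' leading eigenvectors of the scatter matrix."""
    m = X.mean(axis=0)
    Xc = X - m
    S = Xc.T @ Xc                            # scatter matrix
    eigvals, eigvecs = np.linalg.eigh(S)     # ascending eigenvalues
    E = eigvecs[:, ::-1][:, :d_prime]        # d x d' matrix [e_1 ... e_d']
    A = Xc @ E                               # coefficients a_ki = e_i^t (x_k - m)
    X_hat = m + A @ E.T                      # reconstruction m + sum_i a_ki e_i
    return A, X_hat

rng = np.random.default_rng(5)
X = rng.standard_normal((400, 10))           # hypothetical samples
A, X_hat = pca_project(X, d_prime=2)
print(A.shape)                               # (400, 2)
print(np.sum((X - X_hat) ** 2))              # J_d' at the minimizing eigenvectors
```

In practice the same subspace is often computed from the SVD of the centered data matrix, which is numerically more stable than forming the scatter matrix explicitly.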