3. Linear subspace methods

The goal of subspace learning (or dimensionality reduction) is to map a data set from a high-dimensional space to a lower-dimensional space such that certain properties are preserved. Examples of properties to be preserved include the global geometry and neighborhood information. Usually the preserved property is quantified by an objective function, and the dimensionality reduction problem is formulated as an optimization problem. The generic problem of linear dimensionality reduction is the following: given a multi-dimensional data set x_1, x_2, ..., x_m in R^n, find a transformation matrix W that maps these m points to y_1, y_2, ..., y_m in R^l (l << n), such that y_i = W^T x_i represents x_i. In this section, we briefly review the existing linear subspace methods PCA, LDA, LPP, ONPP, LSDA, and their variants.

3.1 Principal Component Analysis (PCA)

Two of the most popular techniques for linear subspace learning are PCA and LDA. PCA (Turk & Pentland, 1991) is an eigenvector method designed to model linear variation in high-dimensional data. PCA aims at preserving the global variance by finding a set of mutually orthogonal basis vectors that capture the directions of maximum variance in the data. Let w denote a transformation vector; the objective function is as follows:

    w_{opt} = \arg\max_{w} \sum_{i=1}^{m} \left( w^T x_i - w^T m \right)^2    (1)

where m is the mean of all the samples. The solution w_0, ..., w_{l-1} is an orthonormal set of vectors representing the eigenvectors of the data's covariance matrix associated with the l largest eigenvalues.

3.2 Linear Discriminant Analysis (LDA)

While PCA is an unsupervised method and seeks directions that are efficient for representation, LDA (Belhumeur et al., 1997) is a supervised approach and seeks directions that are efficient for discrimination.
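To make the PCA objective concrete, Eqn. (1) is maximized by the eigenvectors of the data's covariance matrix with the largest eigenvalues. The following is a minimal NumPy sketch; the function name `pca` and the rows-as-samples layout are our own illustrative choices (the chapter writes x_i as column vectors):

```python
import numpy as np

def pca(X, l):
    # X: (m, n) data matrix with samples as rows (hypothetical layout).
    # Returns the (n, l) matrix W whose columns are the top-l
    # principal directions, so that y_i = W^T x_i.
    Xc = X - X.mean(axis=0)            # center at the sample mean m
    C = Xc.T @ Xc / X.shape[0]         # covariance matrix of the data
    vals, vecs = np.linalg.eigh(C)     # eigenvalues in ascending order
    return vecs[:, ::-1][:, :l]        # eigenvectors of the l largest

# Project each sample to the l-dimensional subspace.
X = np.random.RandomState(0).randn(100, 5)
W = pca(X, 2)
Y = X @ W                              # (100, 2) low-dimensional representation
```

Because the eigenvectors returned by `eigh` are orthonormal, the columns of W form the orthonormal basis w_0, ..., w_{l-1} described above.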
LDA searches for the projection axes on which the data points of different classes are far from each other, while the data points of the same class remain close to each other. Suppose the data samples belong to c classes. The objective function is as follows:

    w_{opt} = \arg\max_{w} \frac{w^T S_B w}{w^T S_W w}    (2)

    S_B = \sum_{i=1}^{c} n_i \left( m^{(i)} - m \right) \left( m^{(i)} - m \right)^T    (3)

    S_W = \sum_{i=1}^{c} \sum_{j=1}^{n_i} \left( x_j^{(i)} - m^{(i)} \right) \left( x_j^{(i)} - m^{(i)} \right)^T    (4)

where m is the mean of all the samples, n_i is the number of samples in the i-th class, m^{(i)} is the average vector of the i-th class, and x_j^{(i)} is the j-th sample in the i-th class.

3.3 Locality Preserving Projections (LPP)

LPP (He & Niyogi, 2003) seeks to preserve the intrinsic geometry of the data by preserving locality. To derive the optimal projections preserving locality, LPP employs the same objective function as Laplacian Eigenmaps:

    \min_{w} \sum_{ij} \left( y_i - y_j \right)^2 S_{ij}    (5)

where y_i = w^T x_i, and S_{ij} evaluates the local structure of the data space, defined as:

    S_{ij} = \begin{cases} \exp\left( -\|x_i - x_j\|^2 / t \right) & \text{if } x_i \text{ and } x_j \text{ are "close"} \\ 0 & \text{otherwise} \end{cases}    (6)

or in a simpler form as

    S_{ij} = \begin{cases} 1 & \text{if } x_i \text{ and } x_j \text{ are "close"} \\ 0 & \text{otherwise} \end{cases}    (7)

where "close" can be defined as \|x_i - x_j\|^2 < \varepsilon for a small constant \varepsilon, or as x_i being among the k nearest neighbors of x_j or x_j being among the k nearest neighbors of x_i. The objective function with symmetric weights S_{ij} (S_{ij} = S_{ji}) incurs a heavy penalty if neighboring points x_i and x_j are mapped far apart. Minimizing it is therefore an attempt to ensure that if x_i and x_j are "close", then y_i (= w^T x_i) and y_j (= w^T x_j) are also "close". The objective function of Eqn. (5) can be reduced to:

    \frac{1}{2} \sum_{ij} \left( w^T x_i - w^T x_j \right)^2 S_{ij} = w^T X L X^T w    (8)

where X = [x_1, ..., x_m], D is a diagonal matrix with D_{ii} = \sum_j S_{ji}, and L = D - S is the Laplacian matrix.
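The scatter matrices of Eqns. (2)-(4) can be built directly from class means. Below is a minimal NumPy sketch; the function name `lda`, the rows-as-samples layout, and the pseudo-inverse solver are our own simplifications, not the chapter's implementation:

```python
import numpy as np

def lda(X, labels, l):
    # Sketch of Eqns. (2)-(4): maximize w^T S_B w / w^T S_W w.
    # X: (m, n) with samples as rows; labels: length-m class labels.
    n = X.shape[1]
    m_all = X.mean(axis=0)                     # global mean m
    S_B = np.zeros((n, n))                     # between-class scatter, Eqn. (3)
    S_W = np.zeros((n, n))                     # within-class scatter, Eqn. (4)
    for c in np.unique(labels):
        Xc = X[labels == c]
        n_i = Xc.shape[0]                      # n_i samples in class i
        m_i = Xc.mean(axis=0)                  # class mean m^(i)
        d = (m_i - m_all)[:, None]
        S_B += n_i * (d @ d.T)
        S_W += (Xc - m_i).T @ (Xc - m_i)
    # Solve the generalized eigenproblem S_B w = lambda S_W w; the
    # pseudo-inverse is a crude guard against a singular S_W.
    vals, vecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
    order = np.argsort(vals.real)[::-1]        # largest ratios first
    return vecs.real[:, order[:l]]

# Two well-separated classes: the learned axis should separate them.
rng = np.random.RandomState(1)
X = np.vstack([rng.randn(20, 2) + [5, 0], rng.randn(20, 2) - [5, 0]])
y = np.array([0] * 20 + [1] * 20)
W = lda(X, y, 1)
proj = X @ W
```

On this toy data the single projection axis aligns with the direction joining the two class means, so the projected class means are far apart.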
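The LPP pipeline above (build the k-nearest-neighbor graph with the 0/1 weights of Eqn. (7), form L = D - S, then solve the eigenproblem from Eqn. (8)) can be sketched as follows. This is an illustrative NumPy sketch with a rows-as-samples layout; the function name `lpp` and the pseudo-inverse eigen-solver are our own simplifications of the constrained problem:

```python
import numpy as np

def lpp(X, l, k=5):
    # LPP sketch with the simple 0/1 weights of Eqn. (7):
    # S_ij = 1 if x_i is among the k nearest neighbors of x_j, or
    # vice versa.  Solves X L X^T w = lambda X D X^T w from Eqn. (8).
    m = X.shape[0]
    # Pairwise squared distances ||x_i - x_j||^2.
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.zeros((m, m))
    nbrs = np.argsort(D2, axis=1)[:, 1:k + 1]   # k nearest, skipping self
    for i in range(m):
        S[i, nbrs[i]] = 1.0
    S = np.maximum(S, S.T)                      # symmetrize: S_ij = S_ji
    D = np.diag(S.sum(axis=1))                  # degree matrix D_ii
    L = D - S                                   # graph Laplacian
    A = X.T @ L @ X                             # X L X^T in row layout
    B = X.T @ D @ X                             # X D X^T in row layout
    # Smallest generalized eigenvalues give the locality-preserving axes.
    vals, vecs = np.linalg.eig(np.linalg.pinv(B) @ A)
    order = np.argsort(vals.real)
    return vecs.real[:, order[:l]]

X = np.random.RandomState(2).randn(30, 4)
W = lpp(X, 2, k=3)
```

A production implementation would instead solve the generalized eigenproblem directly (e.g. with scipy.linalg.eigh) for numerical stability; the pseudo-inverse keeps the sketch dependency-free.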