3. Linear subspace methods
The goal of subspace learning (or dimensionality reduction) is to map a data set in a high-dimensional space to a lower-dimensional space such that certain properties are preserved. Examples of properties to be preserved include the global geometry and neighborhood information. Usually the preserved property is quantified by an objective function, and the dimensionality reduction problem is formulated as an optimization problem. The generic problem of linear dimensionality reduction is the following. Given a multi-dimensional data set $x_1, x_2, \ldots, x_m$ in $\mathbb{R}^n$, find a transformation matrix $W$ that maps these $m$ points to $y_1, y_2, \ldots, y_m$ in $\mathbb{R}^l$ ($l \ll n$), such that $y_i$ represents $x_i$, where $y_i = W^T x_i$. In this section, we briefly review the existing linear subspace methods PCA, LDA, LPP, ONPP, LSDA, and their variants.
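As a minimal illustration of this mapping (not part of the original text; the array shapes and names below are assumptions), the projection of all samples reduces to a single matrix product:

```python
import numpy as np

# m points x_1, ..., x_m in R^n stacked as the columns of X (shape n x m);
# W is an n x l transformation matrix with l << n.
n, m, l = 100, 50, 5
rng = np.random.default_rng(0)
X = rng.standard_normal((n, m))   # toy data standing in for real samples
W = rng.standard_normal((n, l))   # placeholder for a learned projection

# y_i = W^T x_i for every i, computed in one product
Y = W.T @ X                       # shape (l, m): low-dimensional representations
```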
3.1 Principal Component Analysis (PCA)
Two of the most popular techniques for linear subspace learning are PCA and LDA. PCA
(Turk & Pentland, 1991) is an eigenvector method designed to model linear variation in
high-dimensional data. PCA aims at preserving the global variance by finding a set of mutually orthogonal basis functions that capture the directions of maximum variance in the data.
Let $w$ denote a transformation vector; the objective function is as follows:

$$ w_{\mathrm{opt}} = \arg\max_{w} \sum_{i=1}^{m} \left( w^T x_i - w^T \bar{x} \right)^2 \qquad (1) $$

where $\bar{x} = \frac{1}{m}\sum_{i=1}^{m} x_i$ is the mean of all the samples. The solution $w_0, \ldots, w_{l-1}$ is an orthonormal set of vectors representing the eigenvectors of the data’s covariance matrix associated with the $l$ largest eigenvalues.
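As a concrete sketch of the computation described above (an illustrative implementation, not the one used in this chapter), the following NumPy snippet forms the covariance matrix of the mean-centered samples and keeps the eigenvectors with the $l$ largest eigenvalues:

```python
import numpy as np

def pca(X, l):
    """Return the l eigenvectors of the sample covariance matrix with the
    largest eigenvalues.  X has shape (m, n), one sample per row."""
    X_centered = X - X.mean(axis=0)               # remove the global mean
    C = X_centered.T @ X_centered / X.shape[0]    # n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:l]         # indices of the l largest
    return eigvecs[:, order]                      # n x l orthonormal basis W

# Projecting the (centered) data onto the principal directions:
# W = pca(X, l);  Y = (X - X.mean(axis=0)) @ W   # shape (m, l)
```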
3.2 Linear Discriminant Analysis (LDA)
While PCA is an unsupervised method and seeks directions that are efficient for
representation, LDA (Belhumeur et al., 1997) is a supervised approach and seeks directions that are efficient for discrimination. LDA searches for the projection axes on which the data
points of different classes are far from each other while requiring data points of the same
class to be close to each other.
Suppose the data samples belong to $c$ classes. The objective function is as follows:

$$ w_{\mathrm{opt}} = \arg\max_{w} \frac{w^T S_B\, w}{w^T S_W\, w} \qquad (2) $$

$$ S_B = \sum_{i=1}^{c} n_i \left( m^{(i)} - m \right) \left( m^{(i)} - m \right)^T \qquad (3) $$

$$ S_W = \sum_{i=1}^{c} \sum_{j=1}^{n_i} \left( x_j^{(i)} - m^{(i)} \right) \left( x_j^{(i)} - m^{(i)} \right)^T \qquad (4) $$

where $m$ is the mean of all the samples, $n_i$ is the number of samples in the $i$th class, $m^{(i)}$ is the average vector of the $i$th class, and $x_j^{(i)}$ is the $j$th sample in the $i$th class.
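Eqns. (2)–(4) translate directly into code. The sketch below is illustrative only; the small ridge added to $S_W$ is an assumption made for numerical stability, and at most $c-1$ meaningful directions exist. It builds the scatter matrices and solves the generalized eigenproblem $S_B w = \lambda S_W w$:

```python
import numpy as np
from scipy.linalg import eigh

def lda(X, labels, l):
    """X: (m, n) samples, one per row; labels: (m,) class indices; returns (n, l) W."""
    n = X.shape[1]
    m_all = X.mean(axis=0)                     # mean of all samples (m in Eqn. 3)
    S_B = np.zeros((n, n))                     # between-class scatter, Eqn. (3)
    S_W = np.zeros((n, n))                     # within-class scatter, Eqn. (4)
    for c in np.unique(labels):
        X_c = X[labels == c]                   # samples of the c-th class
        n_c, m_c = X_c.shape[0], X_c.mean(axis=0)
        d = (m_c - m_all)[:, None]
        S_B += n_c * (d @ d.T)
        D = X_c - m_c
        S_W += D.T @ D
    # Maximize w^T S_B w / w^T S_W w: generalized eigenvectors, largest eigenvalues
    eigvals, eigvecs = eigh(S_B, S_W + 1e-8 * np.eye(n))
    order = np.argsort(eigvals)[::-1][:l]
    return eigvecs[:, order]
```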
3.3 Locality Preserving Projections (LPP)
LPP (He & Niyogi, 2003) seeks to preserve the intrinsic geometry of the data by preserving locality. To derive the optimal projections preserving locality, LPP employs the same objective function as Laplacian Eigenmaps:

$$ \min \sum_{ij} \left( y_i - y_j \right)^2 S_{ij} \qquad (5) $$

where $S_{ij}$ evaluates the local structure of the data space and is defined as

$$ S_{ij} = \begin{cases} e^{-\|x_i - x_j\|^2 / t}, & \text{if } x_i \text{ and } x_j \text{ are “close”} \\ 0, & \text{otherwise} \end{cases} \qquad (6) $$

or, in a simpler form, as

$$ S_{ij} = \begin{cases} 1, & \text{if } x_i \text{ and } x_j \text{ are “close”} \\ 0, & \text{otherwise} \end{cases} \qquad (7) $$
where “close” can be defined as $\|x_i - x_j\|^2 < \varepsilon$, with $\varepsilon$ a small constant, or as $x_i$ being among the $k$ nearest neighbors of $x_j$ or $x_j$ being among the $k$ nearest neighbors of $x_i$. The objective function with symmetric weights $S_{ij}$ ($S_{ij} = S_{ji}$) incurs a heavy penalty if neighboring points $x_i$ and $x_j$ are mapped far apart. Minimizing their distance is therefore an attempt to ensure that if $x_i$ and $x_j$ are “close”, then $y_i$ ($= w^T x_i$) and $y_j$ ($= w^T x_j$) are also “close”. The objective function of Eqn. (5) can be reduced to:
$$ \frac{1}{2} \sum_{ij} \left( w^T x_i - w^T x_j \right)^2 S_{ij} = w^T X L X^T w \qquad (8) $$

where $X = [x_1, x_2, \ldots, x_m]$, $D$ is a diagonal matrix with $D_{ii} = \sum_j S_{ij}$, and $L = D - S$ is the Laplacian matrix.
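A minimal sketch of the whole LPP procedure is given below, assuming the simple 0/1 weights of Eqn. (7) with a $k$-nearest-neighbor notion of “close”. It solves the generalized eigenproblem $X L X^T w = \lambda X D X^T w$ for the smallest eigenvalues, which is the standard way the minimization of Eqn. (8) is carried out; the names and the small ridge term are illustrative assumptions, not the chapter’s implementation:

```python
import numpy as np
from scipy.linalg import eigh

def lpp(X, k=5, l=2):
    """X: (m, n) samples, one per row; returns an (n, l) projection matrix W."""
    m = X.shape[0]
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

    # Weight matrix S of Eqn. (7): S_ij = 1 if x_i is among the k nearest
    # neighbors of x_j or vice versa, 0 otherwise.
    S = np.zeros((m, m))
    for i in range(m):
        neighbors = np.argsort(sq_dists[i])[1:k + 1]   # skip the point itself
        S[i, neighbors] = 1.0
    S = np.maximum(S, S.T)                              # symmetrize: S_ij = S_ji

    D = np.diag(S.sum(axis=1))                          # D_ii = sum_j S_ij
    L = D - S                                           # graph Laplacian

    Xt = X.T                                            # samples as columns, n x m
    A = Xt @ L @ Xt.T                                   # X L X^T
    B = Xt @ D @ Xt.T + 1e-8 * np.eye(X.shape[1])       # X D X^T (+ ridge)

    eigvals, eigvecs = eigh(A, B)                       # ascending eigenvalues
    return eigvecs[:, :l]                               # smallest l: the LPP directions
```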