Supervised learning infers properties of the conditional density Pr(Y | X).

Unsupervised Learning

We have a set of N observations (x_1, . . . , x_N) of a random p-vector X having joint density Pr(X). The goal is to directly infer the properties of this probability density without the help of a supervisor or teacher providing correct answers or a degree of error for each observation.

In high dimensions, the properties of interest are complicated: the curse of dimensionality. We must settle for estimating rather crude global models, such as Gaussian mixtures, or various simple descriptive statistics that characterize Pr(X).
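As a minimal sketch of such a crude global summary (the data below is synthetic, invented purely for illustration), the sample mean and sample covariance together specify a single-Gaussian approximation of Pr(X); a Gaussian mixture generalizes this with several such component summaries:

```python
import numpy as np

# Synthetic stand-in data: N = 500 observations of a p = 3 dimensional X.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ np.array([[2.0, 0.0, 0.0],
                                          [0.5, 1.0, 0.0],
                                          [0.0, 0.0, 0.2]])

# Crude global summaries characterizing Pr(X):
mu_hat = X.mean(axis=0)              # sample mean, shape (p,)
sigma_hat = np.cov(X, rowvar=False)  # sample covariance, shape (p, p)
```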
PCA attempts to identify a low-dimensional linear subspace within the X space that represents regions of high data density.
FA aims to find the hidden common structure in the variation of X.

Principal Component Direction

[Figure: scatter plot of two-dimensional data (X1 vs. X2), with arrows marking the largest and smallest principal components.]

Best Linear Approximation

[Figure 14.21: the best linear approximation to the half-sphere data; labeled are an observation x_i, the first principal direction v_1, the coordinate u_{i1} d_1 along it, and the second principal component.]

Principal Component Analysis

Find a direction along which the data has the largest variation:
max_{‖v‖ = 1} {sample variance of X v}.

Find the best low-dimensional linear approximations to the data. Consider the rank-q linear model for representing the p-dimensional data x_1, . . . , x_N:

f(η) = µ + V_q η,

where µ ∈ R^p is a location vector, V_q is a p × q orthogonal matrix, and η ∈ R^q is a vector of parameters. Fitting such a model to the data by least squares amounts to minimizing the reconstruction error

min_{µ, {η_i}, V_q} Σ_{i=1}^N ‖x_i − µ − V_q η_i‖².

PCA: Preprocessing
1. Compute x̄ = (1/N) Σ_{i=1}^N x_i, then subtract the sample mean from each observation: x_i := x_i − x̄.

2. (Optional; preferred when features are on different scales.) Normalize each feature: compute ŝ_j = (Σ_{i=1}^N x_{ij}²)^{1/2}, then set x_{ij} := x_{ij} / ŝ_j for all 1 ≤ i ≤ N, 1 ≤ j ≤ p.

From now on we always assume Step 1 has been done. The problems become:
Direction of maximum sample variance:

max_{‖v_1‖ = 1} v_1^T X^T X v_1.

Best linear approximation:

min_{{η_i}, v_1} Σ_{i=1}^N ‖x_i − v_1 η_i‖².

Singular Value Decomposition
Orthogonal Matrix

Let A be an n × m matrix with n ≥ m. Denote its m columns by a_1, . . . , a_m. We say A is orthogonal if its columns are orthonormal, i.e.

a_i^T a_j = 1 if i = j, and 0 if i ≠ j.
if i = j. The singular value decomposition (SVD) of the N × p (assume N ≥ p)
matrix X has the form
X = U DV T .
U and V are N × p and p × p orthogonal matrices.
The columns of V , denoted by v 1 , . . . , v p , span the row space of X .
The columns of U , u1 , . . . , up , span the column space of X .
D is a p × p diagonal matrix, with diagonal entries
d1 ≥ d2 ≥ · · · ≥ dp ≥ 0, which are called singular values of X .
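These properties are easy to verify numerically; the sketch below uses a synthetic matrix and NumPy's thin SVD, which returns exactly the N × p / p × p factorization described here when N ≥ p:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 20, 4
X = rng.normal(size=(N, p))

# Thin SVD: U is N x p, the vector d holds the p singular values, V is p x p.
U, d, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

# X = U D V^T, with d_1 >= d_2 >= ... >= d_p >= 0.
reconstruction = U @ np.diag(d) @ Vt

# The first column of V is an eigenvector of X^T X with eigenvalue d_1^2.
eig_check = X.T @ X @ V[:, 0]
```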
v_1, . . . , v_p are eigenvectors of the matrix X^T X, corresponding to the eigenvalues d_1² ≥ d_2² ≥ · · · ≥ d_p².

u_1, . . . , u_p are eigenvectors of the matrix X X^T, corresponding to the eigenvalues d_1² ≥ d_2² ≥ · · · ≥ d_p². The remaining eigenvalues of X X^T are all zero.

PCA: Solution
Fixing V_q, we must have

η̂_i = V_q^T x_i,

and the problem is reduced to

min_{V_q} Σ_{i=1}^N ‖x_i − V_q V_q^T x_i‖².

Let X be the N × p matrix whose rows are x_1^T, . . . , x_N^T. Compute the singular value decomposition (SVD) of X:

X = U D V^T.

For each 1 ≤ q ≤ p, the solution V̂_q consists of the first q columns of V.
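The full recipe (center, take the SVD, keep the first q columns of V) can be sketched as follows; the data and variable names are synthetic, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, q = 100, 5, 2
X = rng.normal(size=(N, p))
X = X - X.mean(axis=0)              # Step 1: center the data

U, d, Vt = np.linalg.svd(X, full_matrices=False)
V_q = Vt[:q].T                      # V_hat_q: first q columns of V

eta = X @ V_q                       # optimal eta_i = V_q^T x_i (the scores)
X_hat = eta @ V_q.T                 # rank-q reconstruction V_q V_q^T x_i

# The minimized reconstruction error equals the sum of the discarded
# squared singular values, sum_{m > q} d_m^2.
err = np.sum((X - X_hat) ** 2)
```

Note that the mth principal component z_m = X v_m equals d_m u_m, so the SVD yields the scores directly without forming the projection.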
v_m: the mth principal direction, also called the loadings of the mth principal component.
z_m = X v_m = d_m u_m: the mth principal component.

Applications

Visualization.
Compression.
Computation.
Clearer patterns in lower dimension.
Anomaly detection.
Remove redundancy.
Face recognition and matching.
Microarray analysis.
Web link analysis.
[Figure: scree plots of the component variances for the asset excess returns data, Comp.1 through Comp.9.]
This note was uploaded on 10/01/2013 for the course FSRM 588 taught by Professor Xiao during the Fall '13 term at Rutgers.