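The constraint $\mathbf{a}_2'\mathbf{S}\mathbf{a}_3 = 0$ restricts the sample covariance between the second and third components to zero.

Generally, the $i$th principal component is the linear combination of maximum sample variance that is uncorrelated with all previous principal components, i.e., we wish to solve the nonlinear optimization problem

$$
\max_{\mathbf{a}_i} \; \mathbf{a}_i'\mathbf{S}\mathbf{a}_i
\quad \text{s.t.} \quad
\mathbf{a}_i'\mathbf{a}_i = 1, \qquad
\mathbf{a}_k'\mathbf{S}\mathbf{a}_i = 0 \;\; \forall\, k < i
$$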
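As a numerical illustration of this optimization problem, the sketch below compares the variance $\mathbf{a}'\mathbf{S}\mathbf{a}$ attained by the leading eigenvector of $\mathbf{S}$ (the known solution for $i = 1$) with the best value found among many random unit vectors, and checks the zero-covariance constraint between the first two eigenvector directions. The matrix `S`, the random seed, and the number of trial vectors are illustrative assumptions.

```python
import numpy as np

# Illustrative symmetric covariance matrix (an assumption for this sketch).
S = np.array([[2.0, 10 / 3, 4 / 3],
              [10 / 3, 8.0, 14 / 3],
              [4 / 3, 14 / 3, 7.0]])

# eigh returns eigenvalues of a symmetric matrix in ascending order,
# so reorder to put the largest eigenvalue first.
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
a1, a2 = eigvecs[:, 0], eigvecs[:, 1]

def variance(a, S):
    """Sample variance of the linear combination a'x, i.e. a'Sa."""
    return float(a @ S @ a)

# Random unit vectors: each row is normalized so that a'a = 1.
rng = np.random.default_rng(0)
trials = rng.normal(size=(10_000, 3))
trials /= np.linalg.norm(trials, axis=1, keepdims=True)
best_random = float(np.sum((trials @ S) * trials, axis=1).max())

print("a1'Sa1 (leading eigenvector):", variance(a1, S))
print("best a'Sa over random trials:", best_random)
print("a1'Sa2 (covariance constraint):", float(a1 @ S @ a2))
```

No random unit vector should exceed the variance of the leading eigenvector, and the covariance between the first two directions should be zero up to floating-point error.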
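We can show that, for a random sample $\mathbf{X}$ with sample covariance matrix $\mathbf{S}$, eigenvalues $\hat\lambda_1 \ge \hat\lambda_2 \ge \cdots \ge \hat\lambda_p \ge 0$, and corresponding unit-length eigenvectors $\hat{\mathbf{e}}_1, \ldots, \hat{\mathbf{e}}_p$, the $i$th sample principal component is given by

$$
\hat y_i = \hat{\mathbf{e}}_i'\mathbf{x} = \hat e_{i1}x_1 + \hat e_{i2}x_2 + \cdots + \hat e_{ip}x_p, \qquad i = 1, \ldots, p
$$

where $\hat e_{ik}$ denotes the $k$th element of $\hat{\mathbf{e}}_i$.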
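A minimal sketch of this result in NumPy follows. The function name `sample_principal_components` and the simulated data are hypothetical; the sketch assumes the usual sample covariance with divisor $n - 1$, which is what `np.cov` computes by default.

```python
import numpy as np

def sample_principal_components(X):
    """Return (eigenvalues, eigenvectors, scores) for an n x p data matrix X.

    Eigenvalues are sorted in decreasing order. Column i of `eigenvectors`
    is the unit-length eigenvector paired with eigenvalue i, and column i
    of `scores` holds the values e_i'x_j for every observation j.
    """
    X = np.asarray(X, dtype=float)
    S = np.cov(X, rowvar=False)            # sample covariance, divisor n - 1
    eigvals, eigvecs = np.linalg.eigh(S)   # ascending order for symmetric S
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = X @ eigvecs
    return eigvals, eigvecs, scores

# Hypothetical data: 200 observations on 3 correlated variables.
rng = np.random.default_rng(1)
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.5]])
X = rng.normal(size=(200, 3)) @ mixing
eigvals, eigvecs, scores = sample_principal_components(X)

# The sample covariance matrix of the scores should be diag(eigenvalues):
# each component has variance lambda_i and distinct components are uncorrelated.
print(np.round(np.cov(scores, rowvar=False), 6))
print(np.round(eigvals, 6))
```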
Note that the principal components are not unique if some eigenvalues are equal.

We can also show that, for a random sample $\mathbf{X}$ with sample covariance matrix $\mathbf{S}$ and eigenvalue-eigenvector pairs $(\hat\lambda_1, \hat{\mathbf{e}}_1), \ldots, (\hat\lambda_p, \hat{\mathbf{e}}_p)$, where $\hat\lambda_1 \ge \hat\lambda_2 \ge \cdots \ge \hat\lambda_p$,

$$
s_{11} + \cdots + s_{pp} = \sum_{i=1}^{p} s_{ii} = \hat\lambda_1 + \cdots + \hat\lambda_p = \sum_{i=1}^{p} \operatorname{Var}(\hat y_i)
$$

so we can assess how well a subset of the principal components $\hat y_i$ summarizes the original random sample $\mathbf{X}$. One common method of doing so is

$$
\begin{pmatrix} \text{proportion of total} \\ \text{sample variance due} \\ \text{to the } k\text{th principal} \\ \text{component} \end{pmatrix}
= \frac{\hat\lambda_k}{\sum_{i=1}^{p} \hat\lambda_i}
$$

If a large proportion of the total sample variance can be attributed to relatively few principal components, we can replace the original $p$ variables with these principal components without much loss of information.

We can also easily find the correlations between the original random variables $x_k$ and the principal components $\hat y_i$:

$$
r_{\hat y_i, x_k} = \frac{\hat e_{ik}\sqrt{\hat\lambda_i}}{\sqrt{s_{kk}}}
$$

where $\hat e_{ik}$ is the $k$th element of $\hat{\mathbf{e}}_i$. These values are often used in interpreting the principal components $\hat y_i$.

Note that the approach for standardized data (i.e., principal components derived from the sample correlation matrix $\mathbf{R}$) is analogous to the population approach.
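The trace identity, the proportion-of-variance measure, and the correlation formula can all be checked numerically. The sketch below does so on simulated data; the data, seed, and variable names are assumptions made for illustration. The formula-based correlations are compared against correlations computed directly from the component scores with `np.corrcoef`.

```python
import numpy as np

# Hypothetical sample: 500 observations on p = 3 correlated variables.
rng = np.random.default_rng(2)
mixing = np.array([[1.5, 0.8, 0.2],
                   [0.0, 1.0, 0.6],
                   [0.0, 0.0, 0.7]])
X = rng.normal(size=(500, 3)) @ mixing

S = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Total sample variance: the trace of S equals the sum of the eigenvalues.
total_variance = np.trace(S)
proportions = eigvals / eigvals.sum()

# corr_formula[i, k] = e_ik * sqrt(lambda_i) / sqrt(s_kk),
# where e_ik is the k-th element of the i-th eigenvector.
corr_formula = (eigvecs.T * np.sqrt(eigvals)[:, None]) / np.sqrt(np.diag(S))[None, :]

# Direct check: correlate each score column y_i with each variable x_k.
scores = X @ eigvecs
full = np.corrcoef(np.hstack([scores, X]), rowvar=False)
corr_direct = full[:3, 3:]

print("trace(S) vs sum of eigenvalues:", total_variance, eigvals.sum())
print("proportions of total variance:", np.round(proportions, 4))
print("max |formula - direct|:", np.abs(corr_formula - corr_direct).max())
```

Because each original variable is an exact linear combination of all $p$ components, the squared correlations in each column of `corr_formula` sum to one, which is a convenient sanity check.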
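When principal components are derived from sample data, the sample data are frequently centered, $\mathbf{x} - \bar{\mathbf{x}}$, which has no effect on the sample covariance matrix $\mathbf{S}$ and yields the derived principal components

$$
\hat y_i = \hat{\mathbf{e}}_i'(\mathbf{x} - \bar{\mathbf{x}})
$$

Under these circumstances, the mean value of the $i$th principal component associated with all $n$ observations in the data set is

$$
\bar{\hat y}_i = \frac{1}{n}\sum_{j=1}^{n} \hat{\mathbf{e}}_i'(\mathbf{x}_j - \bar{\mathbf{x}})
= \frac{1}{n}\,\hat{\mathbf{e}}_i'\sum_{j=1}^{n}(\mathbf{x}_j - \bar{\mathbf{x}})
= \frac{1}{n}\,\hat{\mathbf{e}}_i'\mathbf{0} = 0
$$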
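Both claims about centering are easy to confirm numerically. The following sketch uses simulated data with nonzero means (the means, scales, and seed are assumptions for illustration) and checks that centering leaves $\mathbf{S}$ unchanged and that every centered component has sample mean zero.

```python
import numpy as np

# Hypothetical sample of 50 observations with clearly nonzero means.
rng = np.random.default_rng(3)
X = rng.normal(loc=[5.0, -2.0, 10.0], scale=[1.0, 2.0, 0.5], size=(50, 3))
X_centered = X - X.mean(axis=0)

# Centering does not change the sample covariance matrix.
S = np.cov(X, rowvar=False)
S_centered = np.cov(X_centered, rowvar=False)

# Eigenvectors of S, reordered so the largest eigenvalue comes first.
eigvals, eigvecs = np.linalg.eigh(S)
eigvecs = eigvecs[:, ::-1]

# Scores of the centered data: y_ij = e_i'(x_j - xbar).
scores_centered = X_centered @ eigvecs

print("S unchanged by centering:", np.allclose(S, S_centered))
print("mean of each centered component:", scores_centered.mean(axis=0))
```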
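Example: Suppose we have the following sample of four observations made on three random variables $X_1$, $X_2$, and $X_3$:

$$
\begin{array}{rrr}
X_1 & X_2 & X_3 \\ \hline
1.0 & 6.0 & 9.0 \\
4.0 & 12.0 & 10.0 \\
3.0 & 12.0 & 15.0 \\
4.0 & 10.0 & 12.0
\end{array}
$$

Find the three sample principal components $\hat y_1$, $\hat y_2$, and $\hat y_3$ based on the sample covariance matrix $\mathbf{S}$.

First we need the sample covariance matrix $\mathbf{S}$:

$$
\mathbf{S} = \begin{bmatrix} 2.00 & 3.33 & 1.33 \\ 3.33 & 8.00 & 4.67 \\ 1.33 & 4.67 & 7.00 \end{bmatrix}
$$

and the corresponding eigenvalue-eigenvector pairs:

$$
\hat\lambda_1 = 13.21944, \quad \hat{\mathbf{e}}_1 = \begin{bmatrix} 0.291000 \\ 0.734253 \\ 0.613345 \end{bmatrix}
$$

$$
\hat\lambda_2 = 3.37916, \quad \hat{\mathbf{e}}_2 = \begin{bmatrix} 0.415126 \\ 0.480690 \\ -0.772403 \end{bmatrix}
$$

$$
\hat\lambda_3 = 0.40140, \quad \hat{\mathbf{e}}_3 = \begin{bmatrix} 0.861968 \\ -0.479385 \\ 0.164927 \end{bmatrix}
$$

so the principal components are:

$$
\begin{aligned}
\hat y_1 &= \hat{\mathbf{e}}_1'\mathbf{x} = 0.291000x_1 + 0.734253x_2 + 0.613345x_3 \\
\hat y_2 &= \hat{\mathbf{e}}_2'\mathbf{x} = 0.415126x_1 + 0.480690x_2 - 0.772403x_3 \\
\hat y_3 &= \hat{\mathbf{e}}_3'\mathbf{x} = 0.861968x_1 - 0.479385x_2 + 0.164927x_3
\end{aligned}
$$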
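The example can be reproduced with a few lines of NumPy. This is a sketch rather than the method used to prepare the notes: it assumes `np.cov` with its default divisor $n - 1$, and because eigenvectors are determined only up to sign, it flips each one so that its first element is positive, which happens to match the signs shown above. Small differences in the last printed digits are possible due to rounding.

```python
import numpy as np

# Four observations (rows) on the three variables X1, X2, X3 (columns).
X = np.array([[1.0,  6.0,  9.0],
              [4.0, 12.0, 10.0],
              [3.0, 12.0, 15.0],
              [4.0, 10.0, 12.0]])

S = np.cov(X, rowvar=False)  # sample covariance matrix, divisor n - 1

# Eigen-decomposition, reordered so the largest eigenvalue comes first.
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Sign convention (an assumption): first element of each eigenvector positive.
eigvecs = eigvecs * np.sign(eigvecs[0, :])

# Column i of `scores` holds y_i = e_i'x for each of the four observations.
scores = X @ eigvecs

print("S =\n", np.round(S, 2))
print("eigenvalues =", np.round(eigvals, 5))
print("eigenvectors (columns) =\n", np.round(eigvecs, 6))
print("scores =\n", np.round(scores, 4))
```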
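Note that

$$
s_{11} + s_{22} + s_{33} = 2.0 + 8.0 + 7.0 = 17.0 = 13.21944 + 3.37916 + 0.40140 = \hat\lambda_1 + \hat\lambda_2 + \hat\lambda_3
$$

and the proportion of total sample variance due to each principal component is

$$
\frac{\hat\lambda_1}{\sum_{i=1}^{p}\hat\lambda_i} = \frac{13.21944}{17.0} = 0.777613814
$$

$$
\frac{\hat\lambda_2}{\sum_{i=1}^{p}\hat\lambda_i} = \frac{3.37916}{17.0} = 0.198774404
$$

$$
\frac{\hat\lambda_3}{\sum_{i=1}^{p}\hat\lambda_i} = \frac{0.40140}{17.0} = 0.023611782
$$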
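These proportions, together with their running total, can be computed directly from the eigenvalues quoted in the example. The cumulative proportion is a small addition in this sketch, not something shown in the notes; it indicates how much of the total sample variance the first $k$ components capture jointly. Since the input eigenvalues are rounded to five decimals, the results may differ from the figures above in the last digits.

```python
import numpy as np

# Eigenvalues as quoted in the example (rounded to five decimals).
eigvals = np.array([13.21944, 3.37916, 0.40140])

proportions = eigvals / eigvals.sum()   # lambda_k / sum_i lambda_i
cumulative = np.cumsum(proportions)     # share captured by the first k components

for k, (prop, cum) in enumerate(zip(proportions, cumulative), start=1):
    print(f"component {k}: proportion = {prop:.6f}, cumulative = {cum:.6f}")
```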
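Note that the third principal component is relatively unimportant: it accounts for only about 2.4% of the total sample variance.

Next we obtain the correlations between the original variables $x_k$ and the sample principal components $\hat y_i$, where $\hat e_{ik}$ denotes the $k$th element of $\hat{\mathbf{e}}_i$:

$$
\begin{aligned}
r_{\hat y_1, x_1} &= \frac{\hat e_{11}\sqrt{\hat\lambda_1}}{\sqrt{s_{11}}} = \frac{0.291000\sqrt{13.21944}}{\sqrt{2.0}} = 0.7481 \\
r_{\hat y_1, x_2} &= \frac{\hat e_{12}\sqrt{\hat\lambda_1}}{\sqrt{s_{22}}} = \frac{0.734253\sqrt{13.21944}}{\sqrt{8.0}} = 0.9439 \\
r_{\hat y_1, x_3} &= \frac{\hat e_{13}\sqrt{\hat\lambda_1}}{\sqrt{s_{33}}} = \frac{0.613345\sqrt{13.21944}}{\sqrt{7.0}} = 0.8429 \\
r_{\hat y_2, x_1} &= \frac{\hat e_{21}\sqrt{\hat\lambda_2}}{\sqrt{s_{11}}} = \frac{0.415126\sqrt{3.37916}}{\sqrt{2.0}} = 0.5396 \\
r_{\hat y_2, x_2} &= \frac{\hat e_{22}\sqrt{\hat\lambda_2}}{\sqrt{s_{22}}} = \frac{0.480690\sqrt{3.37916}}{\sqrt{8.0}} = 0.3124 \\
r_{\hat y_2, x_3} &= \frac{\hat e_{23}\sqrt{\hat\lambda_2}}{\sqrt{s_{33}}} = \frac{-0.772403\sqrt{3.37916}}{\sqrt{7.0}} = -0.5367 \\
r_{\hat y_3, x_1} &= \frac{\hat e_{31}\sqrt{\hat\lambda_3}}{\sqrt{s_{11}}} = \frac{0.861968\sqrt{0.40140}}{\sqrt{2.0}} = 0.3862 \\
r_{\hat y_3, x_2} &= \frac{\hat e_{32}\sqrt{\hat\lambda_3}}{\sqrt{s_{22}}} = \frac{-0.479385\sqrt{0.40140}}{\sqrt{8.0}} = -0.1074 \\
r_{\hat y_3, x_3} &= \frac{\hat e_{33}\sqrt{\hat\lambda_3}}{\sqrt{s_{33}}} = \frac{0.164927\sqrt{0.40140}}{\sqrt{7.0}} = 0.0395
\end{aligned}
$$

As a check, the squared correlations for each variable sum to one across the three components; for $x_1$, $0.7481^2 + 0.5396^2 + 0.3862^2 \approx 1.000$.

We can display these results in a corr...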
This note was uploaded on 04/08/2014 for the course STAT 4503 taught by Professor Majidmojirsheibani during the Spring '09 term at Carleton CA.