If a large proportion of the total population variance can be attributed to relatively few principal components, we can replace the original p variables with these principal components without much loss of information!

Example: Suppose we have the following population of four observations made on three random variables X1, X2, and X3:

   X1      X2      X3
  1.0     6.0     9.0
  4.0    12.0    10.0
  3.0    12.0    15.0
  4.0    10.0    12.0

Find the three population principal components Y1, Y2, and Y3 for the standardized random variables Z1, Z2, and Z3.

We could standardize the variables X1, X2, and X3 and then work with the resulting covariance matrix Σ, but it is much easier to proceed directly with the correlation matrix ρ:

\[
\rho =
\begin{bmatrix}
1.000 & 0.833 & 0.356 \\
0.833 & 1.000 & 0.624 \\
0.356 & 0.624 & 1.000
\end{bmatrix}
\]

and the corresponding eigenvalue-eigenvector pairs:

\[
\begin{aligned}
\lambda_1 &= 2.2149347, & e_1 &= (0.58437383,\ 0.63457754,\ 0.50578527)' \\
\lambda_2 &= 0.6226418, & e_2 &= (-0.5449250,\ -0.1549791,\ 0.8240377)' \\
\lambda_3 &= 0.1624235, & e_3 &= (0.6013018,\ -0.7571610,\ 0.2552315)'
\end{aligned}
\]

These results differ from the covariance-based principal components!

So the principal components are:

\[
\begin{aligned}
Y_1 &= e_1' Z = 0.5843738\,Z_1 + 0.6345775\,Z_2 + 0.5057853\,Z_3 \\
Y_2 &= e_2' Z = -0.5449250\,Z_1 - 0.1549791\,Z_2 + 0.8240377\,Z_3 \\
Y_3 &= e_3' Z = 0.6013018\,Z_1 - 0.7571610\,Z_2 + 0.2552315\,Z_3
\end{aligned}
\]
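As a quick check on the correlation matrix ρ used in this example, the following Python sketch recomputes it from the four observations. The function name `correlation_matrix` is an illustrative choice and not part of the original notes; it uses only the standard library. Because a correlation is a ratio, the divisor (n or n-1) cancels, so the population and sample versions agree.

```python
import math

# The four observations from the example: one row per observation,
# columns are X1, X2, X3.
data = [
    [1.0, 6.0, 9.0],
    [4.0, 12.0, 10.0],
    [3.0, 12.0, 15.0],
    [4.0, 10.0, 12.0],
]


def correlation_matrix(rows):
    """Return the Pearson correlation matrix of the columns of `rows`."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    # Centered sums of squares and cross-products; no divisor is needed
    # because it cancels when forming the correlation.
    s = [
        [sum((r[j] - means[j]) * (r[k] - means[k]) for r in rows) for k in range(p)]
        for j in range(p)
    ]
    return [
        [s[j][k] / math.sqrt(s[j][j] * s[k][k]) for k in range(p)]
        for j in range(p)
    ]


R = correlation_matrix(data)
for row in R:
    print("  ".join(f"{v:.3f}" for v in row))
```

Rounded to three decimals, the printed matrix agrees with the ρ quoted in the example (off-diagonal entries 0.833, 0.356, and 0.624).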
Note that

\[
\sigma_{11} + \sigma_{22} + \sigma_{33} = 1.0 + 1.0 + 1.0 = 3.0 = 2.2149347 + 0.6226418 + 0.1624235 = \lambda_1 + \lambda_2 + \lambda_3
\]

and the proportion of total population variance due to each principal component is

\[
\begin{aligned}
\frac{\lambda_1}{\sum_{i=1}^{p} \lambda_i} &= \frac{2.2149347}{3.0} = 0.738311567 \\
\frac{\lambda_2}{\sum_{i=1}^{p} \lambda_i} &= \frac{0.6226418}{3.0} = 0.207547267 \\
\frac{\lambda_3}{\sum_{i=1}^{p} \lambda_i} &= \frac{0.1624235}{3.0} = 0.054141167
\end{aligned}
\]

Note that the third principal component is again relatively irrelevant!

Next we obtain the correlations between the principal components Yi and the standardized random variables Zk (these equal the correlations with the original Xk, since standardizing does not change a correlation):

\[
\begin{aligned}
\rho_{Y_1,Z_1} &= e_{11}\sqrt{\lambda_1} = 0.58437383\sqrt{2.2149347} = 0.869703464 \\
\rho_{Y_1,Z_2} &= e_{21}\sqrt{\lambda_1} = 0.6345775\sqrt{2.2149347} = 0.944419907 \\
\rho_{Y_1,Z_3} &= e_{31}\sqrt{\lambda_1} = 0.5057853\sqrt{2.2149347} = 0.752742749 \\
\rho_{Y_2,Z_1} &= e_{12}\sqrt{\lambda_2} = -0.5449250\sqrt{0.6226418} = -0.429987538 \\
\rho_{Y_2,Z_2} &= e_{22}\sqrt{\lambda_2} = -0.1549791\sqrt{0.6226418} = -0.122290294 \\
\rho_{Y_2,Z_3} &= e_{32}\sqrt{\lambda_2} = 0.8240377\sqrt{0.6226418} = 0.650228824 \\
\rho_{Y_3,Z_1} &= e_{13}\sqrt{\lambda_3} = 0.6013018\sqrt{0.1624235} = 0.242335443 \\
\rho_{Y_3,Z_2} &= e_{23}\sqrt{\lambda_3} = -0.7571610\sqrt{0.1624235} = -0.305149504 \\
\rho_{Y_3,Z_3} &= e_{33}\sqrt{\lambda_3} = 0.2552315\sqrt{0.1624235} = 0.102862886
\end{aligned}
\]

We can display these results in a correlation matrix:

            Z1           Z2           Z3
  Y1    0.8697035    0.944420     0.7527427
  Y2   -0.4299875   -0.122290     0.6502288
  Y3    0.2423354   -0.305150     0.1028629

Here we can easily see that
  - the first principal component (Y1) is a mixture of all three random variables (X1, X2, and X3)
  - the second principal component (Y2) is a tradeoff between X1 and X3
  - the third principal component (Y3) is a tradeoff between X1 and X2

SAS code for Principal Components Analysis:
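OPTIONS LINESIZE=72 NODATE PAGENO=1;

/* Read the four observations on the three random variables */
DATA stuff;
  INPUT x1 x2 x3;
  LABEL x1='Random Variable 1'
        x2='Random Variable 2'
        x3='Random Variable 3';
  CARDS;
1.0 6.0 9.0
4.0 12.0 10.0
3.0 12.0 15.0
4.0 10.0 12.0
;

/* PROC PRINCOMP analyzes the correlation matrix unless COV is given; */
/* OUT= saves the component scores, N=3 keeps all three components    */
PROC PRINCOMP DATA=stuff OUT=pcstuff N=3;
  VAR x1 x2 x3;
RUN;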
PROC...
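Separately from the SAS run, the two formulas used in the worked example, the variance proportion λi / Σλ and the component-variable correlation e_ki √λi, can be checked with a few lines of Python. This is only a sketch: the eigenvalues and eigenvectors are copied from the example as inputs rather than recomputed, the minus signs on e2 and e3 are an assumption (chosen so that the three vectors are mutually orthogonal), and the names `proportions` and `loadings` are illustrative.

```python
import math

# Eigenpairs copied from the worked example (inputs, not recomputed here).
# ASSUMPTION: the minus signs on e2 and e3 are chosen so that the three
# eigenvectors are mutually orthogonal.
eigenvalues = [2.2149347, 0.6226418, 0.1624235]
eigenvectors = [
    [0.58437383, 0.63457754, 0.50578527],  # e1
    [-0.5449250, -0.1549791, 0.8240377],   # e2
    [0.6013018, -0.7571610, 0.2552315],    # e3
]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Sanity check: the eigenvectors should be orthonormal up to rounding.
for i in range(3):
    for j in range(3):
        expected = 1.0 if i == j else 0.0
        assert abs(dot(eigenvectors[i], eigenvectors[j]) - expected) < 1e-5

# Proportion of total variance explained by each component.
total = sum(eigenvalues)
proportions = [lam / total for lam in eigenvalues]

# loadings[i][k] is the correlation between Y_(i+1) and Z_(k+1),
# computed as e_ki * sqrt(lambda_i).
loadings = [
    [e_k * math.sqrt(lam) for e_k in vec]
    for lam, vec in zip(eigenvalues, eigenvectors)
]

for i, prop in enumerate(proportions, start=1):
    print(f"Y{i}: proportion of variance = {prop:.9f}")
for i, row in enumerate(loadings, start=1):
    print(f"Y{i}: " + "  ".join(f"{v: .7f}" for v in row))
```

The printed proportions and correlations should agree with the hand calculations in the example up to rounding in the last digit or two.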
This note was uploaded on 04/08/2014 for the course STAT 4503 taught by Professor Majidmojirsheibani during the Spring '09 term at Carleton CA.