STAT Principal Components Analysis


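… component. If a large proportion of the total population variance can be attributed to relatively few principal components, we can replace the original p variables with these principal components without losing much information!

Example: Suppose we have the following population of four observations made on three random variables $X_1$, $X_2$, and $X_3$:

|       | Obs. 1 | Obs. 2 | Obs. 3 | Obs. 4 |
|-------|--------|--------|--------|--------|
| $X_1$ | 1.0    | 4.0    | 3.0    | 4.0    |
| $X_2$ | 6.0    | 12.0   | 12.0   | 10.0   |
| $X_3$ | 9.0    | 10.0   | 15.0   | 12.0   |

Find the three population principal components $Y_1$, $Y_2$, and $Y_3$ for the standardized random variables $Z_1$, $Z_2$, and $Z_3$.

We could standardize $X_1$, $X_2$, and $X_3$ and then work with the covariance matrix $\boldsymbol{\Sigma}$ of the standardized variables. However, that covariance matrix is exactly the correlation matrix $\boldsymbol{\rho}$, so it is much easier to proceed directly with $\boldsymbol{\rho}$:

$$
\boldsymbol{\rho} = \begin{bmatrix} 1.000 & 0.833 & 0.356 \\ 0.833 & 1.000 & 0.624 \\ 0.356 & 0.624 & 1.000 \end{bmatrix}
$$

and the corresponding eigenvalue-eigenvector pairs:

$$
\lambda_1 = 2.2149347, \quad \mathbf{e}_1 = \begin{bmatrix} 0.58437383 \\ 0.63457754 \\ 0.50578527 \end{bmatrix}; \qquad
\lambda_2 = 0.6226418, \quad \mathbf{e}_2 = \begin{bmatrix} -0.5449250 \\ -0.1549791 \\ 0.8240377 \end{bmatrix}; \qquad
\lambda_3 = 0.1624235, \quad \mathbf{e}_3 = \begin{bmatrix} 0.6013018 \\ -0.7571610 \\ 0.2552315 \end{bmatrix}
$$

These results differ from the covariance-based principal components! Using these eigenvectors, the principal components are:

$$
\begin{aligned}
Y_1 &= \mathbf{e}_1'\mathbf{Z} = 0.5843738\,Z_1 + 0.6345775\,Z_2 + 0.5057853\,Z_3 \\
Y_2 &= \mathbf{e}_2'\mathbf{Z} = -0.5449250\,Z_1 - 0.1549791\,Z_2 + 0.8240377\,Z_3 \\
Y_3 &= \mathbf{e}_3'\mathbf{Z} = 0.6013018\,Z_1 - 0.7571610\,Z_2 + 0.2552315\,Z_3
\end{aligned}
$$

Each standardized variable has variance 1, so

$$
\sigma_{11} + \sigma_{22} + \sigma_{33} = 1.0 + 1.0 + 1.0 = 3.0 = 2.2149347 + 0.6226418 + 0.1624235 = \lambda_1 + \lambda_2 + \lambda_3,
$$

and the proportion of total population variance due to each principal component is

$$
\frac{\lambda_1}{\sum_{i=1}^{p}\lambda_i} = \frac{2.2149347}{3.0} = 0.738311567, \qquad
\frac{\lambda_2}{\sum_{i=1}^{p}\lambda_i} = \frac{0.6226418}{3.0} = 0.207547267, \qquad
\frac{\lambda_3}{\sum_{i=1}^{p}\lambda_i} = \frac{0.1624235}{3.0} = 0.054141167
$$

Note that the third principal component is again relatively unimportant. It accounts for only about 5.4% of the total population variance, while the first two components together account for about 94.6%.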
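The eigenanalysis can be cross-checked numerically. The following is a minimal Python/NumPy sketch, an illustration rather than part of the course's SAS workflow, and all variable names in it are hypothetical. It computes $\boldsymbol{\rho}$ from the four observations with `numpy.corrcoef`, diagonalizes it with `numpy.linalg.eigh`, and forms the proportions $\lambda_i / \sum_j \lambda_j$. It also rebuilds $\sum_i \lambda_i \mathbf{e}_i \mathbf{e}_i'$ from the eigenpairs listed above.

By this check, the listed eigenpairs reproduce a correlation matrix whose off-diagonal entries are 0.8, 0.4 and 0.6, to about six decimals. The matrix computed from the data (0.833, 0.356, 0.624) has eigenvalues of roughly 2.2295, 0.6621 and 0.1084 instead. The data-based figures therefore differ noticeably from the listed ones: for example, $\lambda_3 \approx 0.108$ rather than 0.162. Even so, the conclusion that the third component contributes little still holds.

```python
import numpy as np

# Four observations (rows) on X1, X2, X3 (columns), as in the example.
X = np.array([
    [1.0, 6.0, 9.0],
    [4.0, 12.0, 10.0],
    [3.0, 12.0, 15.0],
    [4.0, 10.0, 12.0],
])

# Correlation matrix of the columns (= covariance matrix of the standardized Z's).
rho = np.corrcoef(X, rowvar=False)

# eigh is meant for symmetric matrices and returns eigenvalues in ascending
# order, so reverse to get lambda_1 >= lambda_2 >= lambda_3.
# Eigenvector signs are arbitrary and may differ from the notes.
eigvals, eigvecs = np.linalg.eigh(rho)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

# Proportion of total standardized variance per component; the total is trace(rho) = p = 3.
props = eigvals / eigvals.sum()

# Eigenpairs as listed in the notes (column i holds e_i).
lam_listed = np.array([2.2149347, 0.6226418, 0.1624235])
E_listed = np.array([
    [0.58437383, -0.5449250, 0.6013018],
    [0.63457754, -0.1549791, -0.7571610],
    [0.50578527, 0.8240377, 0.2552315],
])

# Spectral decomposition: sum_i lambda_i e_i e_i' = E diag(lambda) E'.
implied = E_listed @ np.diag(lam_listed) @ E_listed.T

np.set_printoptions(precision=4, suppress=True)
print("rho from the data:\n", rho)
print("eigenvalues from the data:", eigvals)
print("proportions from the data:", props)
print("proportions from the listed eigenvalues:", lam_listed / lam_listed.sum())
print("matrix implied by the listed eigenpairs:\n", implied)
```

Rebuilding the matrix from $\mathbf{E}\,\mathrm{diag}(\boldsymbol{\lambda})\,\mathbf{E}'$ is a quick way to confirm which correlation matrix a set of published eigenpairs actually belongs to.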
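Next we obtain the correlations between the original random variables and the principal components $Y_i$. Each $Z_k$ is just $X_k$ shifted and rescaled by a positive constant, so $\rho_{Y_i,X_k} = \rho_{Y_i,Z_k}$. For standardized variables, $\rho_{Y_i,Z_k} = e_{ki}\sqrt{\lambda_i}$, where $e_{ki}$ is the $k$-th entry of $\mathbf{e}_i$:

$$
\begin{aligned}
\rho_{Y_1,Z_1} &= e_{11}\sqrt{\lambda_1} = 0.58437383\sqrt{2.2149347} = 0.869703464 \\
\rho_{Y_1,Z_2} &= e_{21}\sqrt{\lambda_1} = 0.6345775\sqrt{2.2149347} = 0.944419907 \\
\rho_{Y_1,Z_3} &= e_{31}\sqrt{\lambda_1} = 0.5057853\sqrt{2.2149347} = 0.752742749 \\
\rho_{Y_2,Z_1} &= e_{12}\sqrt{\lambda_2} = -0.5449250\sqrt{0.6226418} = -0.429987538 \\
\rho_{Y_2,Z_2} &= e_{22}\sqrt{\lambda_2} = -0.1549791\sqrt{0.6226418} = -0.122290294 \\
\rho_{Y_2,Z_3} &= e_{32}\sqrt{\lambda_2} = 0.8240377\sqrt{0.6226418} = 0.650228824 \\
\rho_{Y_3,Z_1} &= e_{13}\sqrt{\lambda_3} = 0.6013018\sqrt{0.1624235} = 0.242335443 \\
\rho_{Y_3,Z_2} &= e_{23}\sqrt{\lambda_3} = -0.7571610\sqrt{0.1624235} = -0.305149504 \\
\rho_{Y_3,Z_3} &= e_{33}\sqrt{\lambda_3} = 0.2552315\sqrt{0.1624235} = 0.102862886
\end{aligned}
$$

We can display these results in a correlation matrix:

|       | $Y_1$     | $Y_2$      | $Y_3$     |
|-------|-----------|------------|-----------|
| $Z_1$ | 0.8697035 | -0.4299875 | 0.2423354 |
| $Z_2$ | 0.944420  | -0.122290  | -0.305150 |
| $Z_3$ | 0.7527427 | 0.6502288  | 0.1028629 |

Here we can easily see that

- the first principal component ($Y_1$) is a mixture of all three random variables ($X_1$, $X_2$, and $X_3$), since all three correlations are large and positive;
- the second principal component ($Y_2$) is a trade-off between $X_1$ and $X_3$;
- the third principal component ($Y_3$) is a trade-off between $X_1$ and $X_2$.

SAS code for Principal Components Analysis:

```sas
OPTIONS LINESIZE=72 NODATE PAGENO=1;

/* Read the four observations on X1, X2, X3 */
DATA stuff;
  INPUT x1 x2 x3;
  LABEL x1='Random Variable 1'
        x2='Random Variable 2'
        x3='Random Variable 3';
  CARDS;
1.0 6.0 9.0
4.0 12.0 10.0
3.0 12.0 15.0
4.0 10.0 12.0
;

/* PRINCOMP analyzes the correlation matrix by default (add COV to use */
/* the covariance matrix). N=3 keeps three components; OUT= saves the  */
/* data plus the component scores (Prin1-Prin3) in data set pcstuff.   */
PROC PRINCOMP DATA=stuff OUT=pcstuff N=3;
  VAR x1 x2 x3;
RUN;

/* PROC... (truncated) */
```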
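The correlation table can also be checked with a short Python/NumPy sketch. Like the earlier one, this is an illustration with hypothetical variable names and does not reproduce the SAS procedure's output.

The sketch has two parts. The first part scales each listed eigenvector $\mathbf{e}_i$ by $\sqrt{\lambda_i}$, which reproduces the table of $\rho_{Y_i,Z_k}$ values. The second part checks the formula $\rho_{Y_i,Z_k} = e_{ki}\sqrt{\lambda_i}$ itself on the data. It standardizes the observations, forms the scores $Y_i = \mathbf{e}_i'\mathbf{Z}$ from the data's own eigenvectors, and compares their sample correlations with each $Z_k$ against $e_{ki}\sqrt{\lambda_i}$.

```python
import numpy as np

# Eigenpairs as listed in the notes (column i holds e_i).
lam = np.array([2.2149347, 0.6226418, 0.1624235])
E = np.array([
    [0.58437383, -0.5449250, 0.6013018],
    [0.63457754, -0.1549791, -0.7571610],
    [0.50578527, 0.8240377, 0.2552315],
])

# rho_{Y_i, Z_k} = e_{ki} * sqrt(lambda_i): broadcasting scales column i by
# sqrt(lambda_i), so row k is Z_k and column i is Y_i, as in the table.
loadings = E * np.sqrt(lam)

# Check the formula on the data, using the data's own eigenpairs.
X = np.array([[1.0, 6.0, 9.0], [4.0, 12.0, 10.0],
              [3.0, 12.0, 15.0], [4.0, 10.0, 12.0]])
Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each column (population sd)
vals, vecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
vals, vecs = vals[::-1], vecs[:, ::-1]     # largest eigenvalue first
Y = Z @ vecs                               # principal component scores

# Correlations among the six columns [Z1 Z2 Z3 Y1 Y2 Y3]; the upper-right
# 3x3 block holds corr(Z_k, Y_i), with k indexing rows and i columns.
empirical = np.corrcoef(np.hstack([Z, Y]), rowvar=False)[:3, 3:]

np.set_printoptions(precision=7, suppress=True)
print("loadings from the listed eigenpairs:\n", loadings)
print("formula matches the data:", np.allclose(empirical, vecs * np.sqrt(vals)))
```

The second check works for any sign choice of the eigenvectors, because both sides of the comparison use the same `vecs`.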

## This note was uploaded on 04/08/2014 for the course STAT 4503 taught by Professor Majid Mojirsheibani during the Spring '09 term at Carleton CA.
