STAT Principal Components Analysis

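If a large proportion of the total population variance can be attributed to relatively few principal components, we can replace the original $p$ variables with these principal components without losing much information.

We can also easily find the correlations between the original random variables $X_k$ and the principal components $Y_i$:

$$\rho_{Y_i,X_k} = \frac{e_{ik}\sqrt{\lambda_i}}{\sqrt{\sigma_{kk}}}, \qquad i, k = 1, \dots, p,$$

where $e_{ik}$ is the $k$-th coefficient of the eigenvector $\mathbf{e}_i$. These values are often used in interpreting the principal components $Y_i$.

Example: Suppose we have the following population of four observations made on three random variables $X_1$, $X_2$, and $X_3$:

| Variable | Obs. 1 | Obs. 2 | Obs. 3 | Obs. 4 |
|---|---|---|---|---|
| $X_1$ | 1.0 | 4.0 | 3.0 | 4.0 |
| $X_2$ | 6.0 | 12.0 | 12.0 | 10.0 |
| $X_3$ | 9.0 | 10.0 | 15.0 | 12.0 |

Find the three population principal components $Y_1$, $Y_2$, and $Y_3$. First we need the covariance matrix $\boldsymbol{\Sigma}$ (a population covariance matrix, so each entry is computed with divisor $n = 4$):

$$\boldsymbol{\Sigma} = \begin{bmatrix} 1.50 & 2.50 & 1.00 \\ 2.50 & 6.00 & 3.50 \\ 1.00 & 3.50 & 5.25 \end{bmatrix}$$

and the corresponding eigenvalue-eigenvector pairs:

$$
\lambda_1 = 9.9145474,\ \mathbf{e}_1 = \begin{bmatrix} 0.2910381 \\ 0.7342493 \\ 0.6133309 \end{bmatrix};\qquad
\lambda_2 = 2.5344988,\ \mathbf{e}_2 = \begin{bmatrix} 0.4150386 \\ 0.4807165 \\ -0.7724340 \end{bmatrix};\qquad
\lambda_3 = 0.3009542,\ \mathbf{e}_3 = \begin{bmatrix} 0.8619976 \\ -0.4793640 \\ 0.1648350 \end{bmatrix}
$$

so the principal components are:

$$
\begin{aligned}
Y_1 &= \mathbf{e}_1'\mathbf{X} = 0.2910381X_1 + 0.7342493X_2 + 0.6133309X_3 \\
Y_2 &= \mathbf{e}_2'\mathbf{X} = 0.4150386X_1 + 0.4807165X_2 - 0.7724340X_3 \\
Y_3 &= \mathbf{e}_3'\mathbf{X} = 0.8619976X_1 - 0.4793640X_2 + 0.1648350X_3
\end{aligned}
$$

Note that

$$\sigma_{11} + \sigma_{22} + \sigma_{33} = 1.50 + 6.00 + 5.25 = 12.75 = 9.9145474 + 2.5344988 + 0.3009542 = \lambda_1 + \lambda_2 + \lambda_3$$

and the proportion of total population variance due to each principal component is

$$
\frac{\lambda_1}{\sum_{i=1}^{p}\lambda_i} = \frac{9.9145474}{12.75} = 0.777612, \qquad
\frac{\lambda_2}{\sum_{i=1}^{p}\lambda_i} = \frac{2.5344988}{12.75} = 0.198784, \qquad
\frac{\lambda_3}{\sum_{i=1}^{p}\lambda_i} = \frac{0.3009542}{12.75} = 0.023604
$$

Together the first two principal components account for about 97.6% of the total population variance, so the third principal component is relatively unimportant!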
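The computations in this example can be reproduced numerically. The following is a minimal sketch assuming Python with NumPy (the notes themselves do not use any software); the array layout (observations as rows), the variable names, and the sign convention for the eigenvectors are our own choices. `np.cov(..., bias=True)` divides by $n$, which gives the population covariance matrix used above, and `np.linalg.eigh` returns the eigenpairs of a symmetric matrix in ascending order, so they are reversed.

```python
import numpy as np

# Example data: rows are the four observations, columns are X1, X2, X3.
X = np.array([
    [1.0, 6.0, 9.0],
    [4.0, 12.0, 10.0],
    [3.0, 12.0, 15.0],
    [4.0, 10.0, 12.0],
])

# Population covariance matrix: bias=True divides by n = 4 rather than n - 1 = 3.
Sigma = np.cov(X, rowvar=False, bias=True)

# eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]          # reorder so lambda1 >= lambda2 >= lambda3
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Each eigenvector is determined only up to sign; flip the columns so that the
# first coefficient is positive, which matches the signs used in the notes.
eigvecs = eigvecs * np.sign(eigvecs[0, :])

# Proportion of total population variance explained by each component.
proportions = eigvals / np.trace(Sigma)

# Principal component scores Y_i = e_i' x for each observation (one row each).
Y = X @ eigvecs

print("Sigma =\n", Sigma)
print("eigenvalues:", np.round(eigvals, 7))
print("eigenvectors (columns e1, e2, e3):\n", np.round(eigvecs, 7))
print("trace:", np.trace(Sigma), " sum of eigenvalues:", eigvals.sum())
print("proportions:", np.round(proportions, 6))
```

Because an eigenvector is only determined up to sign, another program may report, say, $-\mathbf{e}_2$ instead of $\mathbf{e}_2$; this flips the sign of $Y_2$ but not its variance. The population variance of each column of `Y` equals the corresponding eigenvalue.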
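Next we obtain the correlations between the original random variables $X_k$ and the principal components $Y_i$:

$$
\begin{aligned}
\rho_{Y_1,X_1} &= \frac{e_{11}\sqrt{\lambda_1}}{\sqrt{\sigma_{11}}} = \frac{0.2910381\sqrt{9.9145474}}{\sqrt{1.50}} = 0.7482 \\
\rho_{Y_1,X_2} &= \frac{e_{12}\sqrt{\lambda_1}}{\sqrt{\sigma_{22}}} = \frac{0.7342493\sqrt{9.9145474}}{\sqrt{6.00}} = 0.9439 \\
\rho_{Y_1,X_3} &= \frac{e_{13}\sqrt{\lambda_1}}{\sqrt{\sigma_{33}}} = \frac{0.6133309\sqrt{9.9145474}}{\sqrt{5.25}} = 0.8429 \\
\rho_{Y_2,X_1} &= \frac{e_{21}\sqrt{\lambda_2}}{\sqrt{\sigma_{11}}} = \frac{0.4150386\sqrt{2.5344988}}{\sqrt{1.50}} = 0.5395 \\
\rho_{Y_2,X_2} &= \frac{e_{22}\sqrt{\lambda_2}}{\sqrt{\sigma_{22}}} = \frac{0.4807165\sqrt{2.5344988}}{\sqrt{6.00}} = 0.3124 \\
\rho_{Y_2,X_3} &= \frac{e_{23}\sqrt{\lambda_2}}{\sqrt{\sigma_{33}}} = \frac{-0.7724340\sqrt{2.5344988}}{\sqrt{5.25}} = -0.5367 \\
\rho_{Y_3,X_1} &= \frac{e_{31}\sqrt{\lambda_3}}{\sqrt{\sigma_{11}}} = \frac{0.8619976\sqrt{0.3009542}}{\sqrt{1.50}} = 0.3861 \\
\rho_{Y_3,X_2} &= \frac{e_{32}\sqrt{\lambda_3}}{\sqrt{\sigma_{22}}} = \frac{-0.4793640\sqrt{0.3009542}}{\sqrt{6.00}} = -0.1074 \\
\rho_{Y_3,X_3} &= \frac{e_{33}\sqrt{\lambda_3}}{\sqrt{\sigma_{33}}} = \frac{0.1648350\sqrt{0.3009542}}{\sqrt{5.25}} = 0.0395
\end{aligned}
$$

We can display these results in a correlation matrix:

| | $X_1$ | $X_2$ | $X_3$ |
|---|---|---|---|
| $Y_1$ | 0.7482 | 0.9439 | 0.8429 |
| $Y_2$ | 0.5395 | 0.3124 | -0.5367 |
| $Y_3$ | 0.3861 | -0.1074 | 0.0395 |

Here we can easily see that

- the first principal component ($Y_1$) is a mixture of all three random variables ($X_1$, $X_2$, and $X_3$),
- the second principal component ($Y_2$) is a trade-off between $X_1$ and $X_3$,
- the third principal component ($Y_3$) is a residual of $X_1$.

When the principal components are derived from an $\mathbf{X} \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ distributed population, the density of $\mathbf{X}$ is constant on the $\boldsymbol{\mu}$-centered ellipsoids

$$(\mathbf{x} - \boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x} - \boldsymbol{\mu}) = c^2,$$

which have axes $\pm c\sqrt{\lambda_i}\,\mathbf{e}_i$, $i = 1, \dots, p$, where $(\lambda_i, \mathbf{e}_i)$ are the eigenvalue-eigenvector pairs of $\boldsymbol{\Sigma}$. We can set $\boldsymbol{\mu} = \mathbf{0}$ without loss of generality; we can then write …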
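The correlation table can be checked the same way. This sketch (again assuming NumPy, with our own variable names) evaluates $e_{ik}\sqrt{\lambda_i}/\sqrt{\sigma_{kk}}$ for all $i, k$ at once, compares the result with correlations computed directly from the component scores, and checks that for each $X_k$ the squared correlations sum to one (because $\sum_i e_{ik}^2\lambda_i = \sigma_{kk}$). It also checks the ellipsoid statement for $\boldsymbol{\mu} = \mathbf{0}$: each axis endpoint $c\sqrt{\lambda_i}\,\mathbf{e}_i$ satisfies $\mathbf{x}'\boldsymbol{\Sigma}^{-1}\mathbf{x} = c^2$. The contour level `c = 2.0` is a hypothetical value chosen only for illustration.

```python
import numpy as np

# Example data: rows are observations, columns are X1, X2, X3.
X = np.array([
    [1.0, 6.0, 9.0],
    [4.0, 12.0, 10.0],
    [3.0, 12.0, 15.0],
    [4.0, 10.0, 12.0],
])
Sigma = np.cov(X, rowvar=False, bias=True)   # population covariance (divisor n)

eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
eigvecs = eigvecs * np.sign(eigvecs[0, :])   # sign convention: first coefficient positive

# eigvecs[k, i] is the k-th coefficient of e_i (e_ik in the notes). After scaling
# column i by sqrt(lambda_i) and transposing, row i holds e_i' * sqrt(lambda_i);
# dividing column k by sqrt(sigma_kk) gives rho[i, k] = corr(Y_i, X_k).
rho = (eigvecs * np.sqrt(eigvals)).T / np.sqrt(np.diag(Sigma))

# Cross-check against correlations computed directly from the component scores.
Y = X @ eigvecs
empirical = np.corrcoef(np.hstack([Y, X]), rowvar=False)[:3, 3:]

# Each column of rho has unit sum of squares, since sum_i e_ik^2 lambda_i = sigma_kk.
col_sumsq = (rho ** 2).sum(axis=0)

# With mu = 0, the axis endpoint c*sqrt(lambda_i)*e_i lies on x' Sigma^{-1} x = c^2.
c = 2.0                                      # hypothetical contour level, for illustration only
Sigma_inv = np.linalg.inv(Sigma)
endpoints = c * np.sqrt(eigvals) * eigvecs   # column i is c*sqrt(lambda_i)*e_i
quad_forms = np.einsum("ki,kl,li->i", endpoints, Sigma_inv, endpoints)

print("rho (rows Y1..Y3, columns X1..X3):\n", np.round(rho, 4))
print("column sums of squared correlations:", np.round(col_sumsq, 6))
print("x' Sigma^-1 x at the axis endpoints:", np.round(quad_forms, 6))
```

Dividing by $\sqrt{\sigma_{kk}}$ (a standard deviation) rather than by $\sigma_{kk}$ is what makes these genuine correlations; the agreement between `rho` and `empirical` confirms this.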

## This note was uploaded on 04/08/2014 for the course STAT 4503 taught by Professor Majid Mojirsheibani during the Spring '09 term at Carleton CA.
