STAT Principal Components Analysis

Using principal components to summarize sample variation



SAS output for factor analysis (The FACTOR Procedure, initial factor method: principal components):

Factor Pattern — Pearson correlation coefficients of the first principal component with the three original variables x1, x2, and x3:

  Variable    Factor1
  x1          0.74824
  x2          0.94385
  x3          0.84285

Variance explained by Factor1: Weighted = 13.2193960, Unweighted = 2.16112149 (the first eigenvalue λ₁).

Final Communality Estimates and Variable Weights:

  Variable    Communality    Weight
  x1          0.55986257     2.00000000
  x2          0.89085847     8.00000000
  x3          0.71040045     7.00000000

Total Communality: Weighted = 13.219396, Unweighted = 2.161121.

Covariance matrices with special structures yield particularly interesting principal components:

- Diagonal covariance matrices. Suppose Σ is the diagonal matrix

  \Sigma = \begin{pmatrix} \sigma_{11} & 0 & \cdots & 0 \\ 0 & \sigma_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{pp} \end{pmatrix}

  Since the eigenvector e_i has a value of 1 in the ith position and 0 in all other positions, we have

  \Sigma e_i = \sigma_{ii} e_i

  so (σ_{ii}, e_i) is the ith eigenvalue–eigenvector pair. The resulting linear combination

  Y_i = e_i' X = X_i

  demonstrates that the set of principal components and the original set of (uncorrelated) random variables are the same! Note that this result is also true if we work with the correlation matrix.

- Constant variances and covariances. Suppose Σ is the patterned matrix

  \Sigma = \begin{pmatrix} \sigma^2 & \rho\sigma^2 & \cdots & \rho\sigma^2 \\ \rho\sigma^2 & \sigma^2 & \cdots & \rho\sigma^2 \\ \vdots & \vdots & \ddots & \vdots \\ \rho\sigma^2 & \rho\sigma^2 & \cdots & \sigma^2 \end{pmatrix}

  Here the resulting correlation matrix

  \boldsymbol{\rho} = \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix}

  is also the covariance matrix of the standardized variables Z.
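The two special structures above can be checked numerically. The sketch below (the specific dimensions, variances, and ρ are illustrative choices, not taken from the SAS output) verifies that a diagonal Σ has the coordinate unit vectors as eigenvectors, so the principal components coincide with the original variables, and shows the eigenvalues of an equicorrelation matrix:

```python
import numpy as np

# --- Diagonal covariance: principal components are the original variables ---
Sigma = np.diag([4.0, 2.0, 1.0])          # Sigma = diag(sigma_11, ..., sigma_pp)
eigvals, eigvecs = np.linalg.eigh(Sigma)  # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]         # reorder so the largest eigenvalue is first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
# Each eigenvector is a coordinate unit vector (up to sign), so
# Y_i = e_i' X = X_i: the PCs are just the original variables.
print(eigvals)                            # the diagonal entries, largest first
print(np.abs(eigvecs))                    # the identity matrix

# --- Constant variance/covariance: sigma^2 on the diagonal, rho*sigma^2 off it ---
p, sigma2, rho = 4, 3.0, 0.5
Sigma_eq = sigma2 * (rho * np.ones((p, p)) + (1 - rho) * np.eye(p))
vals = np.sort(np.linalg.eigvalsh(Sigma_eq))[::-1]
# One large eigenvalue sigma^2 * (1 + (p-1)*rho) from the "size" direction,
# and p-1 equal eigenvalues sigma^2 * (1 - rho).
print(vals)
```

The equicorrelation eigenvalue pattern (one dominant eigenvalue, p−1 equal small ones) is a standard result; this chunk of the notes trails off before stating it, so the comment above supplies it.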
Using Principal Components to Summarize Sample Variation

Suppose the data x_1, …, x_n represent n independent observations from a p-dimensional population with some mean vector μ and covariance matrix Σ. These data yield a sample mean vector x̄, a sample covariance matrix S, and a sample correlation matrix R.

As in the population case, our goal is to develop a new set of p axes (linear combinations of the original p axes) in the directions of greatest variability:

  y_1 = a_1' x = a_{11} x_1 + a_{12} x_2 + \cdots + a_{1p} x_p
  y_2 = a_2' x = a_{21} x_1 + a_{22} x_2 + \cdots + a_{2p} x_p
  ⋮
  y_p = a_p' x = a_{p1} x_1 + a_{p2} x_2 + \cdots + a_{pp} x_p

Again it is easy to show that the linear combinations a_i' x_j = a_{i1} x_{j1} + a_{i2} x_{j2} + \cdots + a_{ip} x_{jp} have sample means a_i' x̄ and

  Var(a_i' x) = a_i' S a_i,  i = 1, …, p
  Cov(a_i' x, a_k' x) = a_i' S a_k,  i, k = 1, …, p

The principal components are those uncorrelated linear combinations ŷ_1, …, ŷ_p whose variances are as large as possible.

Thus the first principal component is the linear combination of maximum sample variance, i.e., we wish to solve the nonlinear optimization problem (the quadratic objective is the source of the nonlinearity):

  max_{a_1} a_1' S a_1
  s.t. a_1' a_1 = 1   (restrict to coefficient vectors of unit length)

The second principal component is the linear combination of maximum sample variance that is uncorrelated with the first principal component, i.e., we wish to solve the nonlinear optimization problem:

  max_{a_2} a_2' S a_2
  s.t. a_2' a_2 = 1
       a_1' S a_2 = 0   (restricts the covariance with the first component to zero)

The third principal component is the solution to the nonlinear optimization problem:

  max_{a_3} a_3' S a_3
  s.t. a_3' a_3 = 1
       a_1' S a_3 = 0
       a_2' S a_3 = 0
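The sequence of constrained problems above is solved in practice by an eigendecomposition of S: the coefficient vectors a_1, …, a_p are the (unit-length) eigenvectors of S, and the sample variances of the components are the corresponding eigenvalues. A minimal sketch, using a simulated data set with an arbitrary illustrative covariance matrix:

```python
import numpy as np

# Simulate n = 500 observations from a 3-dimensional population
# (the mean vector and covariance matrix here are illustrative choices).
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0, 0.0],
                            [[4, 2, 1], [2, 3, 1], [1, 1, 2]], size=500)

S = np.cov(X, rowvar=False)           # sample covariance matrix S
eigvals, A = np.linalg.eigh(S)        # unit-length eigenvectors solve the max-variance problems
order = np.argsort(eigvals)[::-1]     # sort eigenvalues in descending order
eigvals, A = eigvals[order], A[:, order]

Y = (X - X.mean(axis=0)) @ A          # sample principal component scores y_i = a_i'(x - xbar)

# The sample variances of the components equal the eigenvalues of S ...
print(np.var(Y, axis=0, ddof=1))
# ... and the components are mutually uncorrelated (off-diagonal covariances ~ 0).
C = np.cov(Y, rowvar=False)
print(np.max(np.abs(C - np.diag(np.diag(C)))))
```

Note that np.linalg.eigh already returns orthonormal eigenvectors, so the unit-length constraint a_i' a_i = 1 and the zero-covariance constraints a_i' S a_k = 0 are satisfied automatically.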

This note was uploaded on 04/08/2014 for the course STAT 4503 taught by Professor Majidmojirsheibani during the Spring '09 term at Carleton CA.
