STAT Principal Components Analysis

# STAT Principal Components Analysis - VI....

VI. Principal Components Analysis

A. The Basic Principle

We wish to explain/summarize the underlying variance-covariance structure of a large set of variables through a few linear combinations of these variables.

The objectives of principal components analysis are

- data reduction
- interpretation

The results of principal components analysis are often used as inputs to

- regression analysis
- cluster analysis

This is accomplished by rotating the axes.

B. Population Principal Components

Suppose we have a population measured on p random variables $X_1, \ldots, X_p$. Note that these random variables represent the p axes of the Cartesian coordinate system in which the population resides. Our goal is to develop a new set of p axes (linear combinations of the original p axes) in the directions of greatest variability.

[Figure: axes $X_1$ and $X_2$]

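Consider our random vector

$$
\mathbf{X} = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{bmatrix}
$$

with covariance matrix $\boldsymbol{\Sigma}$ and eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$. We can construct p linear combinations

$$
\begin{aligned}
Y_1 &= \mathbf{a}_1'\mathbf{X} = a_{11}X_1 + a_{12}X_2 + \cdots + a_{1p}X_p \\
Y_2 &= \mathbf{a}_2'\mathbf{X} = a_{21}X_1 + a_{22}X_2 + \cdots + a_{2p}X_p \\
&\;\;\vdots \\
Y_p &= \mathbf{a}_p'\mathbf{X} = a_{p1}X_1 + a_{p2}X_2 + \cdots + a_{pp}X_p
\end{aligned}
$$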

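It is easy to show that

$$
\begin{aligned}
\operatorname{Var}(Y_i) &= \mathbf{a}_i'\boldsymbol{\Sigma}\mathbf{a}_i, \quad i = 1, \ldots, p \\
\operatorname{Cov}(Y_i, Y_k) &= \mathbf{a}_i'\boldsymbol{\Sigma}\mathbf{a}_k, \quad i, k = 1, \ldots, p
\end{aligned}
$$

The principal components are those uncorrelated linear combinations $Y_1, \ldots, Y_p$ whose variances are as large as possible.

Thus the first principal component is the linear combination of maximum variance, i.e., we wish to solve the nonlinear optimization problem

$$
\max_{\mathbf{a}_1} \; \mathbf{a}_1'\boldsymbol{\Sigma}\mathbf{a}_1 \quad \text{s.t.} \quad \mathbf{a}_1'\mathbf{a}_1 = 1
$$

The quadratic forms are the source of nonlinearity; the constraint restricts the search to coefficient vectors of unit length.
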
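
The variance and covariance identities for linear combinations can be checked numerically. The following is a minimal sketch using NumPy on simulated data; the data, the mixing matrix, and the coefficient vectors `a_i` and `a_k` are hypothetical, and the sample covariance matrix `S` stands in for the population $\boldsymbol{\Sigma}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 500 observations on p = 3 correlated variables.
mixing = np.array([[2.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.5, 0.3, 0.2]])
X = rng.standard_normal((500, 3)) @ mixing.T

S = np.cov(X, rowvar=False)      # sample analogue of Sigma (divisor n - 1)

# Two arbitrary coefficient vectors, scaled to unit length.
a_i = np.array([1.0, 2.0, -1.0])
a_k = np.array([0.0, 1.0, 3.0])
a_i /= np.linalg.norm(a_i)
a_k /= np.linalg.norm(a_k)

Y_i = X @ a_i                    # Y_i = a_i' X for every observation
Y_k = X @ a_k

var_direct = np.var(Y_i, ddof=1)       # variance computed from the Y_i values
var_formula = a_i @ S @ a_i            # a_i' S a_i
cov_direct = np.cov(Y_i, Y_k)[0, 1]    # covariance computed from the Y values
cov_formula = a_i @ S @ a_k            # a_i' S a_k

print(var_direct, var_formula)
print(cov_direct, cov_formula)
```

Each printed pair should agree up to floating-point error, because the sample versions of the identities hold exactly.
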
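The second principal component is the linear combination of maximum variance that is uncorrelated with the first principal component, i.e., we wish to solve the nonlinear optimization problem

$$
\max_{\mathbf{a}_2} \; \mathbf{a}_2'\boldsymbol{\Sigma}\mathbf{a}_2 \quad \text{s.t.} \quad \mathbf{a}_2'\mathbf{a}_2 = 1, \quad \mathbf{a}_1'\boldsymbol{\Sigma}\mathbf{a}_2 = 0
$$

where the second constraint restricts the covariance with the first principal component to zero.

The third principal component is the solution to the nonlinear optimization problem

$$
\max_{\mathbf{a}_3} \; \mathbf{a}_3'\boldsymbol{\Sigma}\mathbf{a}_3 \quad \text{s.t.} \quad \mathbf{a}_3'\mathbf{a}_3 = 1, \quad \mathbf{a}_1'\boldsymbol{\Sigma}\mathbf{a}_3 = 0, \quad \mathbf{a}_2'\boldsymbol{\Sigma}\mathbf{a}_3 = 0
$$

where the last two constraints restrict the covariances with the first two principal components to zero.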
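
A sketch of why these constrained problems lead to eigenvectors, using a Lagrange multiplier. The multiplier $\lambda$ and the unit-length eigenvectors $\mathbf{e}_1, \mathbf{e}_2$ of $\boldsymbol{\Sigma}$ are notation introduced here for illustration, and the argument for the second component assumes $\lambda_1 > 0$.

$$
\begin{aligned}
L(\mathbf{a}_1, \lambda) &= \mathbf{a}_1'\boldsymbol{\Sigma}\mathbf{a}_1 - \lambda\,(\mathbf{a}_1'\mathbf{a}_1 - 1) \\
\frac{\partial L}{\partial \mathbf{a}_1} &= 2\boldsymbol{\Sigma}\mathbf{a}_1 - 2\lambda\,\mathbf{a}_1 = \mathbf{0}
\;\;\Longrightarrow\;\; \boldsymbol{\Sigma}\mathbf{a}_1 = \lambda\,\mathbf{a}_1 \\
\mathbf{a}_1'\boldsymbol{\Sigma}\mathbf{a}_1 &= \lambda\,\mathbf{a}_1'\mathbf{a}_1 = \lambda
\end{aligned}
$$

Every stationary point is therefore a unit-length eigenvector of $\boldsymbol{\Sigma}$, and the objective value there equals the corresponding eigenvalue. The maximum is the largest eigenvalue $\lambda_1$, attained at $\mathbf{a}_1 = \mathbf{e}_1$. For the second problem, substituting $\mathbf{a}_1 = \mathbf{e}_1$ gives $\mathbf{a}_1'\boldsymbol{\Sigma}\mathbf{a}_2 = \lambda_1\,\mathbf{e}_1'\mathbf{a}_2$, so the zero-covariance constraint is the same as requiring $\mathbf{a}_2$ to be orthogonal to $\mathbf{e}_1$. Maximizing over unit vectors orthogonal to $\mathbf{e}_1$ yields $\lambda_2$, attained at $\mathbf{a}_2 = \mathbf{e}_2$.
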

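Generally, the i-th principal component is the linear combination of maximum variance that is uncorrelated with all previous principal components, i.e., we wish to solve the nonlinear optimization problem

$$
\max_{\mathbf{a}_i} \; \mathbf{a}_i'\boldsymbol{\Sigma}\mathbf{a}_i \quad \text{s.t.} \quad \mathbf{a}_i'\mathbf{a}_i = 1, \quad \mathbf{a}_k'\boldsymbol{\Sigma}\mathbf{a}_i = 0 \;\; \forall\, k < i
$$

We can show that, for random vector $\mathbf{X}$ with covariance matrix $\boldsymbol{\Sigma}$ and eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$, the i-th principal component is given by

$$
Y_i = \mathbf{e}_i'\mathbf{X} = e_{i1}X_1 + e_{i2}X_2 + \cdots + e_{ip}X_p, \quad i = 1, \ldots, p
$$

where $\mathbf{e}_i$ is the unit-length eigenvector of $\boldsymbol{\Sigma}$ associated with $\lambda_i$.

Note that the principal components are not unique if some eigenvalues are equal.
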
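
The eigenvector form of the principal components can be illustrated numerically. This is a minimal sketch, not a definitive implementation: the covariance matrix `Sigma` and the helper `principal_components` are hypothetical, and `np.linalg.eigh` is used because it is NumPy's eigendecomposition routine for symmetric matrices.

```python
import numpy as np

def principal_components(Sigma):
    """Return eigenvalues in descending order and matching unit eigenvectors as columns."""
    eigvals, eigvecs = np.linalg.eigh(Sigma)   # symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder so lambda_1 >= ... >= lambda_p
    return eigvals[order], eigvecs[:, order]

# Hypothetical 3 x 3 covariance matrix (symmetric, positive definite).
Sigma = np.array([[4.0, 2.0, 0.5],
                  [2.0, 3.0, 1.0],
                  [0.5, 1.0, 2.0]])

lam, E = principal_components(Sigma)

# Covariance matrix of Y = E' X is E' Sigma E: diagonal, with lambda_i on the
# diagonal, so the components are uncorrelated and Var(Y_i) = lambda_i.
cov_Y = E.T @ Sigma @ E

# No unit-length coefficient vector should give a variance above lambda_1.
rng = np.random.default_rng(1)
A = rng.standard_normal((1000, 3))
A /= np.linalg.norm(A, axis=1, keepdims=True)
random_vars = np.einsum("ij,jk,ik->i", A, Sigma, A)   # a' Sigma a for each row a

print(np.round(cov_Y, 6))
print(random_vars.max(), lam[0])
```

The signs of the columns of `E` are arbitrary: multiplying an eigenvector by -1 gives another valid unit eigenvector, which is a milder form of the non-uniqueness noted above.
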
$$
\sigma_{11} + \cdots + \sigma_{pp} = \sum_{i=1}^{p} \operatorname{Var}(X_i) = \lambda_1 + \cdots + \lambda_p = \sum_{i=1}^{p} \operatorname{Var}(Y_i)
$$
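
The total-variance identity says that the trace of $\boldsymbol{\Sigma}$ equals the sum of its eigenvalues. A short numerical check, assuming a hypothetical covariance matrix `Sigma` chosen only for illustration:

```python
import numpy as np

# Hypothetical 3 x 3 covariance matrix (symmetric, positive definite).
Sigma = np.array([[4.0, 2.0, 0.5],
                  [2.0, 3.0, 1.0],
                  [0.5, 1.0, 2.0]])

lam = np.linalg.eigvalsh(Sigma)   # eigenvalues of a symmetric matrix

total_var_X = np.trace(Sigma)     # sigma_11 + ... + sigma_pp
total_var_Y = lam.sum()           # lambda_1 + ... + lambda_p

print(total_var_X, total_var_Y)
```

The two totals should agree up to floating-point error, so rotating to the principal component axes redistributes the variance without changing its total.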
