# Principal Component Analysis

Principal Component Analysis (PCA) is a linear dimension-reduction technique.

**Example (Johnson and Wichern):** weekly rates of return on five stocks (Allied Chemical, du Pont, Union Carbide, Exxon, Texaco). Let $x_1, x_2, \ldots, x_5$ denote the observed weekly rates of return. The sample mean vector is

$$\bar{x}' = [\,.0054 \quad .0048 \quad .0057 \quad .0063 \quad .0037\,]$$

and the sample correlation matrix is

$$\hat{R} = \begin{bmatrix}
1 & .577 & .509 & .387 & .462\\
.577 & 1 & .599 & .389 & .322\\
.509 & .599 & 1 & .436 & .426\\
.387 & .389 & .436 & 1 & .523\\
.462 & .322 & .426 & .523 & 1
\end{bmatrix}$$

Its eigenvalue–eigenvector pairs are

$$\hat\lambda_1 = 2.857, \qquad \hat{e}_1' = [\,.464 \quad .457 \quad .470 \quad .421 \quad .421\,]$$
$$\hat\lambda_2 = .809, \qquad \hat{e}_2' = [\,.240 \quad .509 \quad .260 \quad .526 \quad .582\,]$$
$$\hat\lambda_3 = .540, \qquad \hat{e}_3' = [\,.612 \quad .178 \quad .335 \quad .541 \quad .435\,]$$
$$\hat\lambda_4 = .452, \qquad \hat{e}_4' = [\,.387 \quad .206 \quad .662 \quad .472 \quad .382\,]$$
$$\hat\lambda_5 = .343, \qquad \hat{e}_5' = [\,.451 \quad .676 \quad .400 \quad .176 \quad .385\,]$$

So: algebraically, PCs are particular linear combinations of the $p$ random variables. Geometrically, these linear combinations represent the selection of a new coordinate system whose axes point in the directions of maximum variability.
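The eigen-decomposition above can be checked numerically. This is a sketch in Python/NumPy (the notes themselves use SAS); it rebuilds the printed correlation matrix and confirms that the leading eigenvalue is near 2.857 and that the eigenvalues sum to $p = 5$.

```python
import numpy as np

# Sample correlation matrix of the five weekly stock returns
# (rounded to three decimals, as printed in the notes).
R = np.array([
    [1.000, 0.577, 0.509, 0.387, 0.462],
    [0.577, 1.000, 0.599, 0.389, 0.322],
    [0.509, 0.599, 1.000, 0.436, 0.426],
    [0.387, 0.389, 0.436, 1.000, 0.523],
    [0.462, 0.322, 0.426, 0.523, 1.000],
])

# eigh handles symmetric matrices and returns eigenvalues in ascending order.
vals, vecs = np.linalg.eigh(R)
eigvals = vals[::-1]               # lambda_1 >= lambda_2 >= ... >= lambda_5
e1 = vecs[:, -1]                   # eigenvector paired with lambda_1

print(np.round(eigvals, 3))        # leading eigenvalue should be near 2.857
print(np.round(np.abs(e1), 3))     # magnitudes near [.464 .457 .470 .421 .421]
```

Since the correlation matrix is used, the total variance (sum of eigenvalues) equals the number of variables.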

Reasons for using PCA:
a) Data screening.
b) Clustering.
c) Discriminant analysis.
d) Regression.

Objectives of PCA:
1) Data reduction.
2) Interpretation.

**Definition.** Let the random vector $X' = [X_1 \; X_2 \; \cdots \; X_p]$ have covariance matrix $\Sigma$ with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$. Consider the linear combination

$$Y_1 = a_1'X = a_{11}X_1 + a_{12}X_2 + \cdots + a_{1p}X_p$$

and, more generally, $Y_i = a_i'X$ for $i = 1, 2, \ldots, p$.
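The definition says a candidate component is just a linear combination $a'X$, whose variance is the quadratic form $a'\Sigma a$. A minimal sketch in Python/NumPy (the coefficient vector and covariance matrix here are made-up illustrations, not from the notes) checks the quadratic form against a simulation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical covariance matrix Sigma for p = 3 variables (illustration only).
Sigma = np.array([
    [4.0, 1.0, 0.5],
    [1.0, 2.0, 0.3],
    [0.5, 0.3, 1.0],
])

a = np.array([0.5, -0.5, 1.0])   # an arbitrary coefficient vector a_1

# Theoretical variance of Y1 = a'X is the quadratic form a' Sigma a.
var_theory = a @ Sigma @ a

# Empirical check: simulate X ~ N(0, Sigma), form Y1 = a'X, take its variance.
X = rng.multivariate_normal(np.zeros(3), Sigma, size=200_000)
var_empirical = (X @ a).var()

print(var_theory, var_empirical)  # the two should agree closely
```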
Then

$$\operatorname{Var}(Y_i) = a_i'\Sigma a_i, \qquad \operatorname{Cov}(Y_i, Y_k) = a_i'\Sigma a_k, \qquad i, k = 1, 2, \ldots, p.$$

The PCs are those uncorrelated linear combinations $Y_1, \ldots, Y_p$ whose variances are as large as possible. The first PC is the linear combination with maximum variance; each succeeding PC accounts for as much of the remaining variability as possible. Also,

$$\sum_{i=1}^{p} \operatorname{var}(X_i) = \operatorname{tr}(\Sigma) = \operatorname{tr}(\Lambda) = \sum_{i=1}^{p} \lambda_i = \sum_{i=1}^{p} \operatorname{var}(Y_i).$$

Principal component score: the value $\hat{y}_i = \hat{e}_i'x$ of the $i$-th component for an observation $x$.

Component loading vectors: the eigenvectors; eigenvectors are normalized to unit length.

Estimation of PCs: replace the population quantities by their sample estimates $\hat{\mu}$, $\hat{\Sigma}$, $\hat{\lambda}_i$, $\hat{a}_i = \hat{e}_i$.

Determining the number of PCs:
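The trace identity above says that the change of coordinates preserves total variance. A quick Python/NumPy check (on an arbitrary made-up covariance matrix, not data from the notes):

```python
import numpy as np

# Hypothetical symmetric covariance matrix (illustration only).
Sigma = np.array([
    [3.0, 0.8, 0.2],
    [0.8, 2.0, 0.5],
    [0.2, 0.5, 1.5],
])

lambdas = np.linalg.eigvalsh(Sigma)   # eigenvalues of the symmetric matrix

total_var_X = np.trace(Sigma)         # sum of var(X_i), i.e. tr(Sigma)
total_var_Y = lambdas.sum()           # sum of var(Y_i) = sum of eigenvalues

print(total_var_X, total_var_Y)       # both equal 6.5
```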

- Proportion of total variance explained.
- Scree plots.
- Correlation between the original variable measurements and the PC scores.
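The proportion criterion can be sketched in Python/NumPy using the eigenvalues from the stock-return example earlier in these notes (the 80% cutoff is an illustrative choice, not a rule from the notes):

```python
import numpy as np

# Eigenvalues from the stock-return example, sorted descending.
lambdas = np.array([2.857, 0.809, 0.540, 0.452, 0.343])

proportion = lambdas / lambdas.sum()   # proportion of total variance per PC
cumulative = np.cumsum(proportion)     # cumulative proportion

for k, (p, c) in enumerate(zip(proportion, cumulative), start=1):
    print(f"PC{k}: proportion={p:.3f}  cumulative={c:.3f}")

# One common rule of thumb: keep enough PCs to explain, say, 80% of variance.
n_keep = int(np.searchsorted(cumulative, 0.80) + 1)
print("components kept:", n_keep)
```

A scree plot is simply these eigenvalues plotted against $k$; one looks for the "elbow" where the curve flattens.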
SAS example (note: `cards;` is not used with an `infile` statement, so it is omitted here):

```sas
options ps=50 ls=74 pageno=1 nodate;
title 'Cereal First Example';

data Cereal_1;
  infile "C:\CPSC MULTI\Data Sets\Cereal_1.csv" dlm="," firstobs=2;
  input ID calories protein fat sodium;
  drop ID;
run;

proc print data=Cereal_1; run;

proc princomp data=Cereal_1 covariance out=princereal1; run;

proc print data=princereal1; run;
```

Output:

```
Cereal First Example                                                      1

The PRINCOMP Procedure

Observations    6
Variables       4

Simple Statistics
              calories        protein            fat         sodium
Mean       88.33333333    3.166666667    1.833333333    154.1666667
StD        28.57738033    0.983192080    1.722401424     82.6085145

Covariance Matrix
              calories       protein           fat         sodium
calories    816.666667    -23.666667     41.666667    -761.666667
protein     -23.666667      0.966667     -0.766667      -0.833333
fat          41.666667     -0.766667      2.966667     -94.166667
sodium     -761.666667     -0.833333    -94.166667    6824.166667

Total Variance    7644.766667
```

```
Eigenvalues of the Covariance Matrix

         Eigenvalue    Difference    Proportion    Cumulative
    1    6920.63710
```
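The leading eigenvalue in this output can be reproduced from the covariance matrix printed above. A Python/NumPy sketch standing in for PROC PRINCOMP's eigen step:

```python
import numpy as np

# Covariance matrix printed by PROC PRINCOMP for the cereal data.
S = np.array([
    [ 816.666667,  -23.666667,   41.666667,  -761.666667],
    [ -23.666667,    0.966667,   -0.766667,    -0.833333],
    [  41.666667,   -0.766667,    2.966667,   -94.166667],
    [-761.666667,   -0.833333,  -94.166667,  6824.166667],
])

eigvals = np.linalg.eigvalsh(S)[::-1]   # eigenvalues, largest first

print(np.round(eigvals, 5))                        # leading value near 6920.637
print("total variance:", round(eigvals.sum(), 6))  # matches the Total Variance line
```

The eigenvalues sum to the total variance (the trace of the covariance matrix), as in the identity stated earlier.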

## This note was uploaded on 12/26/2010 for the course CPSC 499, taught by Professor Staff during the Spring '08 term at the University of Illinois at Urbana-Champaign.
