# Fall 2006 ORIE474: Section 7 notes


Nikolai Blizniouk

The goal today is to discuss how to do principal components (PC) regression, how to create interactions in SAS EM and SAS Analyst, and how to use SAS Analyst to extract diagnostic information not provided by the Regression node of SAS EM.

Setup: we'll use the BASEBALL data set. Load it as before. In the Input Data Source node, set SALARY to be your response (target).

Some notation: $I_p$ is the $p \times p$ identity matrix and $1_n$ is the length-$n$ column vector of ones. For an arbitrary matrix $A$, $A_j$ will denote the $j$th column of $A$, and $A_{ij}$ will denote its $ij$th entry. Also, $Y$ denotes the vector of responses and $\epsilon$ is the vector of errors.

## PC regression

### Review of SVD and spectral decomposition

Let $Z$ be an $n \times p$ matrix (assume $n \ge p$). The singular value decomposition (SVD) of $Z$ is given by the equation $Z = U S V^T$, where $U$ is of size $n \times p$, and $S$ and $V$ are of size $p \times p$. Furthermore, $S$ is diagonal with $S_{ii} \ge S_{jj} \ge 0$ if $i < j$, $V V^T = V^T V = I_p$ and $U^T U = I_p$. Notice that

$$
Z^T Z = V S U^T U S V^T = V (SS) V^T,
$$

which implies that $\lambda_i = S_{ii}^2$ is the $i$th largest eigenvalue of $Z^T Z$ and $V_i$ is the corresponding eigenvector.

How does this relate to PCA? Recall that in PCA using the sample correlation matrix, we were looking for eigenvalues and eigenvectors of the matrix $Z^T Z$, where $Z_{ji} = (X_{ji} - \mathrm{ave}(X_i)) / \mathrm{std}(X_i)$. After that, one would do a change of variables $z \mapsto V^T z$, thereby decomposing the variation in the original variables into orthogonal directions.

In PC regression, the idea is similar: instead of working with the model (1), we work with the equivalent model (3):

$$
\begin{aligned}
Y &= \beta_0 \cdot 1_n + Z\beta + \epsilon && (1) \\
  &= \beta_0 \cdot 1_n + (US)(V^T \beta) + \epsilon && (2) \\
  &= \beta_0 \cdot 1_n + P\gamma + \epsilon, \quad \text{where } P = US,\ \gamma = V^T \beta. && (3)
\end{aligned}
$$

In the model (3), one can set $\gamma_{q+1}, \dots, \gamma_p$ to zero, which is equivalent to regressing $Y$ on the first $q$ principal components of $Z$, which are the first $q$ columns of $P$ (plus an intercept, of course).
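The steps above can be checked numerically. The following is a minimal NumPy sketch rather than the SAS workflow used in these notes. It assumes synthetic data (not the BASEBALL data set), and the names `ols`, `gamma_full`, `gamma_q` and `beta_pcr` are hypothetical helpers introduced only for illustration. It standardizes the columns, forms P = US from the thin SVD, and compares the regression on all p components with the regression on the first q.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: an assumption for illustration, not the BASEBALL data set.
n, p, q = 200, 4, 2
X = rng.normal(size=(n, p))
X[:, 3] = X[:, 0] + 0.01 * rng.normal(size=n)  # nearly collinear column
Y = 1.0 + X @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(size=n)

# Standardize columns: Z_ji = (X_ji - ave(X_i)) / std(X_i).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Thin SVD: U is n x p, s holds the diagonal of S (descending), Vt is V^T.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)

# lambda_i = S_ii^2 are the eigenvalues of Z^T Z (eigvalsh returns ascending).
eigvals = np.linalg.eigvalsh(Z.T @ Z)[::-1]

# Principal components P = U S, which equals Z V.
P = U * s


def ols(design, y):
    """Least-squares coefficients for y regressed on [1, design]."""
    A = np.column_stack([np.ones(len(y)), design])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


gamma_full = ols(P, Y)      # intercept, gamma_1, ..., gamma_p
gamma_q = ols(P[:, :q], Y)  # intercept, gamma_1, ..., gamma_q

# Map the truncated gamma back to coefficients on Z: beta = V gamma.
beta_pcr = Vt.T[:, :q] @ gamma_q[1:]
```

Because the columns of P are mutually orthogonal and, with Z centered, orthogonal to the vector of ones, dropping the last p - q components leaves the intercept and the first q estimated coefficients unchanged. That is why setting the trailing coefficients to zero is the same as regressing on the first q components.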
The interpretation is the same: in the presence of multicollinearity, the variation of $P_{q+1}, \dots, P_p$ is small, and thus these components can be ignored. (The
