sec7_v3 - Fall 2006 ORIE474: Section 7 notes Nikolai...


Fall 2006 ORIE474: Section 7 notes
Nikolai Blizniouk

The goal today is to discuss how to do principal components (PC) regression, how to create interactions in SAS EM and SAS Analyst, and how to use SAS Analyst to extract diagnostic information not provided by the Regression node of SAS EM.

Setup: we'll use the BASEBALL data set. Load it as before. In the Input Data Source node, set SALARY to be your response (target).

Some notation: $I_p$ is the $p \times p$ identity matrix and $1_n$ is the length-$n$ column vector of ones. For an arbitrary matrix $A$, $A_j$ will denote the $j$th column of $A$, and $A_{ij}$ will denote its $ij$th entry. Also, $Y$ denotes the vector of responses and $\epsilon$ is the vector of errors.

PC regression

Review of SVD and spectral decomposition

Let $Z$ be an $n \times p$ matrix (assume $n \ge p$). The singular value decomposition (SVD) of $Z$ is given by the equation $Z = U S V^T$, where $U$ is of size $n \times p$, and $S$ and $V$ are of size $p \times p$. Furthermore, $S$ is diagonal with $S_{ii} \ge S_{jj} \ge 0$ if $i < j$, $V V^T = V^T V = I_p$ and $U^T U = I_p$. Notice that $Z^T Z = V S U^T U S V^T = V (SS) V^T$, which implies that $\lambda_i = S_{ii}^2$ is the $i$th largest eigenvalue of $Z^T Z$ and $V_i$ is the corresponding eigenvector.

How does this relate to PCA? Recall that in PCA using the sample correlation matrix, we were looking for eigenvalues and eigenvectors of the matrix $Z^T Z$, where $Z_{ji} = (X_{ji} - \mathrm{ave}(X_i)) / \mathrm{std}(X_i)$. After that, one would do a change of variables $z \mapsto V^T z$, thereby decomposing the variation in the original variables into orthogonal directions.

In PC regression, the idea is similar: instead of working with the model (1), we work with the equivalent model (3):

\begin{align}
Y &= \beta_0 1_n + Z\beta + \epsilon \tag{1} \\
  &= \beta_0 1_n + (US)(V^T \beta) + \epsilon \tag{2} \\
  &= \beta_0 1_n + P\gamma + \epsilon, \quad \text{where } P = US,\ \gamma = V^T \beta. \tag{3}
\end{align}

In the model (3), one can set $\gamma_{q+1}, \ldots, \gamma_p$ to zero, which is equivalent to regressing $Y$ on the first $q$ principal components of $Z$, which are the first $q$ columns of $P$ (plus an intercept, of course).
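The notes carry out these steps in SAS EM and SAS Analyst. As a rough cross-check outside SAS, the same computation can be sketched in Python with NumPy. Everything below is an illustrative assumption rather than part of the course material: the function names `standardize` and `pc_regression`, the synthetic data (not the BASEBALL data set), and the use of the sample standard deviation (`ddof=1`) for scaling. Here `gamma` stands for the coefficient vector on the principal components P = US, and `beta` for the implied coefficients on the standardized predictors Z.

```python
import numpy as np


def standardize(X):
    """Center each column and divide by its sample standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pc_regression(X, y, q):
    """Regress y on an intercept and the first q principal components of X.

    Returns (intercept, gamma_q, beta), where gamma_q holds the q fitted
    PC coefficients and beta = V[:, :q] @ gamma_q are the implied
    coefficients on the standardized predictors.
    """
    Z = standardize(X)
    # Thin SVD: Z = U @ diag(s) @ Vt, singular values sorted in decreasing order.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    P = U * s  # principal component scores, P = U S
    design = np.column_stack([np.ones(len(y)), P[:, :q]])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    intercept, gamma_q = coef[0], coef[1:]
    beta = Vt[:q].T @ gamma_q
    return intercept, gamma_q, beta


# Synthetic example with one nearly collinear predictor (assumed data).
rng = np.random.default_rng(0)
n = 60
X = rng.normal(size=(n, 4))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=n)
y = 2.0 + X[:, 0] - X[:, 1] + rng.normal(size=n)

Z = standardize(X)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)

# Squared singular values of Z are the eigenvalues of Z^T Z.
eigvals = np.linalg.eigvalsh(Z.T @ Z)[::-1]
print(np.allclose(s**2, eigvals))

# With this scaling, Z^T Z / (n - 1) is the sample correlation matrix.
print(np.allclose(Z.T @ Z / (n - 1), np.corrcoef(X, rowvar=False)))

# Keep only the first two principal components.
intercept, gamma_q, beta = pc_regression(X, y, q=2)
print(intercept, gamma_q, beta)
```

Because the columns of P are mutually orthogonal and orthogonal to the vector of ones, the coefficients fitted with q components coincide with the first q coefficients of the full fit, which is why dropping the trailing components amounts to setting their coefficients to zero.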
The interpretation is the same: in the presence of multicollinearity, ...

This note was uploaded on 02/06/2011 for the course ORIE 474 taught by Professor Apanasovich during the Spring '07 term at Cornell University (Engineering School).
