Fall 2006
ORIE474: Section 7 notes
Nikolai Blizniouk

The goal today is to discuss how to do principal components (PC) regression, how to create interactions in SAS EM and SAS Analyst, and how to use SAS Analyst to extract diagnostic information not provided by the Regression node of SAS EM.

Setup: we'll use the BASEBALL data set. Load it as before. In the Input Data Source node, set SALARY to be your response (target).

Some notation: $I_p$ is the $p \times p$ identity matrix and $1_n$ is the length-$n$ column vector of ones. For an arbitrary matrix $A$, $A_j$ will denote the $j$th column of $A$, and $A_{ij}$ will denote its $ij$th entry. Also, $Y$ denotes the vector of responses and $\epsilon$ is the vector of errors.

PC regression

Review of SVD and spectral decomposition

Let $Z$ be an $n \times p$ matrix (assume $n \ge p$). The singular value decomposition (SVD) of $Z$ is given by the equation $Z = U S V^T$, where $U$ is of size $n \times p$, and $S$ and $V$ are of size $p \times p$. Furthermore, $S$ is diagonal with $S_{ii} \ge S_{jj} \ge 0$ if $i < j$, $V V^T = V^T V = I_p$ and $U^T U = I_p$. Notice that

$$Z^T Z = V S U^T U S V^T = V (SS) V^T,$$

which implies that $\lambda_i = S_{ii}^2$ is the $i$th largest eigenvalue of $Z^T Z$ and $V_i$ is the corresponding eigenvector.

How does this relate to PCA? Recall that in PCA using the sample correlation matrix, we were looking for eigenvalues and eigenvectors of the matrix $Z^T Z$ (a constant multiple of the sample correlation matrix), where $Z_{ji} = (X_{ji} - \mathrm{ave}(X_i)) / \mathrm{std}(X_i)$. After that, one would do a change of variables $z \mapsto V^T z$, thereby decomposing the variation in the original variables into orthogonal directions.

In PC regression, the idea is similar: instead of working with the model (1), we work with the equivalent model (3):

\begin{align}
Y &= \beta_0 1_n + Z \beta + \epsilon \tag{1} \\
  &= \beta_0 1_n + (US)(V^T \beta) + \epsilon \tag{2} \\
  &= \beta_0 1_n + P \gamma + \epsilon, \quad \text{where } P = US,\ \gamma = V^T \beta. \tag{3}
\end{align}

In the model (3), one can set $\gamma_{q+1}, \ldots, \gamma_p$ to zero, which is equivalent to regressing $Y$ on the first $q$ principal components of $Z$, which are the first $q$ columns of $P$ (plus an intercept, of course).
The interpretation is the same: in the presence of multicollinearity, ...
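The SVD identities and the PC regression described above can be checked numerically. The following is only a minimal sketch in Python with NumPy, not the SAS EM or SAS Analyst workflow used in this section. The data are synthetic (not the BASEBALL data set), and the names `standardize` and `pc_regression` are hypothetical helpers introduced for illustration. The sketch assumes that the first q singular values are strictly positive and that std uses the n-1 denominator.

```python
import numpy as np

def standardize(X):
    """Center each column and divide by its sample standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pc_regression(X, y, q):
    """Regress y on an intercept and the first q principal components of Z.

    Returns (intercept, gamma, beta), where gamma has its last p - q entries
    set to zero and beta = V gamma are the implied coefficients on Z.
    """
    Z = standardize(X)
    # Thin SVD: Z = U diag(s) Vt, with s sorted in decreasing order.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    P = U * s  # P = U S, the principal component scores
    # Columns of P are centered and mutually orthogonal (P^T P = S^2),
    # so the least-squares fit decouples coordinate by coordinate.
    intercept = y.mean()
    gamma = np.zeros(len(s))
    gamma[:q] = (P[:, :q].T @ y) / s[:q] ** 2
    beta = Vt.T @ gamma  # invert gamma = V^T beta
    return intercept, gamma, beta

# Synthetic, deliberately collinear predictors (an assumption for illustration).
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + 0.1 * rng.normal(size=n)  # nearly a linear combination
X = np.column_stack([x1, x2, x3])
y = 1.0 + 2.0 * x1 - x2 + rng.normal(size=n)

# Check the spectral relation: eigenvalues of Z^T Z are the squared singular values.
Z = standardize(X)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
eigvals = np.linalg.eigvalsh(Z.T @ Z)[::-1]  # reorder to decreasing

# q = p reproduces ordinary least squares on Z; q < p drops the last components.
beta0_full, gamma_full, beta_full = pc_regression(X, y, q=3)
beta0_q, gamma_q, beta_q = pc_regression(X, y, q=2)
```

With q equal to p, `beta_full` should agree with the ordinary least-squares coefficients on Z. With q less than p, the leading entries of `gamma_q` are unchanged, because the columns of P are orthogonal, and only the trailing entries are set to zero.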
This note was uploaded on 02/06/2011 for the course ORIE 474 taught by Professor Apanasovich during the Spring '07 term at Cornell University (Engineering School).