Fall 2006 ORIE474: Section 7 notes
Nikolai Blizniouk

The goal today is to discuss how to do principal components (PC) regression, how to create interactions in SAS EM and SAS Analyst, and how to use SAS Analyst to extract diagnostic information not provided by the Regression node of SAS EM.

Setup: we'll use the BASEBALL data set. Load it as before. In the Input Data Source node, set SALARY to be your response (target).

Some notation: $I_p$ is the $p \times p$ identity matrix and $1_n$ is the length-$n$ column vector of ones. For an arbitrary matrix $A$, $A_j$ will denote the $j$th column of $A$, and $A_{ij}$ will denote its $(i,j)$th entry. Also, $Y$ denotes the vector of responses and $\epsilon$ is the vector of errors.

PC regression

Review of SVD and spectral decomposition

Let $Z$ be an $n \times p$ matrix (assume $n \ge p$). The singular value decomposition (SVD) of $Z$ is given by the equation $Z = USV^T$, where $U$ is of size $n \times p$, and $S$ and $V$ are of size $p \times p$. Furthermore, $S$ is diagonal with $S_{ii} \ge S_{jj} \ge 0$ if $i < j$, $VV^T = V^TV = I_p$, and $U^TU = I_p$. Notice that (using $S^T = S$, since $S$ is diagonal) $Z^TZ = VSU^TUSV^T = V(SS)V^T$, which implies that $\lambda_i = S_{ii}^2$ is the $i$th largest eigenvalue of $Z^TZ$ and $V_i$ is the corresponding eigenvector.

How does this relate to PCA? Recall that in PCA using the sample correlation matrix, we were looking for eigenvalues and eigenvectors of the matrix $Z^TZ$ (a constant multiple of the sample correlation matrix of the $X_i$), where $Z_{ji} = (X_{ji} - \mathrm{ave}(X_i))/\mathrm{std}(X_i)$. After that, one would do the change of variables $z \mapsto V^Tz$, thereby decomposing the variation in the original variables into orthogonal directions.

In PC regression, the idea is similar: instead of working with the model (1), we work with the equivalent model (3):

$$
\begin{aligned}
Y &= \beta_0 1_n + Z\beta + \epsilon && (1)\\
  &= \beta_0 1_n + (US)(V^T\beta) + \epsilon && (2)\\
  &= \beta_0 1_n + P\alpha + \epsilon, \quad \text{where } P = US,\ \alpha = V^T\beta. && (3)
\end{aligned}
$$

Since $V$ is orthogonal, $P = US = ZV$. In the model (3), one can set $\alpha_{q+1}, \dots, \alpha_p$ to zero, which is equivalent to regressing $Y$ on the first $q$ principal components of $Z$, which are the first $q$ columns of $P$ (plus an intercept, of course).
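The SVD facts reviewed above can be checked numerically. The following is a minimal NumPy sketch, assuming simulated predictors in place of the BASEBALL data (the SAS EM workflow from these notes is not reproduced here); the helper names `standardize` and `svd_pca` are hypothetical. It standardizes the columns as in the definition of $Z_{ji}$, takes the thin SVD with `np.linalg.svd(..., full_matrices=False)` (so $U$ is $n \times p$), and forms $\lambda_i = S_{ii}^2$ and the scores $P = US$.

```python
import numpy as np

def standardize(X):
    """Z_ji = (X_ji - ave(X_i)) / std(X_i), using the sample std (ddof=1)."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def svd_pca(Z):
    """Thin SVD Z = U S V^T of an n x p matrix (n >= p).

    Returns U (n x p), the diagonal of S (in decreasing order), V (p x p),
    the eigenvalues lambda_i = S_ii^2 of Z^T Z, and the PC scores P = U S.
    """
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    V = Vt.T
    lam = s ** 2      # Z^T Z = V S U^T U S V^T = V (S S) V^T
    P = U * s         # U S: column j of U scaled by S_jj; equals Z V
    return U, s, V, lam, P

# Hypothetical simulated predictors (not the BASEBALL data).
rng = np.random.default_rng(0)
n, p = 50, 4
X = rng.normal(size=(n, p))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=n)   # two nearly collinear columns

Z = standardize(X)
U, s, V, lam, P = svd_pca(Z)
print(np.round(lam / lam.sum(), 3))  # share of total variation along each orthogonal direction
```

With the sample standard deviation (divisor $n-1$), $Z^TZ/(n-1)$ coincides with `np.corrcoef(X, rowvar=False)`, which is why the eigenvectors $V_i$ are the PCA directions for the correlation matrix; a different divisor only rescales the $\lambda_i$.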
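PC regression as described for model (3) can be sketched in a few lines. This is an illustrative NumPy version, not the SAS EM procedure from these notes: it assumes simulated data, and the function `pc_regression` is a hypothetical helper. It regresses $Y$ on an intercept plus the first $q$ columns of $P$, keeps $\alpha_{q+1}, \dots, \alpha_p = 0$, and maps back to coefficients on the standardized predictors via $\beta = V\alpha$ (which follows from $\alpha = V^T\beta$ and $VV^T = I_p$).

```python
import numpy as np

def standardize(X):
    """Z_ji = (X_ji - ave(X_i)) / std(X_i), using the sample std (ddof=1)."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pc_regression(X, y, q):
    """Regress y on an intercept plus the first q principal components of Z.

    Returns (b0, beta): the intercept and the implied coefficients on the
    standardized predictors Z, i.e. beta = V alpha with alpha_{q+1..p} = 0.
    """
    Z = standardize(X)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    P = U * s                                          # P = U S = Z V
    D = np.column_stack([np.ones(len(y)), P[:, :q]])   # design [1_n, P_1, ..., P_q]
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    alpha = np.zeros(Vt.shape[0])
    alpha[:q] = coef[1:]           # alpha_{q+1}, ..., alpha_p stay at zero
    beta = Vt.T @ alpha            # alpha = V^T beta  =>  beta = V alpha
    return coef[0], beta

# Hypothetical simulated data (not the BASEBALL data set).
rng = np.random.default_rng(1)
n = 100
X = rng.normal(size=(n, 3))
X[:, 2] = X[:, 0] + 0.05 * rng.normal(size=n)   # nearly collinear with column 0
y = 2.0 + X[:, 0] - X[:, 1] + rng.normal(size=n)

b0_full, beta_full = pc_regression(X, y, q=3)   # q = p: same fit as least squares on [1_n, Z]
b0_2, beta_2 = pc_regression(X, y, q=2)         # drops the smallest-eigenvalue direction
print(beta_full, beta_2)
```

Because the columns of $P$ are centered and mutually orthogonal ($P^TP = S^2$), the fitted $\alpha_1, \dots, \alpha_q$ do not depend on $q$ and the intercept is always $\bar{Y}$; choosing $q$ only decides which trailing $\alpha$'s are zeroed.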
The interpretation is the same: in the presence of multicollinearity, …
This note was uploaded on 02/06/2011 for the course ORIE 474 taught by Professor Apanasovich during the Spring '07 term at Cornell University (Engineering School).