Multicollinearity
STAT 563, Spring 2007

Recap
When the predictors are correlated or exhibit near-linear dependencies, we face the problem of multicollinearity. The primary sources of multicollinearity are:
- The data collection method employed
- Constraints on the model or in the population
- Model specification
- An overdefined model

Effects of multicollinearity
Strong multicollinearity results in large variances and covariances for the least-squares estimates. Recall that under unit-length scaling, the diagonal elements of C = (X'X)^{-1} can be written as

    C_{jj} = \frac{1}{1 - R_j^2}, \quad j = 1, 2, \ldots, p,

where R_j^2 is the coefficient of determination from the regression of x_j on the remaining p - 1 predictors. C_{jj} is also called the variance inflation factor, VIF_j.

To see the role of the VIFs in estimation, write the squared distance from \hat{\beta} to the true parameter vector \beta as

    L_1^2 = (\hat{\beta} - \beta)'(\hat{\beta} - \beta).

Taking the expected value,

    E(L_1^2) = \sum_{j=1}^{p} E(\hat{\beta}_j - \beta_j)^2 = \sum_{j=1}^{p} Var(\hat{\beta}_j) = \sigma^2 \, Tr[(X'X)^{-1}] = \sigma^2 \sum_{j=1}^{p} VIF_j.

Note
E(L_1^2), which equals p\sigma^2 for an orthogonal set of predictors, is greatly magnified by large VIFs. Because the trace of a matrix is also the sum of its eigenvalues, we can write

    E(L_1^2) = \sigma^2 \sum_{j=1}^{p} \frac{1}{\lambda_j},

where \lambda_j > 0, j = 1, 2, ..., p, are the eigenvalues of X'X. If multicollinearity exists, some \lambda_j will be near zero, making the expectation large.
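The VIF and eigenvalue identities above can be checked numerically. Below is a minimal sketch in Python with NumPy, using synthetic predictors (not the lecture's body fat data) in which one column is, by construction, nearly the sum of the other two:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: x3 is nearly x1 + x2, creating a near-linear dependency.
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + rng.normal(scale=0.05, size=n)
X = np.column_stack([x1, x2, x3])

# Unit-length scaling: center each column and scale it to unit norm,
# so that W'W is the correlation matrix of the predictors.
W = X - X.mean(axis=0)
W /= np.linalg.norm(W, axis=0)

C = np.linalg.inv(W.T @ W)
vif = np.diag(C)  # C_jj = 1 / (1 - R_j^2) = VIF_j

# E(L_1^2) / sigma^2 computed two equivalent ways:
# trace of (W'W)^{-1} and the sum of 1/lambda_j.
eigvals = np.linalg.eigvalsh(W.T @ W)
print(vif)                                  # large VIFs for all three columns
print(np.trace(C), np.sum(1.0 / eigvals))   # the two quantities agree
```

The key design point is the unit-length scaling step: it makes W'W the predictor correlation matrix, so the diagonal of its inverse is exactly the VIFs described above.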
Indicators of collinearity
Rules of thumb:
- Simple correlation r_{ij} > 0.95
- Variance inflation factor VIF_j > 10

Body fat data
Recall that the VIFs were (708.8, 564.3, 104.6) for the predictors SKIN, THIGH, and ARM. Equivalently, the R^2 values from regressing each predictor on the remaining two were (0.999, 0.998, 0.990).

[Body fat data: slide figure omitted from this preview.]

More on eigenvalues
Recall that if A is a p x p symmetric matrix with p eigenvalues \lambda_1, \lambda_2, \ldots, \lambda_p and p associated eigenvectors v_1, v_2, \ldots, v_p, then

    A v_j = \lambda_j v_j, \quad j = 1, 2, \ldots, p.

Suppose an eigenvalue of W'W, while not exactly zero, is very close to zero. Since all elements of a unit-length eigenvector are at most 1 in magnitude, \lambda_j v_j \approx 0 for small eigenvalues.

Eigenvalues
It follows that W'W v_j = \lambda_j v_j \approx 0 for any eigenvector whose corresponding eigenvalue is nearly zero. Premultiplying by v_j' gives v_j' W'W v_j \approx 0. Now let U = W v_j, so the equation becomes

    U'U = \sum_r u_r^2 \approx 0.

Since all terms in the summation are nonnegative, U'U \approx 0 implies U \approx 0. Thus, if \lambda_j \approx 0 (near collinearity), then

    W v_j = \sum_{r=1}^{p} v_{rj} W_r \approx 0,

where W_r is the r-th column of W. The implication is that the elements of the eigenvectors corresponding to small eigenvalues provide coefficients that define multicollinearities among the predictor variables: the larger elements in such an eigenvector indicate which variables are involved in the collinearity.

Fat data
[Eigenvector corresponding to the smallest eigenvalue: slide figure omitted from this preview.]
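The eigenvector diagnostic above can be sketched the same way: when the unit-length-scaled columns satisfy a near-linear dependency, the eigenvector of W'W belonging to the smallest eigenvalue recovers the coefficients of that dependency. Again, this uses hypothetical synthetic data, not the lecture's body fat data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data with a built-in near dependency: x3 ~ x1 + x2.
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + rng.normal(scale=0.01, size=n)
X = np.column_stack([x1, x2, x3])

# Unit-length scaling, as in the lecture, so W'W is the correlation matrix.
W = X - X.mean(axis=0)
W /= np.linalg.norm(W, axis=0)

# eigh returns eigenvalues in ascending order for a symmetric matrix.
eigvals, eigvecs = np.linalg.eigh(W.T @ W)
v_min = eigvecs[:, 0]  # eigenvector for the smallest eigenvalue

# Near collinearity: W v_min = sum_r v_{r,min} W_r is approximately zero,
# and the large elements of v_min flag the variables in the dependency.
print(eigvals[0])                  # smallest eigenvalue, near zero
print(v_min)                       # sizable entries for all three columns
print(np.linalg.norm(W @ v_min))   # ||W v_min|| = sqrt(lambda_min), near zero
```

Note that ||W v_j||^2 = v_j' W'W v_j = \lambda_j, so the printed norm is just the square root of the smallest eigenvalue, tying the code back to the derivation above.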