Lecture 5: Multiple Linear Regression
Nancy R. Zhang
Statistics 203, Stanford University
January 19, 2010

Agenda

Today: multiple linear regression. This week: comparing nested models in multiple linear regression. We will finish the diagnostics slides next lecture.

How does land use affect river pollution?

$$\text{Nitrogen} = \beta_0 + \beta_1\,\text{Agr} + \beta_2\,\text{Forest} + \beta_3\,\text{Rsdntial} + \beta_4\,\text{ComIndl} + \text{error}$$

Multiple Linear Regression

Design matrix and response vector:

$$X = \begin{pmatrix} 1 & X_{11} & X_{21} & \cdots & X_{p1} \\ 1 & X_{12} & X_{22} & \cdots & X_{p2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{1n} & X_{2n} & \cdots & X_{pn} \end{pmatrix}, \qquad y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}$$

Squared error loss function:

$$L(\beta) = \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j X_{ji} \Big)^2$$

In matrix notation:

$$L(\beta) = (y - X\beta)^T (y - X\beta)$$

Linear Subspaces and Projections

With p predictors we have p + 1 vectors in $\mathbb{R}^n$:

$$X_i = (X_{i1}, X_{i2}, \ldots, X_{in})^T, \quad i = 0, \ldots, p,$$

where from now on we will always let $X_0$ be the vector of ones. We denote by $\langle X_0, \ldots, X_p \rangle$ the linear space spanned by these vectors:

$$\langle X_0, \ldots, X_p \rangle = \Big\{ \sum_{i=0}^{p} a_i X_i \;:\; (a_0, \ldots, a_p) \in \mathbb{R}^{p+1} \Big\}$$

This is a linear subspace of $\mathbb{R}^n$. We use the shorthand $\langle X \rangle$.

The dimension of $\langle X_0, \ldots, X_p \rangle$ is the rank of the matrix

$$X = \begin{pmatrix} X_{01} & X_{11} & \cdots & X_{p1} \\ X_{02} & X_{12} & \cdots & X_{p2} \\ \vdots & \vdots & & \vdots \\ X_{0n} & X_{1n} & \cdots & X_{pn} \end{pmatrix}$$

and the rank of a matrix equals the number of linearly independent columns. The linear map that projects any vector $v \in \mathbb{R}^n$ onto $\langle X \rangle$ is given by

$$P_X = X (X^T X)^{-1} X^T$$

Projection Matrices

Thus, for any $n \times (p+1)$ matrix $X$, we can construct a projection matrix $P_X = X(X^T X)^{-1} X^T$ that projects vectors onto the column space of $X$. Projection matrices enjoy some special properties:
1. $P_X^2 = P_X$.
2. $\operatorname{rank}(P_X) = \operatorname{rank}(X)$.
3. For any $v \in \langle X \rangle$, $P_X v = v$.

The orthogonal complement of $\langle X \rangle$ (the null space of $X^T$) is the set $\{ v \in \mathbb{R}^n : X^T v = 0 \}$. The projection matrix onto this space is $I - P_X$.

Linear Regression by Least Squares = Projection

Simple linear regression with intercept: project $y = (y_1, \ldots, y_n)^T$ onto $\langle (1, \ldots, 1)^T, (x_1, \ldots, x_n)^T \rangle$.

In multiple linear regression, the solution $\hat{\beta} = (X^T X)^{-1} X^T y$ can likewise be obtained directly from the concept of a projection:

$$\hat{y} = X \hat{\beta} = X (X^T X)^{-1} X^T y = P_X y$$

Calculating Variances

Multivariate Gaussians: let $Z \sim N(\mu, \Sigma)$, let $a \in \mathbb{R}^m$, and let $B$ be an $m \times n$ matrix. Then

$$a + BZ \sim N(a + B\mu,\; B \Sigma B^T)$$

Applying this to $\hat{\beta} = (X^T X)^{-1} X^T y$ with $y \sim N(X\beta, \sigma^2 I)$:

$$E[\hat{\beta}] = \beta, \qquad \operatorname{Var}(\hat{\beta}) = \sigma^2 (X^T X)^{-1}$$

Applying it to the fitted values $\hat{y} = P_X y$:

$$E[\hat{y}] = X\beta, \qquad \operatorname{Var}(\hat{y}) = \sigma^2 P_X$$

The diagonal entries of $P_X$ are the leverage values from last lecture.

t-tests for $\beta_i$

$$\hat{\beta} \sim N\big(\beta,\; \sigma^2 (X^T X)^{-1}\big)$$

As before, estimate $\sigma^2$ using

$$\hat{\sigma}^2 = \frac{SSE}{n - p - 1}$$

Then we can construct a t-test:

$$t_i = \frac{\hat{\beta}_i}{\operatorname{se}(\hat{\beta}_i)}$$

As before, reject the hypothesis $H_i: \beta_i = 0$ at level $\alpha$ if $|t_i| > t^{(\alpha/2)}_{n-p-1}$.

Interpreting the $\beta_i$'s

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p$$

The $\beta_i$ obtained from a multiple regression are sometimes called partial regression coefficients, because they correspond to a simple regression of $Y$ on $X_i$ after taking out the effects of the other predictors $X_j$, $j \neq i$:
1. Regress $X_i$ on $\{X_j : j \neq i\}$; get residuals $r_i$.
2. Regress $Y$ on $\{X_j : j \neq i\}$; get residuals $r_Y$.
3. Do a simple linear regression of $r_Y$ on $r_i$; the slope will give you $\hat{\beta}_i$.

This gives us

$$\operatorname{Var}(\hat{\beta}_i) = \frac{\sigma^2}{\| r_i \|^2},$$

so high correlation among the $X$'s can "mask out" each other's effects.
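Both results above, the projection form of the least squares fit and the partial regression recipe, are easy to verify numerically. Here is a minimal sketch in R (the software the lecture quotes output from); the simulated data and all variable names are invented for illustration:

```r
set.seed(1)
n  <- 50
X1 <- rnorm(n)
X2 <- rnorm(n)
y  <- 2 + 1.5 * X1 - 0.5 * X2 + rnorm(n)

## Least squares as a projection: beta-hat = (X'X)^{-1} X'y and y-hat = P_X y
X    <- cbind(1, X1, X2)                  # design matrix; X0 = column of ones
beta <- solve(t(X) %*% X, t(X) %*% y)     # (X'X)^{-1} X'y
P    <- X %*% solve(t(X) %*% X) %*% t(X)  # projection matrix P_X
fit  <- lm(y ~ X1 + X2)
max(abs(P %*% y - fitted(fit)))           # ~ 0: P_X y equals the fitted values
max(abs(diag(P) - hatvalues(fit)))        # ~ 0: diag(P_X) gives the leverages

## Partial regression: the slope of r_Y on r_1 is the multiple-regression
## coefficient of X1 (steps 1-3 above)
r1 <- resid(lm(X1 ~ X2))                  # residuals of X1 on the other predictor
rY <- resid(lm(y ~ X2))                   # residuals of y on the other predictor
coef(lm(rY ~ r1))[2]                      # equals coef(fit)["X1"] exactly
```

This residual-on-residual identity is usually called the Frisch-Waugh-Lovell theorem.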
Goodness of Fit

Sums of squares:

$$SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2, \qquad SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2, \qquad SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2,$$

and these decompose as $SST = SSR + SSE$. Define

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

$R$ is called the multiple correlation coefficient. When $R^2$ is large, a lot of the variability in $Y$ is explained by $X$.

Large $R^2$ may not indicate a good model. Hypothetical scenario: $n$ observations and $n$ linearly independent covariates. What would you get for $R^2$? (The fit is exact, so $R^2 = 1$, even though the model has no predictive value.) As you add predictors to the model, $R^2$ will always increase, no matter what those predictors are!

"Goodness of Fit" Measures

R provides the following measures:

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}, \qquad R_a^2 = 1 - \frac{n-1}{n-p-1}\,(1 - R^2) = 1 - \frac{SSE/(n-p-1)}{SST/(n-1)}$$

1. $R^2$ is easy to interpret. It is the proportion of the "variation" in the data explained by the model.
2. $R^2$ does not adjust for the model size, while $R_a^2$ does. When comparing models of different sizes, use $R_a^2$.
3. However, for hypothesis testing the F statistic should be used.

F-tests for $R^2$

Assume the model has an intercept, so the design matrix has $p + 1$ columns. The F statistic is

$$F = \frac{SSR / p}{SSE / (n - p - 1)}$$

F-distribution: if $W \sim \chi^2_q$ is independent of $Z \sim \chi^2_r$, then

$$\frac{W/q}{Z/r} \sim F_{q,r}$$

F-Table

Source       Sum of Squares   d.f.         Mean Square             F
Regression   SSR              p            MSR = SSR/p             F = MSR/MSE
Residuals    SSE              n - p - 1    MSE = SSE/(n - p - 1)

Reject at level $\alpha$ if $F > F^{(\alpha)}_{p,\,n-p-1}$. This tests the hypothesis $H_0: \beta_1 = \cdots = \beta_p = 0$.

Nested Models

Test the hypothesis that a subset of the $\beta_i$'s are zero:

$$H_0: \beta_1 = \cdots = \beta_r = 0$$

That is, we have the reduced model RM nested within the full model FM:

$$\text{FM}: \quad Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \text{error}$$
$$\text{RM}: \quad Y = \beta_0 + \beta_{r+1} X_{r+1} + \cdots + \beta_p X_p + \text{error}$$

Do $X_1, \ldots, X_r$ have a significant marginal effect, after adjusting for the other predictors?

Let $df(\text{FM})$ and $df(\text{RM})$ denote the number of parameters in each model, and let $\Delta df = df(\text{FM}) - df(\text{RM})$. Then

$$F = \frac{\big(SSE_{\text{RM}} - SSE_{\text{FM}}\big) / \Delta df}{SSE_{\text{FM}} / \big(n - df(\text{FM})\big)} \sim F_{\Delta df,\; n - df(\text{FM})}$$

Testing Constraints

In some situations you want to test that your model parameters satisfy some constraint. Say you have the model

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \text{error}$$

and want to test $H_0: \beta_1 = \beta_2$. Under $H_0$ this is equivalent to the model

$$Y = \beta_0 + \beta_1 (X_1 + X_2) + \text{error}$$

Fit these two models and apply the F test with $\Delta df = 1$.

Another example: in the same model, test $H_0: \beta_2 = \beta_1 + 1$. This is equivalent to the model

$$Y = \beta_0 + \beta_1 X_1 + (\beta_1 + 1) X_2 + \text{error},$$

which can be simplified to

$$Y - X_2 = \beta_0 + \beta_1 (X_1 + X_2) + \text{error}$$

Fit these two models and apply the F test with $\Delta df = 1$. In general, $\Delta df$ equals the number of constraints.
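In R, both kinds of tests are run by fitting the two models with lm() and comparing them with anova(), which computes exactly the F statistic above. A hedged sketch on simulated data (the data and all variable names are invented; lm(), anova(), I() and offset() are standard R):

```r
set.seed(2)
n  <- 100
X1 <- rnorm(n); X2 <- rnorm(n); X3 <- rnorm(n)
y  <- 1 + X1 + X2 + rnorm(n)              # X3 is irrelevant by construction

## Nested models: do X1 and X2 matter after adjusting for X3?
full    <- lm(y ~ X1 + X2 + X3)           # FM
reduced <- lm(y ~ X3)                     # RM, nested within FM
anova(reduced, full)                      # F test with delta-df = 2

## Constraint H0: beta1 = beta2  =>  Y = b0 + b1 (X1 + X2) + error
unconstrained <- lm(y ~ X1 + X2)
constrained   <- lm(y ~ I(X1 + X2))       # I() makes the sum a single predictor
anova(constrained, unconstrained)         # F test with delta-df = 1

## Constraint H0: beta2 = beta1 + 1  =>  Y = b0 + b1 (X1 + X2) + 1 * X2 + error;
## the lone X2 term has known coefficient 1, so it enters as an offset
constrained2 <- lm(y ~ I(X1 + X2) + offset(X2))
anova(constrained2, unconstrained)        # again delta-df = 1
```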
Influence of Outliers

How much influence does a single data point have on the model fit?

Different Measures of Influence

1. How much influence does observation $i$ have on its own fit?

$$DFFITS_i = \frac{\hat{y}_i - \hat{y}_{i(i)}}{\sqrt{MSE_{(i)}\, h_{ii}}},$$

where a subscript $(i)$ denotes a quantity recomputed with observation $i$ deleted. A $DFFITS$ value exceeding $2\sqrt{p/n}$ is considered large.

2. How much influence does observation $i$ have on the fitted $\beta$'s?

$$DFBETAS_{1(i)} = \frac{\hat{\beta}_1 - \hat{\beta}_{1(i)}}{\sqrt{MSE_{(i)}\, c_{11}}}, \qquad \text{where } c_{11} = \frac{1}{\sum_i (x_i - \bar{x})^2}$$

for the slope in a simple regression. A $DFBETAS$ value exceeding $2/\sqrt{n}$ is considered large.

These conventional cutoffs work for "reasonably sized" data sets.

Different Measures of Influence: Cook's Distance

1. Cook's distance is defined as

$$D_i = \frac{\sum_{j=1}^{n} \big( \hat{y}_j - \hat{y}_{j(i)} \big)^2}{(p+1)\, MSE}$$

2. It considers the influence of $Y_i$ on all of the fitted values, not just the $i$-th case.
3. It can be shown that $D_i$ is equivalent to

$$D_i = \frac{r_i^2}{p+1} \cdot \frac{h_{ii}}{1 - h_{ii}},$$

where $r_i$ is the standardized residual, so no refitting is needed.
4. Compare $D_i$ to the $F_{p+1,\, n-p-1}$ distribution.
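R's stats package computes all of these influence measures directly from a single fit. A short sketch, again on invented data; dffits(), dfbetas(), cooks.distance() and hatvalues() are built-in, and flagging points beyond the F median is one common reading of rule 4, not something the slide specifies:

```r
set.seed(3)
n  <- 60
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + x1 + rnorm(n)
y[1] <- y[1] + 8                          # plant one gross outlier

fit <- lm(y ~ x1 + x2)
p   <- length(coef(fit)) - 1              # p predictors, matching the slides

## Influence measures and the slides' rules of thumb
which(abs(dffits(fit)) > 2 * sqrt(p / n))               # large DFFITS_i
which(abs(dfbetas(fit)) > 2 / sqrt(n), arr.ind = TRUE)  # large DFBETAS, per coefficient
head(cooks.distance(fit))                 # Cook's distance D_i
head(hatvalues(fit))                      # leverages h_ii

## Rule 4: compare D_i to F_{p+1, n-p-1}, e.g. flag points beyond its median
which(cooks.distance(fit) > qf(0.5, df1 = p + 1, df2 = n - p - 1))
```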