Lecture 5: Multiple Linear Regression
Nancy R. Zhang
Statistics 203, Stanford University. January 19, 2010.

Agenda

Today: multiple linear regression. This week: comparing nested models in multiple linear regression. We will finish the diagnostics slides next lecture.

How does land use affect river pollution?

    Nitrogen = β₀ + β₁·Agr + β₂·Forest + β₃·Rsdntial + β₄·ComIndl + error
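The land-use model can be sketched numerically. The data below are simulated, not the actual river data set; the variable names simply mirror the slide, and the coefficient values are arbitrary choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Hypothetical land-use proportions (NOT the real river data).
agr, forest, rsdntial, comindl = rng.uniform(0, 1, size=(4, n))

# True coefficients chosen arbitrarily for the simulation.
beta = np.array([1.0, 2.0, -1.5, 0.5, 3.0])
X = np.column_stack([np.ones(n), agr, forest, rsdntial, comindl])
nitrogen = X @ beta + rng.normal(scale=0.1, size=n)

# Least-squares fit of Nitrogen on the four land-use predictors.
beta_hat, *_ = np.linalg.lstsq(X, nitrogen, rcond=None)
print(np.round(beta_hat, 2))
```

With this little noise the fitted coefficients land close to the simulated truth, which is the point of the exercise: multiple regression recovers each predictor's effect while holding the others fixed.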
Multiple Linear Regression

Design matrix:

    X = [ 1  X_{11}  X_{21}  ⋯  X_{p1}
          1  X_{12}  X_{22}  ⋯  X_{p2}
          ⋮    ⋮       ⋮          ⋮
          1  X_{1n}  X_{2n}  ⋯  X_{pn} ] ,    y = (y_1, y_2, …, y_n)ᵀ

Squared error loss function:

    L(β) = Σ_{i=1}^n ( y_i − β₀ − Σ_{j=1}^p β_j X_{ji} )²

In matrix notation:

    L(β) = (y − Xβ)ᵀ(y − Xβ)
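The loss function translates directly into numpy. A minimal sketch with made-up numbers; the least-squares estimate minimizes L, so any other coefficient vector can only do worse.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 2

# Design matrix: a column of ones followed by the p predictor columns.
predictors = rng.normal(size=(n, p))
X = np.column_stack([np.ones(n), predictors])

beta_true = np.array([0.5, 1.0, -2.0])
y = X @ beta_true + rng.normal(size=n)

def squared_error_loss(beta, X, y):
    """L(beta) = (y - X beta)^T (y - X beta)."""
    resid = y - X @ beta
    return resid @ resid

# The minimizer of L: perturbing away from it can only increase the loss.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(squared_error_loss(beta_hat, X, y) <= squared_error_loss(beta_true, X, y))
```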
Linear Subspaces and Projections

With p predictors we have p + 1 vectors in ℝⁿ:

    X_i = (X_{i1}, X_{i2}, …, X_{in})ᵀ,   i = 0, …, p.

From now on we will always let X_0 be the vector of ones. We denote by ⟨X_0, …, X_p⟩ the linear space spanned by the vectors X_0, …, X_p:

    ⟨X_0, …, X_p⟩ = { Σ_{i=0}^p a_i X_i : (a_0, …, a_p) ∈ ℝ^{p+1} }

This is a linear subspace of ℝⁿ. We use the shorthand ⟨X⟩.

The dimension of ⟨X_0, …, X_p⟩ is equal to the rank of the matrix

    X = [ X_0  X_1  ⋯  X_p ]

The rank of a matrix is equal to the number of linearly independent columns. The linear map that projects any vector v ∈ ℝⁿ onto ⟨X⟩ can be obtained by

    P_X = X(XᵀX)⁻¹Xᵀ

Projection Matrices
Thus, for any n × (p + 1) matrix X, we can construct a projection matrix P_X = X(XᵀX)⁻¹Xᵀ that projects vectors onto the column space of X. Projection matrices enjoy some special properties:

1. P_X² = P_X.
2. rank(P_X) = rank(X).
3. For any v ∈ ⟨X⟩, P_X v = v.

For any linear space ⟨X⟩, its orthogonal complement (the null space of Xᵀ) is the set {v ∈ ℝⁿ : Xᵀv = 0}. The projection matrix onto this complement is I − P_X.

Linear Regression by Least Squares = Projection
Simple linear regression with intercept: project y = (y_1, y_2, …, y_n)ᵀ onto ⟨(1, 1, …, 1)ᵀ, (x_1, x_2, …, x_n)ᵀ⟩.
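A quick numerical check, with simulated x and y, that simple linear regression is exactly this projection, and that P_X has the properties listed above (idempotent, rank equal to the number of columns of X):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20
x = rng.normal(size=n)
y = 3.0 + 2.0 * x + rng.normal(size=n)

# Span of the all-ones vector and x.
X = np.column_stack([np.ones(n), x])

# Projection matrix P_X = X (X^T X)^{-1} X^T.
P = X @ np.linalg.solve(X.T @ X, X.T)

# Fitted values from projecting y equal the fitted values from least squares.
y_hat_proj = P @ y
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(y_hat_proj, X @ beta_hat))   # True: the two agree

# P is idempotent and has rank 2 = number of columns of X.
print(np.allclose(P @ P, P), np.linalg.matrix_rank(P))
```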
Multiple Linear Regression

The solution β̂ = (XᵀX)⁻¹Xᵀy can also be obtained directly from the concept of a projection:

    ŷ = Xβ̂ = P_X y = X(XᵀX)⁻¹Xᵀy   ⟹   β̂ = (XᵀX)⁻¹Xᵀy

Calculating variances
Multivariate Gaussians: let Z ~ N(μ, Σ), let a ∈ ℝᵐ, and let B be an m × n matrix. Then

    a + BZ ~ N(a + Bμ, BΣBᵀ)

Applying this to β̂ = (XᵀX)⁻¹Xᵀy, with y ~ N(Xβ, σ²I):

    E(β̂) = β,   Var(β̂) = σ²(XᵀX)⁻¹

Applying it to the fitted values ŷ = P_X y:

    E(ŷ) = Xβ,   Var(ŷ) = σ²P_X

The diagonal entries of P_X are the leverage values from last lecture.

t-tests for β_i

Since β̂_i ~ N(β_i, σ²[(XᵀX)⁻¹]_{ii}), we proceed as before: estimate σ² using

    σ̂² = SSE / (n − p − 1)

Then we can construct a t-test:

    t_i = β̂_i / se(β̂_i),   where se(β̂_i) = σ̂ √[(XᵀX)⁻¹]_{ii}

As before, reject the hypothesis H_i: β_i = 0 at level α if |t_i| > t_{n−p−1}(1 − α/2).

Interpreting the β_i's
    Y = β₀ + β₁X₁ + β₂X₂ + ⋯ + β_p X_p + error

The β̂_i obtained from a multiple regression are sometimes called partial regression coefficients, because they correspond to a simple regression of Y on X_i after taking out the effects of the X_j, j ≠ i:

1. Regress X_i on {X_j : j ≠ i}, and keep the residuals r^{X_i}.
2. Regress Y on {X_j : j ≠ i}, and keep the residuals r^Y.
3. Do a simple linear regression of r^Y on r^{X_i}; the slope will give you β̂_i.

This gives us:

    Var(β̂_i) = σ² / ‖r^{X_i}‖²

High correlation among the X's can "mask" out each other's effects.
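The three-step recipe above can be verified numerically. A sketch with simulated correlated predictors; because the residuals in steps 1 and 2 have mean zero, the simple regression in step 3 needs no intercept.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)          # correlated with x1
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Full multiple regression of y on (1, x1, x2).
X_full = np.column_stack([np.ones(n), x1, x2])
beta_full = ols(X_full, y)

# Step 1: residuals of x1 after regressing on the other predictors (here: 1, x2).
X_other = np.column_stack([np.ones(n), x2])
r_x1 = x1 - X_other @ ols(X_other, x1)
# Step 2: residuals of y on the same predictors.
r_y = y - X_other @ ols(X_other, y)
# Step 3: simple regression of r_y on r_x1; the slope is beta_1_hat.
slope = (r_y @ r_x1) / (r_x1 @ r_x1)

print(np.allclose(slope, beta_full[1]))     # True: partial slope matches
```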
Goodness of fit

Sums of squares:

    SST = Σ_{i=1}^n (Y_i − Ȳ)²
    SSR = Σ_{i=1}^n (Ŷ_i − Ȳ)²
    SSE = Σ_{i=1}^n (Y_i − Ŷ_i)²

    SST = SSR + SSE

    R² = SSR / SST = 1 − SSE / SST

R is called the multiple correlation coefficient. When R² is large, a lot of the variability in Y is explained by X.

A large R² may not indicate a good model. Hypothetical scenario: n observations, n linearly independent covariates. What would you get for R²? (R² = 1: the fit is perfect, even though the model may be useless.) As you add predictors to the model, R² will always increase, no matter what those predictors are!

"Goodness of Fit" Measures
R provides the following measures:

    R² = SSR / SST = 1 − SSE / SST

    R²_a = 1 − [SSE / (n − p − 1)] / [SST / (n − 1)] = 1 − (1 − R²) (n − 1) / (n − p − 1)

1. R² is easy to interpret. It is the proportion of the "variation" in the data explained by the model.
2. R² does not adjust for the model size, while R²_a does. When comparing models of different sizes, use R²_a.
3. However, for hypothesis testing the F statistic should be used.

F-tests for R²
Assume the model has an intercept, so the design matrix has p + 1 columns. Then

    F = (SSR / p) / (SSE / (n − p − 1))

F-distribution: if W ~ χ²_q is independent of Z ~ χ²_r, then

    (W / q) / (Z / r) ~ F_{q,r}

F-Table

    Source       Sum of Squares   d.f.         Mean Square
    Regression   SSR              p            MSR = SSR / p
    Residuals    SSE              n − p − 1    MSE = SSE / (n − p − 1)

    F = MSR / MSE

Reject at level α if F > F_{p, n−p−1}(1 − α). This tests the hypothesis H₀: β₁ = β₂ = ⋯ = β_p = 0.

Nested models
Test the hypothesis that a subset of the β_i's are zero:

    H₀: β₁ = β₂ = ⋯ = β_r = 0

That is, we have the reduced model RM nested within the full model FM:

    FM:  Y = β₀ + β₁X₁ + ⋯ + β_p X_p + error
    RM:  Y = β₀ + β_{r+1} X_{r+1} + ⋯ + β_p X_p + error

Do X₁, …, X_r have a significant marginal effect, after adjusting for the other predictors?

Let df(M) denote the number of parameters of model M, and let Δdf = df(FM) − df(RM). Then

    F = [ (SSE(RM) − SSE(FM)) / Δdf ] / [ SSE(FM) / (n − df(FM)) ]  ~  F_{Δdf, n − df(FM)}

Testing Constraints
In some situations you want to test that your model parameters satisfy some constraint. Say you have the model

    Y = β₀ + β₁X₁ + β₂X₂ + error

and want to test H₀: β₁ = β₂. This is equivalent to the model

    Y = β₀ + β₁(X₁ + X₂) + error

Fit these two models and apply the F test with Δdf = 1.

Testing Constraints: Another example

Again say you have

    Y = β₀ + β₁X₁ + β₂X₂ + error

and want to test H₀: β₁ + β₂ = 1, i.e. β₂ = 1 − β₁. This is equivalent to the model

    Y = β₀ + β₁X₁ + (1 − β₁)X₂ + error

which can be simplified to

    Y − X₂ = β₀ + β₁(X₁ − X₂) + error

Fit these two models and apply the F test with Δdf = 1. In general, Δdf equals the number of constraints.
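The first constraint test above reduces to the nested-model F statistic. A simulation sketch assuming numpy: when the data are generated with β₁ = β₂ the statistic stays small (it follows F with 1 and n − 3 degrees of freedom), and when the constraint is badly violated it blows up.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

def sse(X, y):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return resid @ resid

def constraint_F(y):
    """F statistic for H0: beta1 = beta2 in the model y ~ 1 + x1 + x2."""
    X_fm = np.column_stack([np.ones(n), x1, x2])      # full model, 3 parameters
    X_rm = np.column_stack([np.ones(n), x1 + x2])     # reduced model under H0
    d_df = X_fm.shape[1] - X_rm.shape[1]              # 1 constraint
    df_fm = n - X_fm.shape[1]                         # residual df of full model
    return ((sse(X_rm, y) - sse(X_fm, y)) / d_df) / (sse(X_fm, y) / df_fm)

# Data generated under H0 (beta1 = beta2 = 1.5): F is typically small.
y_null = 0.5 + 1.5 * x1 + 1.5 * x2 + rng.normal(size=n)
# Data where the constraint is badly violated: F is huge.
y_alt = 0.5 + 3.0 * x1 - 1.0 * x2 + rng.normal(size=n)

print(round(constraint_F(y_null), 2), round(constraint_F(y_alt), 2))
```

Compare the statistic to the F_{1, n−3} quantile for a formal test.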
Influence of Outliers

How much influence does a data point have on the model fit?

Different measures of Influence

1. How much influence does observation i have on its own fitted value?

       DFFITS_i = (ŷ_i − ŷ_{i(i)}) / √( MSE_{(i)} · h_{ii} )

   where the subscript (i) denotes quantities computed with observation i deleted. A DFFITS value exceeding 2√(p/n) is considered large.

2. How much influence does observation i have on the fitted β̂'s?

       DFBETAS_{1(i)} = (β̂₁ − β̂_{1(i)}) / √( MSE_{(i)} · c₁₁ ),   where c₁₁ = 1 / Σ_i (x_i − x̄)².

   A DFBETAS value exceeding 2/√n is considered large.

These conventional rules work for "reasonably sized" data sets.
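DFFITS can be computed by brute force, literally refitting with each observation deleted. A sketch assuming numpy, for a simple regression (here p counts the columns of X, intercept included, matching the 2√(p/n) cutoff from the slide):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 40
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
p = X.shape[1]                                  # columns of X (intercept + predictor)

def fit(X, y):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return beta, y - X @ beta

beta, resid = fit(X, y)
H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)                                  # leverage values

# Brute-force DFFITS: delete observation i, refit, compare fitted values at i.
dffits = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    beta_i, resid_i = fit(X[keep], y[keep])
    mse_i = resid_i @ resid_i / (n - 1 - p)     # MSE with observation i deleted
    y_hat_i_del = X[i] @ beta_i                 # prediction at i from the deleted fit
    dffits[i] = (X[i] @ beta - y_hat_i_del) / np.sqrt(mse_i * h[i])

large = np.abs(dffits) > 2 * np.sqrt(p / n)     # conventional cutoff from the slide
print(int(large.sum()), "points flagged as influential")
```

DFBETAS works the same way, comparing β̂₁ to β̂_{1(i)} instead of the fitted values.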
Different measures of Influence: Cook's Distance

1. Cook's distance is defined as:

       D_i = Σ_{j=1}^n (ŷ_j − ŷ_{j(i)})² / [ (p + 1) · MSE ]

2. It considers the influence of Y_i on all of the fitted values, not just the ith case.

3. It can be shown that D_i is equivalent to

       D_i = [ r_i² / (p + 1) ] · h_{ii} / (1 − h_{ii})

   where r_i is the standardized residual.

4. Compare D_i to the F_{p+1, n−p−1} distribution.
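The equivalence between the definition and the shortcut formula can be checked by brute force. A sketch assuming numpy, with p = 1 predictor:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 30
x = rng.normal(size=n)
y = 0.5 + 1.0 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
p = 1                                           # number of predictors (plus intercept)

beta = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta
resid = y - y_hat
mse = resid @ resid / (n - p - 1)
h = np.diag(X @ np.linalg.solve(X.T @ X, X.T))  # leverage values

# Definition: D_i = sum_j (yhat_j - yhat_{j(i)})^2 / ((p + 1) * MSE),
# computed by refitting with observation i deleted.
D_def = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    beta_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    D_def[i] = np.sum((y_hat - X @ beta_i) ** 2) / ((p + 1) * mse)

# Shortcut: D_i = r_i^2 / (p + 1) * h_ii / (1 - h_ii), r_i the standardized residual.
r = resid / np.sqrt(mse * (1 - h))
D_short = r**2 / (p + 1) * h / (1 - h)

print(np.allclose(D_def, D_short))              # True: the two formulas agree
```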
This note was uploaded on 02/27/2012 for the course STATS 315A taught by Professor Tibshirani, R. during the Spring '10 term at Stanford.