EXST 7015 Fall 2011 Lecture 09 - Statistical Techniques II

Multiple regression

Multiple regression involves two or more independent variables (Xi), but still only a single dependent variable (Yi). (There is an analysis for multiple dependent variables; it is called multivariate regression.) The sample equation in multiple regression is of the form

   Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i} + e_i

The objectives in multiple regression are generally the same as in SLR: testing hypotheses about potential relationships (using correlations), fitting and documenting relationships, and estimating parameters with confidence intervals.

The good news: most of what we know about simple linear regression applies to multiple regression.

- The regression equation is similar.
- The assumptions for the regression are the same as for simple linear regression.
- The interpretation of the parameter estimates is the same (units are Y units per X unit, measuring the change in Y for a 1-unit change in X).
- The diagnostics used in simple linear regression are mostly the same for multiple regression. Residuals can still be examined for outliers, homogeneity, normality, curvature, influence, etc., as with SLR. The only difference is that, since we have several X's, we would usually plot the residuals against Yhat instead of against a single X variable.

So what is different? Obviously, the calculations are more complicated. Simple algebraic equations for the estimates basically do not exist; matrix algebra must be used. Also, we now have several independent variables, X1, X2, X3, etc., and we will need some mechanism to evaluate them individually. To this end, we will discuss a new type of sum of squares not needed for simple linear regression.

The use of several independent variables also creates some new problems. If the independent variables are highly correlated we have a problem called multicollinearity, and we will need some diagnostics to evaluate it. Outside of the new diagnostics needed to deal with several independent variables, SLR and multiple regression are very similar.

To do multiple regression in SAS we specify a model with the variables of interest. For example, a regression of Y on 3 variables X1, X2 and X3 would be specified as

   PROC REG; MODEL Y = X1 X2 X3;

To get the SS Type 1 and SS Type 2 we add the options "/ ss1 ss2;", as in the sketch below.
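As a minimal sketch of the full call (the data set name mydata is a placeholder, not from the notes):

   PROC REG DATA=mydata;
      MODEL Y = X1 X2 X3 / SS1 SS2;   * request Type I and Type II SS;
   RUN;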
Extra Sum of Squares

When a variable is added to a model, it usually accounts for some variation; only rarely in practice will a variable account for zero sum of squares. Most often, when a variable is added to a model it reduces the error sum of squares (SSE) and increases the model sum of squares (SSReg). The sum of squares that a variable causes to be removed from the error SS and added to the model SS is called its "Extra SS".

If each variable had its own unique Extra SS, the concept would be simple. However, two variables in a model are rarely wholly independent. Two variables may "compete" for SS, so that if one enters the model first it gets SS that the second variable would have taken had it entered first. Or, one variable may actually enhance another, so that the second variable accounts for more SS after the first variable is entered than if it had entered first. As a result we cannot talk simply about the Extra SS for a variable; we have to consider the Extra SS in the context of which variables were already in the model.

Extra SS notation

The Extra SS will simply be denoted SSXi for each variable Xi. Since we must consider which variables are already in the model, we add to this notation a designation of what is already in the model. For example, the extra SS for X2, given that X1 and X3 are already in the model, is SSX2 | X1, X3. We could also add a designation for the intercept (usually X0), so the extra SS just mentioned could be written SSX2 | X0, X1, X3. However, since ALL models we will discuss are adjusted for the intercept first, we will usually leave off the X0. Using this notation, the Extra SS for X in a simple linear regression would be just SSX; adjustment for X0 is assumed.

Extra SS Example 1

The first example is a two factor multiple regression. The data are given below. We will fit the simple linear regressions for each variable, and then the two factor multiple regression, to calculate the extra SS. In SAS we simply list the variables in the PROC REG model statement: MODEL Y = X1 X2;

   Obs    Y   X1   X2
     1   18    3    4
     2   22    1    8
     3   34    2   11
     4   36    6    5
     5   42    8    1
     6   54    6    5
     7   68    7   10
     8   77    4   10
     9   87    9   11
    10   92    6    8

Simple linear regression for Y on X1:

   Source    DF   Sum of Squares   Mean Square   F Value   Prob>F
   Model      1       2381.31494    2381.31494     4.698   0.0621
   Error      8       4054.68506     506.83563
   C Total    9       6436.00000

Note that the slope for the variable X1 is not significantly different from zero.

Simple linear regression for Y on X2:

   Source    DF   Sum of Squares   Mean Square   F Value   Prob>F
   Model      1       1446.14793    1446.14793     2.319   0.1663
   Error      8       4989.85207     623.73151
   C Total    9       6436.00000

The slope for X2 is not significantly different from zero either.

Multiple regression of Y on both X1 and X2:

   Source    DF   Sum of Squares   Mean Square   F Value   Prob>F
   Model      2       4523.43473    2261.71736     8.278   0.0143
   Error      7       1912.56527     273.22361
   C Total    9       6436.00000

The two variables together are significant, though neither was individually.

Now compare the results of the three fits, first calculating the Extra SS. Since the SSReg increases by the same amount that the SSError decreases for each variable, either SS can be used; I will use the Error SS.

   Error SS for X1 alone       4054.68506
   Error SS for X2 alone       4989.85207
   Error SS for X1 and X2      1912.56527
   SSTotal (all models)        6436.00000

The first variable alone (X1) had an error SS of 4054.68506 out of the corrected total of 6436. The difference, 2381.31494, is the Extra SS for X1: the amount of the total accounted for by that variable.

   SSX1 = 6436 - 4054.68506 = 2381.31494

Likewise, the Extra SS for X2 alone is

   SSX2 = 6436 - 4989.85207 = 1446.14793

Now, how did they do together in the multiple regression? The two variables together had an error SS of 1912.56527. Since the first variable alone left an error SS of 4054.68506, the second must have reduced the error SS by an additional amount equal to the difference:

   SSX2|X1 = 4054.68506 - 1912.56527 = 2142.11979

And likewise, the Extra SS for X1 given X2 is

   SSX1|X2 = 4989.85207 - 1912.56527 = 3077.28680

In summary:

   SSX1 = 2381.31494        SSX1|X2 = 3077.28680
   SSX2 = 1446.14793        SSX2|X1 = 2142.11979

In this case the variables actually enhanced each other, performing better together than alone. Although not the rule, this is not unusual.
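All three fits above can be requested in one PROC REG call; a minimal sketch using the data listed (the data set name ex1 is a placeholder):

   DATA ex1;   * the Example 1 data above;
      INPUT Y X1 X2;
      DATALINES;
   18 3 4
   22 1 8
   34 2 11
   36 6 5
   42 8 1
   54 6 5
   68 7 10
   77 4 10
   87 9 11
   92 6 8
   ;
   RUN;

   PROC REG DATA=ex1;
      MODEL Y = X1;       * SLR of Y on X1;
      MODEL Y = X2;       * SLR of Y on X2;
      MODEL Y = X1 X2;    * two factor multiple regression;
   RUN;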
3 factor model

The calculation of extra SS is exactly the same for larger models. The following example is a 3 factor multiple regression. In SAS this model would be

   PROC REG; MODEL Y = X1 X2 X3;

The raw data are given below.

   Obs    Y   X1   X2   X3
     1    1    2    9    2
     2    3    4    6    5
     3    5    7    7    9
     4    3    3    5    5
     5    6    5    8    9
     6    4    3    4    2
     7    2    2    3    6
     8    8    6    2    1
     9    9    7    5    3
    10    3    8    2    4
    11    5    7    3    7
    12    6    9    1    4

The results of the regressions follow. SSTotal is 62.91667 for all models. For the 1 factor models the results are:

   Regression of Y on X1: SSError = 38.939, SSModel = 23.978
   Regression of Y on X2: SSError = 58.801, SSModel = 4.115
   Regression of Y on X3: SSError = 62.680, SSModel = 0.237

The extra SS are equal to the model SS for 1 factor models:

   SSX1 = 23.978   (or SSX1|X0)
   SSX2 = 4.115    (or SSX2|X0)
   SSX3 = 0.237    (or SSX3|X0)

These SS are adjusted for the intercept (the correction factor). This will always be the case for our examples, so the X0 is often omitted.

Fitting X1, X2 and X3 together TWO AT A TIME we get the following results. ...
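A minimal sketch of how these two-at-a-time fits (and the full model) would be specified, using the data above (the data set name ex2 is a placeholder):

   DATA ex2;   * the 3 factor example data above;
      INPUT Y X1 X2 X3;
      DATALINES;
   1 2 9 2
   3 4 6 5
   5 7 7 9
   3 3 5 5
   6 5 8 9
   4 3 4 2
   2 2 3 6
   8 6 2 1
   9 7 5 3
   3 8 2 4
   5 7 3 7
   6 9 1 4
   ;
   RUN;

   PROC REG DATA=ex2;
      MODEL Y = X1 X2;               * the two at a time fits;
      MODEL Y = X1 X3;
      MODEL Y = X2 X3;
      MODEL Y = X1 X2 X3 / SS1 SS2;  * full 3 factor model;
   RUN;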