STAT 512: Applied Regression Analysis
Topic 4
Spring 2008

Topic Overview

- General Linear Tests
- Extra Sums of Squares
- Partial Correlations
- Multicollinearity
- Model Selection

General Linear Tests

These are a different way to look at the comparison of models. So far we have looked at comparing/selecting models based on:

- the model significance test and $R^2$ values
- t-tests for variables added last

These are good things to look at, but they are ineffective in cases where

- explanatory variables work together in groups
- we want to test hypotheses of the form $\beta_i = b_i$ rather than $\beta_i = 0$ (for example, maybe we want to test $H_0: \beta_1 = 3, \beta_4 = 7$ against the alternative hypothesis that at least one of those is false)

General Linear Tests look at the difference between models

- in terms of SSE (unexplained SS)
- in terms of SSM (explained SS)

Because $SSM + SSE = SST$, these two comparisons are equivalent.

The models we compare are hierarchical in the sense that one (the full model) includes all of the explanatory variables of the other (the reduced model).

We can compare models with different explanatory variables. For example:

- $X_1, X_2$ vs $X_1$
- $X_1, X_2, X_3, X_4, X_5$ vs $X_1, X_2, X_3$

Note that the first model in each pair includes all $X$'s of the second model.

We will get an F-test that compares the two models. We are testing a null hypothesis that the regression coefficients for the extra variables are all zero. For $X_1, X_2, X_3, X_4, X_5$ vs $X_1, X_2, X_3$:

$H_0: \beta_4 = \beta_5 = 0$
$H_a: \beta_4, \beta_5$ are not both 0.

F-test

The test statistic in general is

$$F = \frac{\left(SSE(R) - SSE(F)\right) / \left(df_E(R) - df_E(F)\right)}{SSE(F) / df_E(F)}.$$

Under the null hypothesis (reduced model) this statistic has an F-distribution whose numerator degrees of freedom are the number of extra variables, $df_E(R) - df_E(F)$, and whose denominator degrees of freedom are the $df_E$ for the larger model, $df_E(F)$. So we reject if the p-value for this test is $\le 0.05$, and in that case conclude that at least one of the extra variables is useful for predicting $Y$ in the linear model that already contains the variables in the reduced model.

Example

Suppose $n = 100$ and we are testing $X_1, X_2, X_3, X_4, X_5$ (full) vs $X_1, X_2, X_3$ (reduced). Our hypotheses are:

$H_0: \beta_4 = \beta_5 = 0$
$H_a: \beta_4, \beta_5$ are not both 0.

Since we are considering removing 2 variables ($X_4$ and $X_5$), the numerator df is 2. The denominator df is $n - 6 = 94$ (since $p = 6$ for the full model: five slopes plus the intercept). We reject if the p-value is $\le 0.05$, and in that case would conclude that either $X_4$ or $X_5$ or both contain additional information that is useful for predicting $Y$ in a linear model that also includes $X_1$, $X_2$ and $X_3$.

7.1: Extra Sums of Squares

Notation for Extra SS

Example using 5 variables:

- $SSE(X_1, X_2, X_3, X_4, X_5)$ is the SSE for the full model, $SSE(F)$
- $SSE(X_1, X_2, X_3)$ is the SSE for a reduced model, $SSE(R)$

The extra sum of squares for this comparison is denoted $SSM(X_4, X_5 \mid X_1, X_2, X_3)$. This is the difference in the SSE's:

$SSM(X_4, X_5 \mid X_1, X_2, X_3) = SSE(R) - SSE(F) = SSE(X_1, X_2, X_3) - SSE(X_1, X$...
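As a rough illustration of the general linear F-test described above, the following Python sketch evaluates the test statistic and its degrees of freedom from the two SSE values. The function name `general_linear_f` and the SSE values (470 and 520) are hypothetical, chosen only to exercise the formula; the degrees of freedom follow the $n = 100$ example (full model with $p = 6$, reduced model with $p = 4$).

```python
def general_linear_f(sse_reduced, df_e_reduced, sse_full, df_e_full):
    """Return (F, numerator df, denominator df) for a full-vs-reduced comparison.

    Hypothetical helper for illustration; it only evaluates
    F = [(SSE(R) - SSE(F)) / (df_E(R) - df_E(F))] / [SSE(F) / df_E(F)].
    """
    if df_e_reduced <= df_e_full:
        raise ValueError("the reduced model must have more error df than the full model")
    if sse_full <= 0:
        raise ValueError("SSE(F) must be positive")
    num_df = df_e_reduced - df_e_full   # number of extra variables
    extra_ss = sse_reduced - sse_full   # SSE(R) - SSE(F), the extra sum of squares
    f_stat = (extra_ss / num_df) / (sse_full / df_e_full)
    return f_stat, num_df, df_e_full


# Degrees of freedom from the example: n = 100, full model has p = 6
# parameters (X1..X5 plus intercept), reduced model has p = 4.
n = 100
df_e_full = n - 6      # 94
df_e_reduced = n - 4   # 96

# Hypothetical SSE values, invented only to exercise the formula.
sse_full = 470.0
sse_reduced = 520.0

f_stat, num_df, den_df = general_linear_f(sse_reduced, df_e_reduced, sse_full, df_e_full)
print(f"F = {f_stat:.2f} on ({num_df}, {den_df}) df")
```

The p-value is the upper-tail area of the F-distribution with these degrees of freedom; if SciPy is available, `scipy.stats.f.sf(f_stat, num_df, den_df)` is one way to obtain it.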
 Fall '10
 Yen
 Regression Analysis, SSM
