Lecture 7: Correlation and Regression

Excel's regression output gives you the "standard error" for the regression as a whole (in the "Regression Statistics" box of the summary output; you will see that each coefficient also has its own "standard error"). This standard error of the regression measures the typical "error" in the predicted y values: it is the standard deviation of the observed data around the regression line.
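
As a rough sketch of what this number is, the Python below fits a regression with a single independent variable (k = 1) by least squares and returns the standard error of the regression as the square root of SSE / (n - k - 1). The function names and the example data are hypothetical, made up for illustration; this is not Excel's own code.

```python
import math

def fit_simple(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def standard_error_of_regression(x, y):
    """sqrt(SSE / (n - k - 1)): the standard deviation of the data around
    the fitted line. This sketch handles one independent variable only."""
    k = 1  # number of independent variables in this sketch
    b0, b1 = fit_simple(x, y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (len(x) - k - 1))

# Made-up example data.
s = standard_error_of_regression([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(round(s, 3))  # 0.189
```

Here the fitted line is y = 0.05 + 1.99x, SSE is about 0.107, and with n - k - 1 = 3 degrees of freedom the standard error is about 0.189.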

The second output box is labeled "ANOVA," for analysis of variance (cf. also chapter 10). Its SS (sum of squares) column shows the regression sum of squares (SSR, "Regression" row), the residual sum of squares (SSE, "Residual" row), and the total sum of squares (SST, "Total" row), with SST = SSR + SSE. The important results in this ANOVA table are the F statistic and its associated p-value (labeled "Significance F" in Excel). They test the significance of the regression as a whole, using this hypothesis test:

H0: all of the slope coefficients are zero (β1 = β2 = ... = βk = 0).
Ha: at least one of the slope coefficients is different from zero.

This is a joint test of the hypothesis that every slope coefficient equals zero (the intercept is not part of it). You want to reject the null hypothesis (p-value < 0.05) to say that the regression is significant. If the p-value is greater than 0.05, you essentially have an insignificant regression, and you should not go on to interpret its results.

The next box gives you the estimated coefficients (the β's). For each coefficient you get the estimate and the standard error associated with it. (Remember, each of these coefficients is an estimate, and every estimate has a standard error, which depends in part on the sample size.) Now we want to run a hypothesis test on each individual coefficient to see whether it is "statistically significant," that is, "statistically significantly different from zero." The hypotheses are:

H0: β = 0
Ha: β ≠ 0

Under the null hypothesis, the estimated coefficient divided by its standard error follows a t distribution with (n - k - 1) degrees of freedom (recall that n = number of observations and k = number of independent variables). This can be written as:

$$H_0:\ \frac{\hat{\beta}}{\mathrm{s.e.}(\hat{\beta})} \sim t_{(n-k-1)}$$

Excel reports this ratio in the "t Stat" column and its p-value in the "P-value" column. As a commonly accepted practice, regression coefficients with a p-value (for this t-test) of less than 0.05 are considered significant. You are also given the 95 percent confidence interval for each coefficient. You will notice that if a coefficient is insignificant (p-value above 0.05), its 95 percent confidence interval contains zero.
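
To make the ANOVA box and the coefficient table concrete, here is a rough Python sketch for the simple case of one independent variable (k = 1), using only the standard library. It computes SSR, SSE, and SST, the F statistic and its p-value (the "Significance F" figure), and for each coefficient the standard error, t statistic, two-sided p-value, and 95 percent confidence interval. The p-values come from a hand-written regularized incomplete beta function (the standard continued-fraction method) rather than from a statistics package, the critical t value is found by bisection, and all function names and data are hypothetical.

```python
import math

def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Each iteration applies one even-numbered and one odd-numbered term.
        for aa in (m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    # Use the continued fraction where it converges quickly; otherwise use symmetry.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def t_two_sided_p(t, df):
    """P(|T| >= |t|) for a t distribution with df degrees of freedom."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))

def f_upper_p(f, df1, df2):
    """P(F >= f) for an F distribution with (df1, df2) degrees of freedom."""
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))

def t_critical(df, alpha=0.05):
    """Two-sided critical value t* with t_two_sided_p(t*, df) = alpha, by bisection."""
    lo, hi = 0.0, 1e4
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if t_two_sided_p(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def regression_report(x, y):
    """ANOVA table, F test and coefficient t tests for y = b0 + b1*x (k = 1)."""
    n, k = len(x), 1
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                # "Total" row
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # "Residual" row
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # "Regression" row
    df_reg, df_res = k, n - k - 1
    f_stat = (ssr / df_reg) / (sse / df_res)
    s = math.sqrt(sse / df_res)  # standard error of the regression
    ses = {"b0": s * math.sqrt(1.0 / n + x_bar ** 2 / sxx), "b1": s / math.sqrt(sxx)}
    t_star = t_critical(df_res)
    coefs = {}
    for name, b in (("b0", b0), ("b1", b1)):
        t = b / ses[name]
        coefs[name] = {"coef": b, "se": ses[name], "t": t,
                       "p": t_two_sided_p(t, df_res),
                       "ci95": (b - t_star * ses[name], b + t_star * ses[name])}
    return {"SSR": ssr, "SSE": sse, "SST": sst, "F": f_stat,
            "p_F": f_upper_p(f_stat, df_reg, df_res), "coefs": coefs}

# Made-up example data.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1, 6.8, 8.2]
report = regression_report(x, y)
print("F =", report["F"], "Significance F =", report["p_F"])
for name, c in report["coefs"].items():
    print(name, c["coef"], c["se"], c["t"], c["p"], c["ci95"])
```

With only one independent variable, F equals the square of the slope's t statistic and the two p-values coincide. In this made-up data the slope is highly significant, while the intercept has a p-value well above 0.05 and its 95 percent confidence interval contains zero.
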
Interpreting coefficients: A coefficient tells you how much you would expect the dependent variable to change, in the units of the dependent variable, for a one-unit change in that independent variable, KEEPING ALL OTHER VARIABLES CONSTANT.
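
To see what "keeping all other variables constant" means in practice, the sketch below fits a regression with two independent variables by solving the normal equations, then fits a simple regression that leaves the second variable out. The data are hypothetical and were constructed so that y = 10 + 2*x1 + 5*x2 exactly, with x2 tending to rise along with x1; `ols` and `predict` are illustrative helper names, not Excel or library functions.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting.
    Each row of X must start with 1.0 for the intercept."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Hypothetical data built so that y = 10 + 2*x1 + 5*x2 exactly.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [0, 1, 1, 2, 2, 3]
y = [10 + 2 * a + 5 * b for a, b in zip(x1, x2)]

b0, b1, b2 = ols([[1.0, a, b] for a, b in zip(x1, x2)], y)

def predict(x1_value, x2_value):
    """Fitted value from the two-variable regression."""
    return b0 + b1 * x1_value + b2 * x2_value

# One more unit of x1 with x2 held at 2: the change equals b1 (up to rounding).
print(b1, predict(4, 2) - predict(3, 2))

# Leaving x2 out: the x1 slope now also absorbs x2 moving along with x1.
_, slope_alone = ols([[1.0, a] for a in x1], y)
print(slope_alone)  # about 4.71
```

The two-variable fit gives a coefficient of 2 on x1, which is exactly the change in predicted y when x1 rises by one unit and x2 is held fixed. The simple regression slope of about 4.71 answers a different question, because x2 is no longer held constant.
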
You should only interpret those variables whose coefficients are statistically significantly different from zero. Page 342 of the book suggests that you can eliminate variables whose coefficients are not statistically significant. BE VERY CAREFUL DOING THIS -- IT IS NOT GENERALLY RECOMMENDED. Dropping variables on the basis of their p-values is called "REGRESSION FISHING" and is very, very bad research design.
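
One reason fishing is dangerous is that, at the 0.05 level, about 1 in 20 variables with no real relationship to y will still look "significant" by chance. The simulation below is a rough sketch of that: it generates a pure-noise outcome and 200 pure-noise candidate regressors, runs a separate simple regression of y on each, and counts how many slopes have p < 0.05. To stay within the Python standard library it approximates the t distribution with the standard normal, which is reasonable here only because each regression has n - 2 = 498 degrees of freedom; the sample sizes, the seed, and the function names are arbitrary choices for illustration.

```python
import math
import random
from statistics import NormalDist

def slope_p_value(x, y):
    """Two-sided p-value for the slope of y = b0 + b1*x. Uses a standard normal
    in place of the t distribution, an approximation that is only reasonable
    when n - 2 is large."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    t = b1 / (math.sqrt(sse / (n - 2)) / math.sqrt(sxx))
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))

def count_chance_hits(n_obs=500, n_candidates=200, alpha=0.05, seed=1):
    """Regress a pure-noise y on each of n_candidates pure-noise x's, one at a
    time, and count how many slopes come out 'significant' at alpha."""
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_candidates):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]  # unrelated to y by construction
        if slope_p_value(x, y) < alpha:
            hits += 1
    return hits

print(count_chance_hits())  # typically somewhere near 200 * 0.05 = 10
```

The exact count depends on the random seed, but every "hit" is a false positive by construction. Keeping only the variables that survive such a search and reporting them as findings is the trap described above.
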

