Week 7 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)

What are the advantages of using the log of a variable in regression? Find the "rules of thumb" for taking logs.
See page 191 of Wooldridge (pp. 198-199 for the 3rd edition). Be careful when you interpret the coefficients of explanatory variables in a model where some variables are in logarithms. Do you remember Table 2.3? Consult Table 2.3.

How do you compute the change in y caused by Δx when the model is built for log(y)?
A precise computation is given by Equation (6.8) of Wooldridge.

Why do we need "interaction" terms in regression models?
An interaction term is needed if the partial effect of an explanatory variable is linearly related to another explanatory variable. See (6.17) for an example.

What is the adjusted R-squared? What is the difference between it and the R-squared?
The primary attraction of the adjusted R-squared is that it imposes a penalty for adding additional explanatory variables to a model. The R-squared can never fall when a new explanatory variable is added, whereas the adjusted R-squared will fall if the t-ratio on the new variable is less than one in absolute value.

How do you construct an interval prediction for given x-values?
This is nicely summarised in Equations (6.27)-(6.31) and the surrounding text.

How do you predict y for given x-values when the model is built for log(y)?
Check the list on page 212 (p. 220 for the 3rd edition).

What is involved in "residual analysis"?
See pages 209-210 (pp. 217-218 for the 3rd edition).

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 6.4

(i) Holding all other factors fixed, we have

    Δlog(wage) = β1 Δeduc + β2 Δeduc·pareduc = (β1 + β2 pareduc) Δeduc.

Dividing both sides by Δeduc gives the result. The sign of β2 is not obvious, although β2 > 0 if we think a child gets more out of another year of education the more highly educated the child's parents are.
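The review question on Equation (6.8) can be illustrated numerically. In a log(y) model, the usual reading "100·β·Δx percent" is only an approximation; the exact implied percentage change is 100·[exp(βΔx) − 1], and the gap matters for large coefficients. A minimal Python sketch with an illustrative coefficient (β = 0.306 is made up for the example, not taken from this problem set):

```python
import math

def approx_pct_change(beta, dx):
    # The usual approximation: %change in y ~= 100 * beta * dx
    # (accurate only when beta * dx is small)
    return 100.0 * beta * dx

def exact_pct_change(beta, dx):
    # Exact change implied by a log(y) model (Wooldridge eq. 6.8):
    # %change in y = 100 * [exp(beta * dx) - 1]
    return 100.0 * (math.exp(beta * dx) - 1.0)

# Illustrative coefficient, not from this problem set:
print(round(approx_pct_change(0.306, 1), 1))  # 30.6
print(round(exact_pct_change(0.306, 1), 1))   # 35.8
```

For small effects (a few percent) the two agree closely; the exact formula is worth using whenever |βΔx| is above roughly 0.1.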
(ii) We use the values pareduc = 32 and pareduc = 24 to interpret the coefficient on educ·pareduc. The difference in the estimated return to education is .00078(32 − 24) = .0062, or about .62 percentage points. (Percentage points are changes in percentages.)

(iii) When we add pareduc by itself, the coefficient on the interaction term is negative. The t statistic on educ·pareduc is about −1.33, which is not significant at the 10% level against a two-sided alternative. Note that the coefficient on pareduc is significant at the 5% level against a two-sided alternative. This provides a good example of how omitting a level effect (pareduc in this case) can lead to biased estimation of the interaction effect.

Q2. Wooldridge 6.5

This would make little sense. Performances on math and science exams are measures of outputs of the educational process, and we would like to know how various educational inputs and school characteristics affect math and science scores. For example, if the staff-to-pupil ratio has an effect on both exam scores, why would we want to hold performance on the science test fixed while studying the effects of staff on the math pass rate? This would be an example of controlling for too many factors in a regression equation. The variable sci11 could be a dependent variable in an identical regression equation.

Q3. Wooldridge C6.3 (C6.2 for the 3rd edition) (wage2_c6_3.do)

(i) Holding exper (and the elements in u) fixed, we have

    Δlog(wage) = β1 Δeduc + β3 Δeduc·exper = (β1 + β3 exper) Δeduc,

or

    Δlog(wage)/Δeduc = β1 + β3 exper.

This is the approximate proportionate change in wage given one more year of education.

(ii) H0: β3 = 0. If we think that education and experience interact positively, so that people with more experience are more productive when given another year of education, then β3 > 0 is the appropriate alternative.

(iii) The estimated equation is

    log(wage) = 5.95 + .0440 educ − .0215 exper + .00320 educ·exper
               (0.24) (.0174)      (.0200)       (.00153)

    n = 935, R-squared = .135, adjusted R-squared = .132

(standard errors in parentheses). The t statistic on the interaction term is about 2.13, which gives a p-value below .02 against H1: β3 > 0. Therefore, we reject H0: β3 = 0 in favor of H1: β3 > 0 at the 2% level.

(iv) We rewrite the equation as

    log(wage) = β0 + θ1 educ + β2 exper + β3 educ(exper − 10) + u,

where θ1 = β1 + 10β3, and run the regression of log(wage) on educ, exper, and educ(exper − 10). We want the coefficient on educ. We obtain θ1 ≈ .0761 and se(θ1) ≈ .0066. The 95% CI for θ1 is about .063 to .089.

Q4. Wooldridge C6.8 (hprice1_c6_8.do)

(i) The estimated equation (where price is in dollars) is

    price = −21,770.3 + 2.068 lotsize + 122.78 sqrft + 13,852.5 bdrms
           (29,475.0)  (0.642)         (13.24)        (9,010.1)

    n = 88, R-squared = .672, adjusted R-squared = .661, σ̂ = 59,833

(standard errors in parentheses). The predicted price at lotsize = 10,000, sqrft = 2,300, and bdrms = 4 is about $336,714.

(ii) The regression is price_i on (lotsize_i − 10,000), (sqrft_i − 2,300), and (bdrms_i − 4). We want the intercept estimate and the associated 95% CI from this regression. The CI is approximately 336,706.7 ± 14,665, or about $322,042 to $351,372 when rounded to the nearest dollar.

(iii) We must use equation (6.36) to obtain the standard error of ê0 and then use equation (6.37) (assuming that price is normally distributed). From the regression in part (ii), se(ŷ0) ≈ 7,374.5 and σ̂ ≈ 59,833. Therefore, se(ê0) ≈ [(7,374.5)^2 + (59,833)^2]^(1/2) ≈ 60,285.8. Using 1.99 as the approximate 97.5th percentile of the t(84) distribution gives the 95% CI for price0, at the given values of the explanatory variables, as 336,706.7 ± 1.99(60,285.8) or, rounded to the nearest dollar, $216,738 to $456,675. This is a fairly wide prediction interval. But we have not used many factors to explain housing price. If we had more, we could presumably reduce the error standard deviation, and therefore σ̂, to obtain a tighter prediction interval.
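As a quick arithmetic check on Q3(iv): the reparameterised coefficient θ1 = β1 + 10β3 can be recovered directly from the estimates reported in Q3(iii). The small gap from the reported .0761 comes from rounding in the printed coefficients. A minimal Python sketch:

```python
beta1 = 0.0440   # coefficient on educ, from the estimated equation in Q3(iii)
beta3 = 0.00320  # coefficient on educ*exper, from the same equation

# Return to education evaluated at exper = 10 (the reparameterisation in (iv))
theta1 = beta1 + 10 * beta3
print(round(theta1, 4))  # 0.076, close to the reported .0761
```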
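The centering trick used in Q4(ii), regressing on (x_i − x0) so that the intercept equals the prediction at x0 and its standard error comes for free, can be verified on any dataset. The sketch below uses simulated data (the data and coefficients are made up for illustration, not the hprice1 data) and confirms that the intercept of the centered fit equals the uncentered fit's prediction at x0:

```python
import numpy as np

# Simulated data standing in for the real regression (assumption: any
# well-behaved dataset works; this is not the hprice1 sample)
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2))
y = 3.0 + x @ np.array([1.5, -2.0]) + rng.normal(scale=0.5, size=50)
x0 = np.array([0.7, -0.3])  # the x-values at which we want a prediction

# Centered regression: y on a constant and (x - x0)
Xc = np.column_stack([np.ones(50), x - x0])
b = np.linalg.lstsq(Xc, y, rcond=None)[0]

# Uncentered regression, then predict at x0 the usual way
Xf = np.column_stack([np.ones(50), x])
bf = np.linalg.lstsq(Xf, y, rcond=None)[0]
yhat0 = bf[0] + bf[1:] @ x0

# The centered intercept IS the prediction at x0
assert abs(b[0] - yhat0) < 1e-8
```

The slope coefficients are unchanged by the shift; only the intercept is re-expressed, which is why its reported standard error is exactly se(ŷ0).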
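The prediction-interval arithmetic in Q4(iii) can also be reproduced directly; tiny differences from the printed 60,285.8 are rounding in intermediate steps. A minimal sketch, using the numbers reported above:

```python
import math

se_yhat0 = 7374.5    # se of the point prediction, from the part (ii) regression
sigma_hat = 59833.0  # standard error of the regression
yhat0 = 336706.7     # predicted price at the given x-values

# Equation (6.36): the prediction-error se combines estimation uncertainty
# in yhat0 with the error variance of a single new observation
se_e0 = math.sqrt(se_yhat0**2 + sigma_hat**2)

# Equation (6.37): 95% prediction interval with the t(84) critical value ~1.99
lo = yhat0 - 1.99 * se_e0
hi = yhat0 + 1.99 * se_e0
print(round(lo), round(hi))  # 216738 456675
```

Note how sigma_hat dominates se_yhat0 here: the interval's width is driven almost entirely by the unexplained variation in price, which is why adding explanatory power (lower σ̂) is what tightens prediction intervals.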