Week 8 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)

What is a qualitative factor? Give some examples.
This should be easy.

How do you use dummy variables to represent qualitative factors?
A factor with two levels (or values) is easily represented by one dummy variable. A factor with more than two levels (or values) needs more than one dummy. For example, a factor with 5 values requires 4 dummy variables.

How do you use dummy variables to represent an ordinal variable?
An ordinal variable may be represented by dummies in the same way as a multi-levelled factor.

How do you test for differences in regression functions across different groups?
We generally use the F test on a properly defined restricted model and unrestricted model. The Chow test is a special case of the F test, where the null hypothesis is no difference across groups.

What can you achieve by interacting group dummy variables with other regressors?
The interaction allows the slope coefficients to differ across groups.

What is “program evaluation”?
It assesses the effectiveness of a program by checking the differences in regression coefficients between the control (or base) group and the treatment group.

What are the interpretations of the PRF and SRF when the dependent variable is binary?
The PRF is the conditional probability of “success” given the explanatory variables. The SRF is the predicted conditional probability of “success” given the explanatory variables.

What are the shortcomings of the LPM?
The major drawback of the LPM is that the predicted probability of success may be outside the interval [0,1].

Does MLR5 hold for the LPM?
No. As indicated in the textbook and lecture, the conditional variance of the disturbance is not constant and depends on the explanatory variables: var(u|x) = E(y|x)[1 − E(y|x)].

Problem Set (these will be discussed in tutorial classes)

Q1.
Wooldridge 7.4
(i) The approximate difference is just the coefficient on utility times 100, or −28.3%. The t statistic is −.283/.099 ≈ −2.86, which is statistically significant.
(ii) 100·[exp(−.283) − 1] ≈ −24.7%, and so the estimate is somewhat smaller in magnitude.
(iii) The proportionate difference is .181 − .158 = .023, or about 2.3%. One equation that can be estimated to obtain the standard error of this difference is
log(salary) = β0 + β1 log(sales) + β2 roe + δ1 consprod + δ2 utility + δ3 trans + u,
where trans is a dummy variable for the transportation industry. Now, the base group is finance, and so the coefficient δ1 directly measures the difference between the consumer products and finance industries, and we can use the t statistic on consprod.

Q2. Wooldridge 7.6
In Section 3.3 (in particular, in the discussion surrounding Table 3.2) we discussed how to determine the direction of bias in the OLS estimators when an important variable (ability, in this case) has been omitted from the regression. As we discussed there, Table 3.2 only strictly holds with a single explanatory variable included in the regression, but we often ignore the presence of other independent variables and use this table as a rough guide. (Or, we can use the results of Problem 3.10 for a more precise analysis.) If less able workers are more likely to receive training, then train and u are negatively correlated. If we ignore the presence of educ and exper, or at least assume that train and u are negatively correlated after netting out educ and exper, then we can use Table 3.2: the OLS estimator of β1 (with ability in the error term) has a downward bias. Because we think β1 ≥ 0, we are less likely to conclude that the training program was effective. Intuitively, this makes sense: if those chosen for training had not received training, they would have lower wages, on average, than the control group.

Q3. Wooldridge 7.9
(i) Plugging in u = 0 and d = 1 gives f1(z) = (β0+δ0) + (β1+δ1)z.
(ii) Setting f0(z*) = f1(z*) gives β0 + β1z* = (β0+δ0) + (β1+δ1)z*. Therefore, provided δ1 ≠ 0, we have z* = −δ0/δ1. Clearly, z* is positive if and only if δ0/δ1 is negative, which means δ0 and δ1 must have opposite signs.
(iii) Using part (ii) we have totcoll* = .357/.030 = 11.9 years (totcoll = total college years).
(iv) The estimated years of college where women catch up to men is much too high to be practically relevant. While the estimated coefficient on female·totcoll shows that the gap is reduced at higher levels of college, it is never closed, not even close. In fact, at four years of college, the difference in predicted log wage is still −.357 + .030(4) = −.237, or about 21.1% less for women.

Q4. Wooldridge C7.13 (apple_c7_13.do)
(i) 412/660 ≈ .624.
(ii) The OLS estimates of the LPM are
fitted ecobuy = .424 − .803 ecoprc + .719 regprc + .00055 faminc + .024 hhsize + .025 educ − .00050 age
standard errors: intercept (.165), ecoprc (.109), regprc (.132), faminc (.00053), hhsize (.013), educ (.008), age (.00125)
n = 660, R² = .110
If ecoprc increases by, say, 10 cents (.10), then the probability of buying eco-labeled apples falls by about .080. If regprc increases by 10 cents, the probability of buying eco-labeled apples increases by about .072. (Of course, we are assuming that the probabilities are not close to the boundaries of zero and one, respectively.)
(iii) The F test, with 4 and 653 df, is 4.43, with p-value = .0015. Thus, based on the usual F test, the four non-price variables are jointly very significant. Of the four variables, educ appears to have the most important effect. For example, a difference of four years of education implies an increase of .025(4) = .10 in the estimated probability of buying eco-labeled apples. This suggests that more highly educated people are more open to buying produce that is environmentally friendly, which is perhaps expected. Household size (hhsize) also has an effect.
Comparing a couple with two children to one that has no children, other factors equal, the couple with two children has a .048 higher probability of buying eco-labeled apples.
(iv) The model with log(faminc) fits the data slightly better: the R-squared increases to about .112. (We would not expect a large increase in R-squared from a simple change in the functional form.) The coefficient on log(faminc) is about .045 (t = 1.55). If log(faminc) increases by .10, which means roughly a 10% increase in faminc, then P(ecobuy = 1) is estimated to increase by about .0045, a pretty small effect.
(v) The fitted probabilities range from about .185 to 1.051, so none are negative. There are two fitted probabilities above 1, which is not a source of concern with 660 observations.
(vi) Using the standard prediction rule (predict one when the fitted probability is ≥ 0.5 and zero otherwise) gives the fraction correctly predicted for ecobuy = 0 as 102/248 ≈ .411, so about 41.1%. For ecobuy = 1, the fraction correctly predicted is 340/412 ≈ .825, or 82.5%. With the usual prediction rule, the model does a much better job predicting the decision to buy eco-labeled apples. (The overall percent correctly predicted is about 67%.)
...
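As a quick check on the arithmetic in the Problem Set answers, the Python sketch below recomputes a few of the quoted numbers from the coefficients and counts given in the text. The function names (exact_pct_change, crossover, fraction_correct) are illustrative assumptions and are not part of the course do-file (apple_c7_13.do); no data are used, only figures already quoted above.

```python
import math


def exact_pct_change(log_coef):
    """Exact percentage change in y implied by a coefficient in a log(y) equation."""
    return 100.0 * (math.exp(log_coef) - 1.0)


def crossover(delta0, delta1):
    """Point z* = -delta0/delta1 where the two group regression lines intersect."""
    if delta1 == 0:
        raise ValueError("delta1 = 0: the lines are parallel and never cross")
    return -delta0 / delta1


def fraction_correct(right0, n0, right1, n1):
    """Fractions correctly predicted for y = 0, for y = 1, and overall."""
    return right0 / n0, right1 / n1, (right0 + right1) / (n0 + n1)


# Q1(ii): exact percentage difference implied by the utility coefficient
utility_gap = exact_pct_change(-0.283)

# Q3(iii)-(iv): catch-up point, and the gap at four years of college
catch_up = crossover(-0.357, 0.030)
gap_log = -0.357 + 0.030 * 4
gap_pct = exact_pct_change(gap_log)

# Q4(ii): LPM effect of a 10-cent (.10) rise in each price
d_ecoprc = -0.803 * 0.10
d_regprc = 0.719 * 0.10

# Q4(vi): counts quoted in the answer (correct predictions / group size)
frac0, frac1, overall = fraction_correct(102, 248, 340, 412)

print(f"Q1(ii) exact difference: {utility_gap:.2f}%")
print(f"Q3(iii) catch-up point: {catch_up:.1f} years")
print(f"Q3(iv) gap at 4 years: {gap_log:.3f} log points, {gap_pct:.1f}%")
print(f"Q4(ii) price effects: {d_ecoprc:.3f}, {d_regprc:.3f}")
print(f"Q4(vi) correct: y=0 {frac0:.3f}, y=1 {frac1:.3f}, overall {overall:.3f}")
```

The exact-percentage helper uses 100·[exp(β) − 1], the same conversion applied in Q1(ii) and Q3(iv); the overall fraction in Q4(vi) pools the two groups, (102 + 340)/(248 + 412).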