Week 9 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)

What is heteroskedasticity in a regression model?
When homoskedasticity (MLR.5) fails and the variance of the disturbance u changes across observations i, we say that heteroskedasticity is present.

In the presence of heteroskedasticity, are the t‐stat and F‐stat from the usual OLS still valid? Why? Are there any other problems with OLS under heteroskedasticity?
The usual t‐stat and F‐stat are not valid because the usual OLS standard errors are incorrect under heteroskedasticity. Further, the OLS estimator is no longer (asymptotically) efficient: there are better estimators.

What are the heteroskedasticity‐robust standard errors? How do you use them in STATA?
These are corrected standard errors that take into account the possible presence of heteroskedasticity. The t‐stat and F‐stat computed using the heteroskedasticity‐robust standard errors are valid test statistics. In STATA, they are easily obtained by adding the option “robust” to the “regress” command, e.g., regress lwage educ, robust;

How do you detect if there is heteroskedasticity?
The Breusch‐Pagan test or the White test can be used to detect the presence of heteroskedasticity. See Section 8.3 for details.

If heteroskedasticity is present in a known form, how would you estimate the model?
In this case, the WLS estimator should be used, which is more efficient than the OLS estimator. See Section 8.4 for details.

If heteroskedasticity is present in an unknown form, how would you estimate the model?
If there is strong evidence for heteroskedasticity, the FGLS estimator should be used, which is based on an exponential functional form for the variance. See Section 8.4 for details.

What are the steps in the FGLS estimation?
You should summarise from Section 8.4.

How would you handle the heteroskedasticity of the LPM?
The heteroskedasticity functional form is known for the LPM.
Hence WLS can be used in principle. However, because the LPM can produce predicted probabilities that are outside the interval (0,1), the known functional form p(x)[1‐p(x)] may not be usable for WLS: a fitted value outside (0,1) implies a non‐positive variance estimate. In that case, it may be necessary to use FGLS for the LPM.

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 8.1
Parts (ii) and (iii). The homoskedasticity assumption played no role in Chapter 5 in showing that OLS is consistent. But we know that heteroskedasticity causes statistical inference based on the usual t and F statistics to be invalid, even in large samples. As heteroskedasticity is a violation of the Gauss‐Markov assumptions, OLS is no longer BLUE.

Q2. Wooldridge 8.2
With $\mathrm{Var}(u \mid inc, price, educ, female) = \sigma^2 inc^2$, we have $h(x) = inc^2$, where $h(x)$ is the heteroskedasticity function defined in equation (8.21). Therefore $\sqrt{h(x)} = inc$, and so the transformed equation is obtained by dividing the original equation by inc:

$$
beer/inc = \beta_0 (1/inc) + \beta_1 + \beta_2 (price/inc) + \beta_3 (educ/inc) + \beta_4 (female/inc) + u/inc
$$

Notice that $\beta_1$, which is the slope on inc in the original model, is now the constant in the transformed equation. This is simply a consequence of the form of the heteroskedasticity and the functional forms of the explanatory variables in the original equation.

Q3. Wooldridge 8.3
False. The unbiasedness of WLS and OLS hinges crucially on Assumption MLR.4, and, as we know from Chapter 4, this assumption is often violated when an important variable is omitted. When MLR.4 does not hold, both WLS and OLS are biased. Without specific information on how the omitted variable is correlated with the included explanatory variables, it is not possible to determine which estimator has the smaller bias. WLS could have more bias than OLS, or less. Because we cannot know, we should not claim to use WLS in order to solve “biases” associated with OLS.

Q4. Wooldridge 8.5
(i) No.
For each coefficient, the usual standard errors and the heteroskedasticity‐robust ones are very similar in practice.
(ii) The effect is $-.029(4) = -.116$, so the probability of smoking falls by about .116.
(iii) As usual, we compute the turning point in the quadratic: $.020/[2(.00026)] \approx 38.46$, so about 38 and one‐half years.
(iv) Holding other factors in the equation fixed, a person in a state with restaurant smoking restrictions has a probability of smoking that is lower by .101. This is similar to the effect of having four more years of education.
(v) We just plug the values of the independent variables into the OLS regression line:

$$
\widehat{smokes} = .656 - .069\log(67.44) + .012\log(6{,}500) - .029(16) + .020(77) - .00026(77^2) \approx .0052
$$

Thus, the estimated probability of smoking for this person is close to zero. (In fact, this person is not a smoker, so the equation predicts well for this particular observation.)

Q5. Wooldridge C8.10 (401ksubs_c8_10.do)
(i) In the following equation, estimated by OLS, the usual standard errors are in ( ) and the heteroskedasticity‐robust standard errors are in [ ]:

$$
\begin{array}{rcccccc}
\widehat{e401k} = & -.506 & +\,.0124\,inc & -\,.000062\,inc^2 & +\,.0265\,age & -\,.00031\,age^2 & -\,.0035\,male \\
 & (.081) & (.0006) & (.000005) & (.0039) & (.00005) & (.0121) \\
 & [.079] & [.0006] & [.000005] & [.0038] & [.00004] & [.0121]
\end{array}
$$

$n = 9{,}275$, $R^2 = .094$.

There are no important differences; if anything, the robust standard errors are smaller.
(ii) This is a general claim. Since $\mathrm{Var}(y \mid x) = p(x)[1 - p(x)]$, we can write $E(u^2 \mid x) = p(x) - [p(x)]^2$. Written in error form, $u^2 = p(x) - [p(x)]^2 + v$. In other words, we can write this as a regression model $u^2 = \delta_0 + \delta_1 p(x) + \delta_2 [p(x)]^2 + v$, with the restrictions $\delta_0 = 0$, $\delta_1 = 1$, and $\delta_2 = -1$. Remember that, for the LPM, the fitted values, $\hat{y}_i$, are estimates of $p(x_i)$. So, when we run the regression of $\hat{u}_i^2$ on $\hat{y}_i$ and $\hat{y}_i^2$ (including an intercept), the intercept estimate should be close to zero, the coefficient on $\hat{y}_i$ should be close to one, and the coefficient on $\hat{y}_i^2$ should be close to $-1$.
(iii) The White LM statistic and F statistic are about 581.9 and 310.32 respectively, both of which are very significant. The coefficient on $\widehat{e401k}$ is about 1.010, the coefficient on $\widehat{e401k}^2$ is about $-.970$, and the intercept is about $-.009$. These estimates are quite close to what we expect to find from the theory in part (ii).
(iv) The smallest fitted value is about .030 and the largest is about .697. Since every fitted value lies strictly between zero and one, the WLS weights are well defined. The WLS estimates of the LPM are

$$
\begin{array}{rcccccc}
\widehat{e401k} = & -.488 & +\,.0126\,inc & -\,.000062\,inc^2 & +\,.0255\,age & -\,.00030\,age^2 & -\,.0055\,male \\
 & (.076) & (.0005) & (.000004) & (.0037) & (.00004) & (.0117)
\end{array}
$$

$n = 9{,}275$, $R^2 = .108$.

There are no important differences with the OLS estimates. The largest relative change is in the coefficient on male, but this variable is very insignificant using either estimation method. ...
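
The WLS procedure for the LPM used in Q5 can be sketched in a few lines of code. What follows is only an illustration under stated assumptions, not the course's Stata do-file: it is plain Python, it uses a single regressor, and the data are simulated rather than taken from 401KSUBS. The helper names `wls_line`, `lpm_weights`, and `lpm_wls` are hypothetical and introduced here for the example. The steps are: fit the LPM by OLS, check that all fitted values lie strictly inside (0,1), form the weights 1/h with h = ŷ(1 − ŷ), and re-estimate by weighted least squares.

```python
import random


def wls_line(x, y, w):
    """Weighted least squares for y = b0 + b1*x with weights w; returns (b0, b1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def lpm_weights(fitted):
    """Return weights 1/h, where h = p(1 - p) is the LPM variance function.

    The weights only make sense if every fitted probability is strictly
    inside (0, 1); otherwise h is zero or negative and WLS is not feasible.
    """
    if min(fitted) <= 0.0 or max(fitted) >= 1.0:
        raise ValueError("fitted values outside (0,1): use FGLS instead")
    return [1.0 / (p * (1.0 - p)) for p in fitted]


def lpm_wls(x, y):
    """OLS first stage, then WLS with weights built from the OLS fitted values."""
    b0, b1 = wls_line(x, y, [1.0] * len(x))  # OLS is WLS with unit weights
    fitted = [b0 + b1 * xi for xi in x]
    return wls_line(x, y, lpm_weights(fitted))


if __name__ == "__main__":
    # Simulated data (an assumption for illustration): P(y = 1 | x) = 0.2 + 0.05 x.
    random.seed(0)
    x = [random.uniform(0.0, 10.0) for _ in range(5000)]
    y = [1.0 if random.random() < 0.2 + 0.05 * xi else 0.0 for xi in x]
    print("OLS:", wls_line(x, y, [1.0] * len(x)))
    print("WLS:", lpm_wls(x, y))
```

With well-behaved fitted values the OLS and WLS coefficients should be close, which is the pattern reported in part (iv). When `lpm_weights` raises an error, the known variance form cannot be used directly, which is the situation described in the last review question.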