Week 10 Tutorial Exercises

Readings
Read Chapter 9 thoroughly. Make sure that you know the meanings of the Key Terms at the chapter end.

Review Questions (these may or may not be discussed in tutorial classes)

What is functional form misspecification? What is its major consequence in regression analysis?
As the word "misspecification" suggests, it is a mis-specified model with a wrong functional form for the explanatory variables. For example, a log wage model that does not include age² is misspecified if the partial effect of age on log wage is inverted-U shaped. In linear regression models, functional form misspecification can be viewed as the omission of a nonlinear function of the explanatory variables. Hence the consequences of omitting important variables apply, and the major one is estimation bias.

How would you test for functional form misspecification?
RESET may be used to test for functional form misspecification, more specifically for neglected nonlinearities (nonlinear functions of the regressors). RESET is based on the F test for the joint significance of the squared and cubed predicted values (ŷ² and ŷ³) in an "expanded" model that includes all the original regressors as well as ŷ² and ŷ³, where the latter represent possible nonlinear functions of the regressors.

What are nested models? And nonnested models?
Two models are nested if one is a restricted version of the other (obtained by restricting the values of parameters). Two models are nonnested if neither nests the other.

What is the purpose of testing one model against another?
The purpose of testing one model against another is to select a "best" or "preferred" model for statistical inference and prediction.

How would you test two nonnested models?
There are two strategies. The first is to test exclusion restrictions in an expanded model that nests both candidate models and decide which model can be "excluded". The second is to test the significance of the predicted value of one model when it is added as a regressor to the other model (known as the Davidson-MacKinnon test).

What is a proxy variable? What are the conditions for a proxy variable to be valid in regression analysis?
A proxy variable is one that is used to represent the influence of an unobserved (and important) explanatory variable. There are two conditions for the validity of a proxy. The first is that the zero conditional mean assumption holds for all explanatory variables (including the unobserved variable and the proxy). The second is that the conditional mean of the unobserved variable, given the other explanatory variables and the proxy, depends only on the proxy.

Can you analyse the consequences of measurement errors?
With a careful reading of Section 9.4, you should be able to analyse them.

In what circumstances will missing observations cause major concerns in regression analysis?
Concerns arise when observations are systematically missing and the sample can no longer be regarded as a random sample. Nonrandom samples may cause bias in the OLS estimation.

What is exogenous sample selection? What is endogenous sample selection?
Exogenous sample selection is a nonrandom sampling scheme in which the sample is chosen on the basis of the explanatory variables. It is easily seen that such a sampling scheme does not violate MLR1, MLR3 and MLR4, and hence does not introduce bias in the OLS estimation. On the other hand, an endogenous sample selection scheme selects the sample on the basis of the dependent variable and does cause bias in the OLS estimation.

What are outliers and influential observations?
Outliers are unusual observations that are far away from the centre of the other observations. A small number of outliers can have a large impact on the OLS estimates of parameters. They are called influential observations if the estimation results when including them are significantly different from the results when excluding them.

Consider the simple regression model yi = α + (β + bi)xi + ui for i = 1, 2, …, n, where the "slope parameter" contains a random variable bi; α and β are constant parameters.
Assume that the usual MLR1-5 hold for yi = α + βxi + ui; E(bi|xi) = 0 and Var(bi|xi) = ω². If we regress y on x with an intercept, will the estimator of the "slope parameter" be biased from β? Will the usual OLS standard errors be valid for statistical inference?
The model can be written as yi = α + βxi + (bixi + ui), where the overall disturbance is (bixi + ui). Given the stated assumptions, MLR1-4 hold for this model and the OLS estimator of the "slope parameter" is unbiased for β. However, MLR5 does not hold: assuming bi and ui are uncorrelated given xi, the conditional variance of the overall disturbance is Var(bixi + ui | xi) = ω²xi² + σ², which depends on xi, so the usual OLS standard errors will not be valid.

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 9.1
There is functional form misspecification if β6 ≠ 0 or β7 ≠ 0, where these are the population parameters on ceoten² and comten², respectively. Therefore, we test the joint significance of these variables using the R-squared form of the F test:

F = [(.375 − .353)/2] / [(1 − .375)/(177 − 8)] ≈ 2.97.

With 2 and ∞ df, the 10% critical value is 2.30, while the 5% critical value is 3.00. Thus the p-value is slightly above .05, which is reasonable evidence of functional form misspecification. (Of course, whether this has a practical impact on the estimated partial effects for various levels of the explanatory variables is a different matter.)

Q2. Wooldridge 9.3
(i) Eligibility for the federally funded school lunch program is very tightly linked to being economically disadvantaged. Therefore, the percentage of students eligible for the lunch program is very similar to the percentage of students living in poverty.
(ii) We can use our usual reasoning about omitting important variables from a regression equation. The variables log(expend) and lnchprg are negatively correlated: school districts with poorer children spend, on average, less on schools. Further, β3 < 0. From Table 3.2, omitting lnchprg (the proxy for poverty) from the regression produces an upward-biased estimator of β1 [ignoring the presence of log(enroll) in the model]. So when we control for the poverty rate, the estimated effect of spending falls.
(iii) Once we control for lnchprg, the coefficient on log(enroll) becomes negative and has a t statistic of about −2.17, which is significant at the 5% level against a two-sided alternative. The coefficient implies that Δmath10 ≈ −(1.26/100)(%Δenroll) = −.0126(%Δenroll). Therefore, a 10% increase in enrollment leads to a drop in math10 of .126 percentage points.
(iv) Both math10 and lnchprg are percentages. Therefore, a ten percentage point increase in lnchprg leads to about a 3.23 percentage point fall in math10, a sizeable effect.
(v) In column (1) we are explaining very little of the variation in pass rates on the MEAP math test: less than 3%. In column (2) we are explaining almost 19% (which still leaves much variation unexplained). Clearly, most of the explained variation in math10 is due to variation in lnchprg.
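The R²-form F statistic computed in Q1 above, and the claim that its p-value lies slightly above .05, can be checked numerically. A minimal sketch using scipy (the R² values and degrees of freedom are taken directly from the answer above):

```python
from scipy import stats

# R-squared form of the F statistic, with the values from Q1
r2_ur, r2_r = .375, .353    # unrestricted and restricted R-squared
q = 2                       # two restrictions (ceoten^2, comten^2)
df_denom = 177 - 8          # n - k - 1 = 169

F = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_denom)
p = stats.f.sf(F, q, df_denom)   # upper-tail p-value of F(2, 169)

print(round(F, 2))   # 2.97
print(p)             # between .05 and .10, consistent with the answer
```

The exact p-value sits between the 10% and 5% rejection regions, matching the comparison against the 2.30 and 3.00 critical values in the text.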
This is a common finding in studies of school performance: family income (or related factors, such as living in poverty) is much more important in explaining student performance than spending per student or other school characteristics.

Q3. Wooldridge 9.5
The sample selection in this case is arguably endogenous. Because prospective students may look at campus crime as one factor in deciding where to attend college, colleges with high crime rates have an incentive not to report their crime statistics. If this is the case, then the chance of appearing in the sample is negatively related to u in the crime equation. (For a given school size, higher u means more crime, and therefore a smaller probability that the school reports its crime figures.)

Q4. Wooldridge C9.3 (jtrain_c9_3.do)
(i) If the grants were awarded to firms based on firm or worker characteristics, grant could easily be correlated with such factors, which affect productivity. In the simple regression model, these are contained in u.
(ii) The simple regression estimates using the 1988 data are

log(scrap) = .409 + .057 grant
            (.241) (.406)
n = 54, R² = .0004.

The coefficient on grant is actually positive, but not statistically different from zero.
(iii) When we add log(scrap87) to the equation, we obtain

log(scrap88) = .021 − .254 grant88 + .831 log(scrap87)
              (.089) (.147)          (.044)
n = 54, R² = .873,

where the years are indicated as suffixes to the variables for clarity. The t statistic for H0: βgrant = 0 is −.254/.147 ≈ −1.73. We use the 5% critical value for 40 df in Table G.2: −1.68. Because t = −1.73 < −1.68, we reject H0 in favor of H1: βgrant < 0 at the 5% level.
(iv) The t statistic is (.831 − 1)/.044 ≈ −3.84, which is a strong rejection of H0: βlog(scrap87) = 1.
(v) With the heteroskedasticity-robust standard error, the t statistic for grant88 is −.254/.142 ≈ −1.79, so the coefficient is even more significantly less than zero than with the usual standard error.
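The t statistics reported in Q4 are simple ratios of a coefficient (or a coefficient minus its hypothesised value) to a standard error, so they are easy to verify. A quick arithmetic check, using the estimates and standard errors quoted above:

```python
# t statistics from Q4, computed directly from the reported estimates
t_grant   = -.254 / .147         # H0: beta_grant = 0, usual se
t_scrap87 = (.831 - 1) / .044    # H0: beta_log(scrap87) = 1, usual se
t_grant_r = -.254 / .142         # H0: beta_grant = 0, robust se

print(round(t_grant, 2))    # -1.73
print(round(t_scrap87, 2))  # -3.84
print(round(t_grant_r, 2))  # -1.79
```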
The robust t statistic for H0: βlog(scrap87) = 1 is (.831 − 1)/.071 ≈ −2.38, which is smaller in magnitude than before but still quite significant.
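Returning to the RESET procedure described in the review questions, the mechanics can be sketched on simulated data. This is an illustrative implementation under made-up assumptions (a simple quadratic data-generating process and invented variable names), not code from the text:

```python
import numpy as np
from scipy import stats

# Simulate data whose true model is quadratic, then fit a (misspecified)
# linear model and apply RESET: add yhat^2 and yhat^3 to the regression
# and test their joint significance with an F test.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1, 5, n)
y = 1 + 0.5 * x + 0.3 * x**2 + rng.normal(0, 1, n)   # true model is quadratic

def fit_ols(X, y):
    """Return fitted values and the sum of squared residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return X @ beta, resid @ resid

X_r = np.column_stack([np.ones(n), x])             # restricted: linear only
yhat, ssr_r = fit_ols(X_r, y)

X_ur = np.column_stack([X_r, yhat**2, yhat**3])    # expanded: add yhat^2, yhat^3
_, ssr_ur = fit_ols(X_ur, y)

q = 2                          # two added terms
df = n - X_ur.shape[1]         # denominator degrees of freedom
F = ((ssr_r - ssr_ur) / q) / (ssr_ur / df)
p = stats.f.sf(F, q, df)

# With a genuinely quadratic model, RESET should reject the linear form
print(p < .05)   # True
```

The squared and cubed fitted values are nonlinear functions of the regressors, so their joint significance signals the neglected nonlinearity, exactly as described in the review question.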
This note was uploaded on 02/29/2012 for the course ECON 2206 taught by Professor Yang during the One '11 term at University of New South Wales.