Lecture 9

Introduction to Diagnostic Testing

In a regression equation the N observations generate
• K regression coefficients
• N residuals, of which only N - K are free, because the residuals satisfy K linear restrictions (for example, $\sum_i e_i = 0$ when the equation contains a constant)

These residuals should not show a systematic pattern. Diagnostic statistics attempt to detect such patterns, which may indicate misspecification. Misspecification can lead to biased coefficient estimates and other undesirable properties, such as misleading standard errors and test statistics. However, diagnostic tests have low power when the regression disturbances have a high variance, so misspecification can go undetected.

The assumptions behind OLS

The assumptions needed for OLS to be BLUE are:
1. $E[u_i] = E[y_i - \sum_j \beta_j x_{ij}] = 0$: Y is a linear function of the regressors.
2. $E[u_i^2] = \sigma^2$: the disturbances have a common variance $\sigma^2$.
3. $E[u_i u_j] = 0$ if $i \neq j$: the disturbances are uncorrelated, as implied by random sampling. Tests of this assumption are more relevant when using time-series data and are covered in Lecture 15.

Figure 1: Nonlinearity and Structural Breaks (actual values, full-sample regression, and separate regressions for sample 1 and sample 2)
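The claim that the N residuals contain only N - K free components can be checked numerically. The sketch below is illustrative only: it assumes simulated data with hypothetical coefficients and uses NumPy. It fits OLS, verifies that the residuals satisfy the K restrictions X'e = 0, and checks that the residual-maker matrix M = I - X(X'X)^{-1}X' (with e = My) has rank N - K.

```python
import numpy as np

# Hypothetical simulated data: N observations, K = 3 regressors (constant + 2)
rng = np.random.default_rng(0)
N, K = 20, 3
X = np.column_stack([np.ones(N), rng.normal(size=N), rng.normal(size=N)])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=N)

# OLS coefficients and residuals
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b

# The residuals satisfy K linear restrictions X'e = 0;
# the row for the constant gives sum(e) = 0
restrictions = X.T @ e
print(np.allclose(restrictions, 0.0))                # True

# Residual-maker matrix M = I - X(X'X)^{-1}X', so that e = M y
M = np.eye(N) - X @ np.linalg.solve(X.T @ X, X.T)
print(np.allclose(e, M @ y))                         # True

# M has rank N - K: only N - K residuals are free
print(np.linalg.matrix_rank(M, tol=1e-8), N - K)     # 17 17
```

An explicit `tol` is passed to `matrix_rank` because the K zero singular values of M are only zero up to rounding error.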

Figure 2: Nonlinearity and Residuals

Testing Assumption 1: Linearity

Suppose the true model is not linear (violating Assumption 1):
$Y = \alpha + \beta f(X)$, where $f(X) = \gamma_1 X + \gamma_2 X^2$
The linear regression $Y = \alpha + \beta X$ is misspecified because it omits $X^2$.

Examples of this kind of misspecification:
• fitting a linear rather than a quadratic trend
• fitting a Cobb-Douglas rather than a CES production function

Why test for linearity?

Economic theory may not completely specify the functional form of a model. 'Engel's Law' predicts that a diminishing proportion of income will be spent on food as income rises. This is consistent with the linear regression
$FOOD = \alpha + \beta\,INCOME$, with $\alpha > 0$, $\beta < 1$
and with the log-linear regression
$\log(FOOD) = \phi + \gamma \log(INCOME)$, with $\gamma < 1$
We need statistical techniques to choose between these models.
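A short derivation shows why both specifications are consistent with Engel's Law. It computes the budget share FOOD/INCOME implied by each model, ignoring the disturbance term and assuming INCOME > 0:

```latex
\begin{aligned}
\text{Linear: } & \frac{FOOD}{INCOME} = \frac{\alpha}{INCOME} + \beta,
  \qquad \frac{d}{d\,INCOME}\left(\frac{FOOD}{INCOME}\right) = -\frac{\alpha}{INCOME^{2}} < 0 \iff \alpha > 0 \\
\text{Log-linear: } & \frac{FOOD}{INCOME} = e^{\phi}\, INCOME^{\gamma - 1},
  \qquad \frac{d}{d\,INCOME}\left(\frac{FOOD}{INCOME}\right) = (\gamma - 1)\, e^{\phi}\, INCOME^{\gamma - 2} < 0 \iff \gamma < 1
\end{aligned}
```

In the linear model the share falls because $\alpha > 0$; in the log-linear model it falls because the income elasticity $\gamma$ is below 1. Both give the same qualitative prediction, so the choice between them has to be made from the data.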

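One simple way to test the linear specification against the quadratic alternative $f(X) = \gamma_1 X + \gamma_2 X^2$ is to add $X^2$ to the regression and test whether its coefficient is zero. The sketch below is a minimal illustration of that variable-addition approach, not necessarily the test developed later in the lecture. It assumes simulated data with hypothetical parameter values, uses only NumPy, and relies on a hypothetical helper `ols`. It first shows the systematic residual pattern left by the misspecified linear fit, then computes the t statistic on $X^2$.

```python
import numpy as np

def ols(X, y):
    """Hypothetical helper: OLS coefficients, residuals and conventional standard errors."""
    N, K = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    s2 = e @ e / (N - K)                                 # estimate of sigma^2
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))   # valid under Assumptions 1-3
    return b, e, se

# Hypothetical data: Y = alpha + b1*X + b2*X^2 + u, with b1 = beta*gamma1, b2 = beta*gamma2
rng = np.random.default_rng(1)
N = 50
x = np.linspace(1, 20, N)
y = 5.0 + 2.0 * x + 0.8 * x**2 + rng.normal(scale=5.0, size=N)

# 1. Misspecified linear regression Y = alpha + beta*X (omits X^2)
_, e_lin, _ = ols(np.column_stack([np.ones(N), x]), y)

# Systematic pattern: residuals positive at both ends, negative in the middle
ends_positive = e_lin[:5].mean() > 0 and e_lin[-5:].mean() > 0
middle_negative = e_lin[N // 2 - 2 : N // 2 + 3].mean() < 0
print(ends_positive, middle_negative)

# 2. Variable-addition test: add X^2 and test H0: its coefficient is zero
b_q, _, se_q = ols(np.column_stack([np.ones(N), x, x**2]), y)
t_stat = b_q[2] / se_q[2]
# Compare |t| with the 5% two-sided critical value of t(N - 3), about 2.01 for N = 50
print(f"t statistic on X^2: {t_stat:.1f}")
```

With several candidate nonlinear terms, an F test on the whole group of added regressors would replace the single t test.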