CHAPTER 5: DIAGNOSTICS FOR MODEL SELECTION

5.1 Variable Selection

Given P possible regressors, choose a subset of p <= P of them, with p initially undetermined. The simplest solution is to fit all 2^P possible models and compare their error sums of squares (SSEs). By computing the best model (smallest SSE) for each p, the problem is reduced to choosing among P + 1 candidate models, one for each p. This is known as "best subset regression". Note, however, that even this idea may not be practical if P is too large.

Example using Nuclear Power Data

     p    R^2    Variable names
     1    0.46   PT
     2    0.66   LS PT
     3    0.76   D LS PT
     4    0.81   D LS NE PT
     5    0.83   D LS NE CT LN
     6    0.86   D LS NE CT LN PT
     7    0.86   D LS NE CT BW LN PT
     8    0.87   D LT2 LS PR NE CT LN PT
     9    0.87   D LT1 LT2 LS PR NE CT LN PT
    10    0.87   D LT1 LT2 LS PR NE CT BW LN PT

Best R^2 for each model order for the nuclear power data.

The adjusted R^2 criterion

The coefficient of multiple determination, usually denoted R^2, is defined by

    R^2 = 1 - \frac{SSE}{SSTO},

where SSE is the sum of squared residuals and SSTO = \sum_i (y_i - \bar{y})^2 is the total sum of squares. R^2 measures the proportion of the total sum of squares explained by the model. The adjusted R^2 is defined by

    R_a^2 = 1 - \frac{(n - 1)\, SSE}{(n - p)\, SSTO}.    (1)

In practice, this criterion tends to select too large a p.

    R_a^2   R^2    p    Variable names
    .826    .871   8    D LT2 LS PR NE CT LN PT
    .823    .857   6    D LS NE CT LN PT
    .821    .861   7    D LS NE CT BW LN PT
    .820    .861   7    D LS PR NE CT LN PT
    .820    .861   7    D LT2 LS NE CT LN PT
    .818    .871   9    D LT1 LT2 LS PR NE CT LN PT
    .818    .871   9    D LT2 LS PR NE CT BW LN PT
    .818    .865   8    D LS PR NE CT BW LN PT
    .815    .857   7    D LT1 LS NE CT LN PT
    .815    .863   8    D LT2 LS NE CT BW LN PT

Nuclear power data: the best 10 models according to the adjusted R^2 criterion.

Prediction error criteria

Suppose y_i is the i'th data point and x_i is the corresponding vector of covariates, so that

    y_i = x_i^T \beta + \epsilon_i.

Suppose we are trying to predict the value of a future observation at the same covariate vector x_i.
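The all-subsets search and the R^2 / adjusted-R^2 computations above can be sketched as follows. This is a minimal illustration on synthetic data (the sample size, regressors, and coefficients are invented for the demonstration, not taken from the nuclear power data), assuming numpy is available:

```python
from itertools import combinations
import numpy as np

rng = np.random.default_rng(0)
n, P = 50, 4
X = rng.normal(size=(n, P))
# Synthetic truth: only the first two regressors matter.
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)
ssto = np.sum((y - y.mean()) ** 2)  # total sum of squares SSTO

def sse(cols):
    """SSE of the least squares fit (with intercept) on the given columns."""
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return float(np.sum((y - Z @ beta) ** 2))

best_by_p = {}
for p in range(1, P + 1):
    best = min(combinations(range(P), p), key=sse)  # best subset of size p
    s = sse(best)
    r2 = 1 - s / ssto
    # Adjusted R^2 as in (1), counting the intercept among the fitted
    # parameters, i.e. p + 1 coefficients in total.
    r2_adj = 1 - (n - 1) * s / ((n - p - 1) * ssto)
    best_by_p[p] = (best, r2, r2_adj)
```

Because the best subset of size p + 1 never has a larger SSE than the best subset of size p, R^2 is nondecreasing in p; that is why the adjusted version, which penalises extra parameters, is used to compare models across different p.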
This may be written in the form

    y_i^* = x_i^T \beta + \epsilon_i^*,

where \epsilon_i^* is independent of \epsilon_i, with the same mean 0 and variance \sigma^2. The obvious point predictor of y_i^* is

    \hat{y}_i^* = x_i^T \hat{\beta},

where \hat{\beta} is the least squares estimator of \beta. If the assumed model is correct, then \hat{\beta} is an unbiased estimator of \beta, but in the present context we do not know the true model, so we do not assume that \hat{\beta} is unbiased.

The mean squared prediction error (MSPE) is of the form

    E(\hat{y}_i^* - y_i^*)^2 = E[\{x_i^T(\hat{\beta} - \beta)\}^2] + \sigma^2,    (2)

using the fact that \epsilon_i^* is independent of all past observations and therefore of \hat{\beta}. However, the +\sigma^2 term in (2) is present regardless of the model we adopt, so we concentrate on the first term on the right-hand side, writing it as a decomposition into squared bias plus variance and summing over all i to obtain a formula for the overall mean squared prediction error.
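The decomposition (2) can be checked by simulation. The sketch below uses an invented setup (dimensions, coefficients, and the choice of which regressor to drop are all hypothetical) and deliberately fits an underspecified model, so that the fitted coefficients are biased; it then compares the Monte Carlo estimate of E(\hat{y}_i^* - y_i^*)^2 with the first term plus \sigma^2:

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 40, 1.0
X = rng.normal(size=(n, 3))
beta_true = np.array([1.5, -2.0, 1.0])
mu = X @ beta_true            # true mean x_i^T beta at each design point
Z = X[:, :2]                  # deliberately misspecified design: drops x_3

reps = 2000
mspe = np.zeros(n)            # Monte Carlo estimate of E(yhat*_i - y*_i)^2
first_term = np.zeros(n)      # estimate of E[{x_i^T (beta_hat - beta)}^2]

for _ in range(reps):
    y = mu + rng.normal(scale=sigma, size=n)          # training data
    beta_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)  # biased under misspecification
    yhat = Z @ beta_hat
    ystar = mu + rng.normal(scale=sigma, size=n)      # independent future obs
    mspe += (yhat - ystar) ** 2 / reps
    # yhat - mu = x_i^T(beta_hat - beta), with beta_hat padded by a zero
    # for the dropped coordinate.
    first_term += (yhat - mu) ** 2 / reps

# Equation (2): at every design point, mspe ≈ first_term + sigma^2.
gap = float(np.mean(np.abs(mspe - (first_term + sigma ** 2))))
```

The small residual `gap` is Monte Carlo noise; the irreducible \sigma^2 appears whatever model is fitted, which is why attention focuses on the first term.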
 Fall '11
 Staff
