CHAPTER 5: DIAGNOSTICS FOR MODEL SELECTION

5.1 Variable Selection

Given P possible regressors, we wish to choose a subset of p <= P of them, with p initially undetermined. The simplest solution is to fit all 2^P possible models and compare their error sums of squares (SSEs). By finding the best model (smallest SSE) for each p, the problem is reduced to choosing among P + 1 candidate models, one for each value of p (p = 0, 1, ..., P). This is known as "best subset regression". Note, however, that even this idea may not be practical if P is too large; a computational sketch is given at the end of this section.

Example using Nuclear Power Data

     p    R^2    Variable names
     1    0.46   PT
     2    0.66   LS PT
     3    0.76   D LS PT
     4    0.81   D LS NE PT
     5    0.83   D LS NE CT LN
     6    0.86   D LS NE CT LN PT
     7    0.86   D LS NE CT BW LN PT
     8    0.87   D LT2 LS PR NE CT LN PT
     9    0.87   D LT1 LT2 LS PR NE CT LN PT
    10    0.87   D LT1 LT2 LS PR NE CT BW LN PT

Best R^2 for each model order for the nuclear power data.

The adjusted R^2 criterion

The coefficient of multiple determination, usually denoted R^2, is defined by

    R^2 = 1 - \frac{SSE}{SSTO},

where SSE is the sum of squares of residuals and SSTO = \sum_i (y_i - \bar{y})^2 is the total sum of squares. R^2 measures the proportion of the total sum of squares explained by the model. The adjusted R^2 is defined by

    R_a^2 = 1 - \frac{(n - 1)\, SSE}{(n - p)\, SSTO}.    (1)

In practice, this criterion tends to select too large a value of p.

    R_a^2   R^2    p    Variable names
    .826    .871   8    D LT2 LS PR NE CT LN PT
    .823    .857   6    D LS NE CT LN PT
    .821    .861   7    D LS NE CT BW LN PT
    .820    .861   7    D LS PR NE CT LN PT
    .820    .861   7    D LT2 LS NE CT LN PT
    .818    .871   9    D LT1 LT2 LS PR NE CT LN PT
    .818    .871   9    D LT2 LS PR NE CT BW LN PT
    .818    .865   8    D LS PR NE CT BW LN PT
    .815    .857   7    D LT1 LS NE CT LN PT
    .815    .863   8    D LT2 LS NE CT BW LN PT

Nuclear power data: the best 10 models according to the adjusted R^2 criterion.

Prediction error criteria

Suppose y_i is the ith data point and x_i is the corresponding vector of covariates, so that

    y_i = x_i^T \beta + \epsilon_i.

Suppose we are trying to predict the value of a future observation at the same covariate vector x_i. This may be written in the form

    y_i^* = x_i^T \beta + \epsilon_i^*,

where \epsilon_i^* is independent of \epsilon_i, with the same mean 0 and variance \sigma^2. The obvious point predictor of y_i^* is

    \hat{y}_i^* = x_i^T \hat{\beta},

where \hat{\beta} is the least squares estimator of \beta. If the assumed model is correct, then \hat{\beta} is an unbiased estimator of \beta; but in the present context we do not know the true model, so we do not assume that \hat{\beta} is unbiased.

The mean squared prediction error (MSPE) is

    E(\hat{y}_i^* - y_i^*)^2 = E[\{x_i^T(\hat{\beta} - \beta)\}^2] + \sigma^2,    (2)

using the fact that \epsilon_i^* is independent of all past observations and therefore of \hat{\beta}. However, the \sigma^2 term in (2) is present regardless of the model we adopt, so we concentrate on the first term on the right-hand side, writing it as a decomposition into squared bias plus variance and summing over all i to obtain a formula for the overall mean squared prediction error.
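The best-subset search described in 5.1 is easy to sketch in code. The following Python/NumPy illustration is my own minimal version, not an implementation prescribed by the notes; the function name best_subsets is hypothetical, and here p counts regressors excluding the intercept.

    from itertools import combinations
    import numpy as np

    def best_subsets(X, y):
        """For each subset size p, keep the subset of columns of X with
        the smallest error sum of squares (SSE), fit by least squares
        with an intercept.  Enumerates all 2^P subsets, so this is only
        feasible for modest P."""
        n, P = X.shape
        best = {}  # p -> (sse, column indices)
        for p in range(P + 1):
            for cols in combinations(range(P), p):
                # Design matrix: intercept column plus the chosen regressors.
                Z = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
                beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
                sse = float(np.sum((y - Z @ beta) ** 2))
                if p not in best or sse < best[p][0]:
                    best[p] = (sse, cols)
        return best

For the nuclear power data (P = 10) this is only 2^10 = 1024 fits, and best_subsets(X, y)[3] would return the smallest-SSE three-regressor model, mirroring the p = 3 row of the first table. The 2^P growth in the number of fits is exactly why the method becomes impractical for large P.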
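Equation (1) is straightforward to compute from a fitted model. A minimal sketch in the same hypothetical Python/NumPy setting as above (adjusted_r2 is my own helper name, and p is taken to be the number of fitted parameters, matching the n - p divisor in (1)):

    import numpy as np

    def adjusted_r2(y, y_hat, p):
        """Adjusted R^2 as in equation (1)."""
        n = len(y)
        sse = np.sum((y - y_hat) ** 2)         # residual sum of squares
        ssto = np.sum((y - np.mean(y)) ** 2)   # total sum of squares
        return 1 - (n - 1) * sse / ((n - p) * ssto)

Rearranging (1) as R_a^2 = 1 - \frac{n-1}{n-p}(1 - R^2) makes the penalty visible: adding a regressor raises R_a^2 only if the drop in SSE outweighs the lost residual degree of freedom, which is why R_a^2, unlike R^2, does not increase monotonically with p (compare the two nuclear power tables).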

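The decomposition invoked in the final paragraph is the standard bias-variance identity E(T - \theta)^2 = \{E(T) - \theta\}^2 + \mathrm{Var}(T), applied with T = x_i^T \hat{\beta} and \theta = x_i^T \beta. Spelled out in LaTeX for the first term of (2):

    \[
    E\bigl[\{x_i^T(\hat{\beta} - \beta)\}^2\bigr]
      = \underbrace{\bigl\{x_i^T E(\hat{\beta}) - x_i^T \beta\bigr\}^2}_{\text{squared bias}}
      + \underbrace{\mathrm{Var}\bigl(x_i^T \hat{\beta}\bigr)}_{\text{variance}}.
    \]

Summing these terms over i = 1, ..., n (the \sigma^2 term in (2) contributes a constant n\sigma^2) gives the formula for the overall mean squared prediction error referred to in the text.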
