STA 414/2104 Mar 2, 2010

Notes
- No office hour today, Tuesday Mar 2
- Practice test questions posted
- HW 2 due date: Tuesday Mar 9
- HW 2 questions and HW 1 answers: Thursday, Mar 4
- Statistical Society of Canada case studies: http://www.ssc.ca/documents/case_studies/2010/

7.3 Estimating expected prediction error
- Err_T = E{ L(Y, \hat f(X)) | T },   Err = E_T [ E{ L(Y, \hat f(X)) | T } ]
- we know that the training error \overline{err} = (1/N) \sum_i L( y_i, \hat f(x_i) ) is too small
- can show that, on average, \overline{err} should be inflated by (2d/N) \sigma_\epsilon^2
- where d is the number of inputs (basis functions)

... can show (§7.4)
- define Err_in = (1/N) \sum_{i=1}^N E{ L(Y_i, \hat f(x_i)) | T }
- a new response Y_i at each of the training inputs x_1, ..., x_N
- and define the optimism \omega = E_y( Err_in - \overline{err} )
- expectation over y instead of T (the x's are fixed)
- \omega = (2/N) \sum_{i=1}^N cov( \hat y_i, y_i )
- leading to (7.22):
  E_y( Err_in ) = E_y( \overline{err} ) + (2/N) \sum_{i=1}^N cov( \hat y_i, y_i ) = E_y( \overline{err} ) + (2d/N) \sigma_\epsilon^2
- for linear fits with d basis functions

... can show
- E_y( Err_in ) = E_y( \overline{err} ) + (2/N) \sum_{i=1}^N cov( \hat y_i, y_i ) = E_y( \overline{err} ) + (2d/N) \sigma_\epsilon^2
- (§7.5) \overline{err} estimates E_y( \overline{err} ); (2d/N) \hat\sigma_\epsilon^2 estimates the 2nd term
- with the log-likelihood loss function, the result is: as N \to \infty,
  -2 E{ log Pr(Y; \hat\theta) } \approx -(2/N) \sum_{i=1}^N log Pr( y_i; \hat\theta ) + 2 d / N
- which motivates AIC = -(2/N) \ell( \hat\theta; y ) + 2 d / N
- e.g. in (7.30), d = d(\alpha) depends on a smoothing parameter \alpha
- (§7.6) with smoothing splines, replace d by trace(S)
- skip 7.7, 7.8, 7.9

Direct estimation of EPE (§7.10)
- another way to estimate prediction error
- Err = E_T [ E{ L(Y, \hat f(X)) | T } ]
- CV( \hat f ) = (1/N) \sum_{i=1}^N L( y_i, \hat f^{(-i)}(x_i) )
- leave-one-out (N-fold):
  (1/N) \sum_i L( y_i, \hat f^{-i}(x_i) ) = (e.g.) (1/N) \sum_i { y_i - \hat f^{-i}(x_i) }^2
- y_i - \hat f^{-i}(x_i) = ( y_i - \hat f(x_i) ) / ( 1 - S_ii )
- for any linear smoother \hat f = S y
- CV = (1/N) \sum_i { ( y_i - \hat f(x_i) ) / ( 1 - S_ii ) }^2
- GCV = (1/N) \sum_i { ( y_i - \hat f(x_i) ) / ( 1 - trace(S)/N ) }^2

comments
- 7.10.2, 7.10.3: CV must be carried out before any model simplification that uses the y's
- both CV and K-fold CV seem to estimate expected prediction error (Err), but not conditional prediction error (Err_T)
- CV can be quite variable (especially leave-one-out)
- p. 231: using AIC, CV or GCV to choose the smoothing parameter seems to overfit the data (?)
- Example: biasvariance.R

Projection pursuit regression (...
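The leave-one-out shortcut for linear smoothers can be checked numerically. A minimal sketch in Python/numpy (the course material uses R, so the language, the simulated data, the polynomial basis, and all variable names here are illustrative assumptions, not the course code): for a least-squares fit, \hat f = S y with hat matrix S = X (X'X)^{-1} X', and the brute-force leave-one-out residual y_i - \hat f^{-i}(x_i) equals ( y_i - \hat f(x_i) ) / ( 1 - S_ii ) exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: N points, cubic polynomial fit, so d = 4 basis functions.
N, d = 50, 4
x = np.sort(rng.uniform(-1.0, 1.0, N))
y = np.sin(2.0 * x) + rng.normal(scale=0.3, size=N)

# Design matrix of basis functions; the fit is a linear smoother
# yhat = S y with hat matrix S = X (X'X)^{-1} X'.
X = np.vander(x, d, increasing=True)
S = X @ np.linalg.solve(X.T @ X, X.T)
yhat = S @ y

# (1) Brute-force leave-one-out CV: refit N times, dropping point i each time.
cv_brute = 0.0
for i in range(N):
    keep = np.arange(N) != i
    beta_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    cv_brute += (y[i] - X[i] @ beta_i) ** 2
cv_brute /= N

# (2) The shortcut: no refitting, only the full-data residuals and diag(S).
cv_short = np.mean(((y - yhat) / (1.0 - np.diag(S))) ** 2)

# GCV replaces each S_ii by the average trace(S)/N (= d/N for this fit).
gcv = np.mean(((y - yhat) / (1.0 - np.trace(S) / N)) ** 2)

print(cv_brute, cv_short, gcv)
```

The two CV computations agree to floating-point precision, which is why leave-one-out CV is cheap for any linear smoother; GCV further replaces S_ii by its average trace(S)/N, requiring only the trace rather than the full diagonal.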