Smoothing, part 2
© 2011 Dept. Statistics (Iowa State University), Stat 511 section 31

- Next page: fitted penalized regression splines for 3 smoothing parameters: 0, 100, and 5.7
- 5.7 is the optimal choice, to be discussed shortly
- the optimal curve is a sequence of straight lines
  - continuous, but its 1st derivative is not continuous
- smoothed fits look smoother if they are continuous in both the 1st and 2nd derivatives
- suggests joining together cubic pieces, with appropriate constraints on the pieces so that the 1st and 2nd derivatives are continuous
- many very slightly different approaches:
  - cubic regression splines (cubic smoothing splines)
  - thin plate splines

[Figure: log C-peptide concentration (3.0 to 6.5) vs. age of diagnosis (5 to 15), with fitted penalized regression splines for smoothing parameters ~0, 100, and 5.7]

- We'll talk about thin plate splines because they provide an easy-to-implement way to fit multiple X's, E y = f(x_1, x_2), as well as E y = f(x_1) + f(x_2)
- The degree 3 thin plate spline with knots at (k_1, k_2, ..., k_K):

    f(x) = β_0 + β_1 x + β_2 x² + Σ_{i=1}^{K} u_i |x − k_i|^5

[Figure: plot on the unit square; both axes run 0.0 to 1.0]

- How much to smooth?
  - i.e., what λ²? or what u_k's?
  - reminder: λ² ≈ 0 gives no smoothing, a close fit to the data points; large λ² gives a linear (or, in tps, quadratic) fit
- We'll talk about three approaches:
  1. Cross validation
  2. Generalized cross validation
  3. Mixed models

Cross validation
- A general method to estimate out-of-sample prediction error
- Concept: develop a model, then want to assess how well it predicts
- Might use rMSEP = √( Σ_i (y_i − ŷ_i)² / n ) as a criterion.
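The problem with in-sample rMSEP can be sketched numerically. This is a toy illustration of my own (the overfit polynomial, the sine mean curve, and all names are mine, not from the course): a model tuned on the data it is scored against reports an optimistic rMSEP compared with fresh observations.

```python
import numpy as np

rng = np.random.default_rng(0)

def rmsep(y, yhat):
    # root mean squared error of prediction: sqrt(mean of squared errors)
    return np.sqrt(np.mean((y - yhat) ** 2))

# toy data: smooth mean curve plus noise (assumed example, not course data)
n = 30
x = np.linspace(0.0, 1.0, n)
f = np.sin(2 * np.pi * x)
y = f + rng.normal(0.0, 0.3, n)

# deliberately overfit: high-degree polynomial fit and assessed on the SAME data
coefs = np.polyfit(x, y, deg=15)
yhat = np.polyval(coefs, x)
in_sample = rmsep(y, yhat)

# "new" observations: same mean curve, fresh noise (200 replicate draws
# to make the out-of-sample estimate stable)
y_new = f + rng.normal(0.0, 0.3, (200, n))
out_sample = rmsep(y_new, yhat)
```

With the model both built and scored on `y`, `in_sample` understates the error seen on `y_new`, which is exactly the "data used twice" problem the next slide raises.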
- Problem: the data are used twice, once to develop the model and again to assess prediction accuracy
- rMSEP systematically underestimates √( Σ_i (y*_i − ŷ*_i)² / n ), where the y*_i are new observations, not used in model development
- Training/test set approach: split the data into two parts
  - Training data: used to develop the model; usually 50%, 80%, or 90% of the data set
  - Test set: used to assess prediction accuracy
- Want a large training data set (to get a good model) and a large test set (to get a precise estimate of rMSEP)

- Cross validation gets the best of both.
- leave-one-out cv: fit the model without observation i, use that model to compute ŷ_{−i}
- 10-fold cv: same idea, with blocks of N/10 observations
- Can be used to choose a smoothing parameter
  - Find the λ² that minimizes cv prediction error

    CV(λ²) = Σ_{i=1}^{n} { y_i − f_{−i}(x_i; λ²) }²,

  where f_{−i}(x_i; λ²) is the predicted value of y_i using a penalized linear spline function estimated with smoothing parameter...
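The leave-one-out CV criterion above can be sketched for a penalized linear spline with truncated-line basis 1, x, (x − k)_+. This is a minimal illustration under my own assumptions (made-up data, my own knot placement and λ² grid), not the course's code:

```python
import numpy as np

def design(x, knots):
    # linear spline basis: 1, x, and (x - k)_+ for each knot k
    return np.column_stack([np.ones_like(x), x] +
                           [np.maximum(x - k, 0.0) for k in knots])

def fit(x, y, knots, lam2):
    # penalized least squares: minimize ||y - Xb||^2 + lam2 * sum(u_k^2),
    # penalizing only the knot coefficients u_k, not the polynomial part
    X = design(x, knots)
    D = np.diag([0.0, 0.0] + [1.0] * len(knots))
    return np.linalg.solve(X.T @ X + lam2 * D, X.T @ y)

def cv(x, y, knots, lam2):
    # CV(lam2) = sum_i { y_i - f_{-i}(x_i; lam2) }^2:
    # refit without observation i, then predict y_i from the held-out fit
    idx = np.arange(len(x))
    total = 0.0
    for i in idx:
        b = fit(x[idx != i], y[idx != i], knots, lam2)
        pred = design(x[i:i + 1], knots) @ b
        total += (y[i] - pred[0]) ** 2
    return total

# toy data and a small grid of candidate smoothing parameters (assumptions)
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 10.0, 40))
y = np.log1p(x) + rng.normal(0.0, 0.2, x.size)
knots = np.quantile(x, np.linspace(0.1, 0.9, 8))
grid = [0.01, 0.1, 1.0, 10.0, 100.0]
scores = {l: cv(x, y, knots, l) for l in grid}
lam2_best = min(grid, key=lambda l: scores[l])
```

In practice one would minimize CV(λ²) over a finer grid or by a 1-D optimizer; the loop here just makes the "fit without observation i, predict observation i" idea explicit.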
This note was uploaded on 02/11/2012 for the course STAT 511 taught by Professor Staff during the Spring '08 term at Iowa State.