# 31Spline2 - Smoothing, part 2


- Next page: fitted penalized regression splines for 3 smoothing parameters: $\sim 0$, 100, and 5.7
- 5.7 is the "optimal" choice, to be discussed shortly
- The "optimal" curve is a sequence of straight lines
  - continuous, but the 1st derivative is not continuous
- Smoothed fits look "smoother" if continuous in the 1st derivative and in the 2nd derivative
- Suggests joining together cubic pieces with appropriate constraints on the pieces so that the 1st and 2nd derivatives are continuous
- Many very slightly different approaches
  - cubic regression splines (cubic smoothing splines)
  - thin plate splines

© 2011 Dept. Statistics (Iowa State University), Stat 511 section 31

[Figure: log C-peptide concentration versus age of diagnosis, with fitted penalized regression splines for smoothing parameters $\sim 0$, 100, and 5.7]

- We'll talk about thin plate splines because they provide an easy-to-implement way to fit multiple $X$'s
  - $E\,y = f(x_1, x_2)$ as well as $E\,y = f(x_1) + f(x_2)$
- The degree 3 thin plate spline with knots at $(k_1, k_2, \ldots, k_K)$:

$$f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \sum_{i=1}^{K} u_i \, |x - k_i|^5$$

## How much to smooth?

- i.e. what $\lambda^2$? or what $u_i$'s?
- Reminder:
  - $0 \Rightarrow$ no smoothing (linear, or quadratic in a thin plate spline)
  - large $\Rightarrow$ close fit to data points
- We'll talk about three approaches:
  1. Cross validation
  2. Generalized cross validation
  3. Mixed models

## Cross validation

- General method to estimate "out of sample" prediction error
- Concept: develop a model, then want to assess how well it predicts
- Might use rMSEP, $\sum (y_i - \hat{y}_i)^2$, as a criterion.
- Problem: the data are used twice, once to develop the model and again to assess prediction accuracy
  - rMSEP systematically underestimates $\sum (y^*_i - \hat{y}^*_i)^2$, where the $y^*$ are new observations, not used in model development
- Training/test set approach: split the data in two parts
  - Training data: used to develop the model, usually 50%, 80% or 90% of the data set
  - Test set: used to assess prediction accuracy
  - Want a large training data set (to get a good model) and a large test set (to get a precise estimate of rMSEP)
- Cross validation gets the best of both.
  - leave-one-out cv: fit the model without obs $i$, use that model to compute $\hat{y}_i$
  - 10-fold cv: same idea, blocks of $N/10$ observations
- Can be used to choose a smoothing parameter
- Find the $\lambda^2$ that minimizes the cv prediction error

$$CV(\lambda^2) = \sum_{i=1}^{n} \left\{ y_i - \hat{f}_{-i}(x_i; \lambda^2) \right\}^2,$$

  where $\hat{f}_{-i}(x_i; \lambda^2)$ is the predicted value of $y_i$ using a penalized linear spline function estimated with smoothing parameter $\lambda^2$ from the data set that excludes the $i$th observation.
- Find the $\lambda^2$ value that minimizes $CV(\lambda^2)$. Perhaps compute $CV(\lambda^2)$ for a grid of $\lambda^2$ values
- Requires a LOT of computing (each obs, many $\lambda^2$)

Approximation to CV(λ²...
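The leave-one-out criterion $CV(\lambda^2)$ and the grid search described above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the course's own code: the truncated-line basis $(x - k)_+$, the ridge-type penalty in which $\lambda^2$ multiplies the sum of squared knot coefficients (so a larger $\lambda^2$ means a smoother fit in this sketch, which may not match the slides' parameterization), the simulated data, the knot locations, and the function names `linear_spline_basis`, `fit_penalized` and `loo_cv` are all hypothetical choices made for the example.

```python
import numpy as np


def linear_spline_basis(x, knots):
    """Design matrix with columns 1, x, and (x - k)_+ for each knot k."""
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    truncated = np.maximum(x[:, None] - knots[None, :], 0.0)
    return np.column_stack([np.ones_like(x), x, truncated])


def fit_penalized(x, y, knots, lam2):
    """Penalized least squares fit.

    Assumption: the penalty is lam2 * sum(u_k^2), applied only to the
    knot coefficients; the intercept and slope are left unpenalized.
    """
    X = linear_spline_basis(x, knots)
    penalty = np.diag([0.0, 0.0] + [1.0] * len(knots))
    return np.linalg.solve(X.T @ X + lam2 * penalty, X.T @ y)


def loo_cv(x, y, knots, lam2):
    """CV(lam2): sum over i of (y_i - fhat_{-i}(x_i; lam2))^2."""
    n = len(y)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i  # drop observation i
        coef = fit_penalized(x[keep], y[keep], knots, lam2)
        pred = linear_spline_basis(x[i:i + 1], knots) @ coef
        total += (y[i] - pred[0]) ** 2
    return total


# Simulated data, for illustration only (not the C-peptide data).
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, size=40))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=40)
knots = np.linspace(0.1, 0.9, 9)

# Evaluate CV on a grid of smoothing parameters and keep the minimizer.
grid = [1e-4, 1e-2, 1.0, 100.0, 1e4]
cv_values = [loo_cv(x, y, knots, lam2) for lam2 in grid]
best_lam2 = grid[int(np.argmin(cv_values))]
print(dict(zip(grid, cv_values)))
print("grid value minimizing CV:", best_lam2)
```

Each grid value costs $n$ separate fits, which is the "LOT of computing" noted above: here 40 fits for each of 5 values of $\lambda^2$.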