Smoothing - part 2

Next page: fitted penalized regression splines for 3 smoothing parameters: ~0, 100, and 5.7
  5.7 is the "optimal" choice, to be discussed shortly
  The "optimal" curve is a sequence of straight lines:
    continuous, but the 1st derivative is not continuous
Smoothed fits look "smoother" if they are continuous in the 1st derivative and in the 2nd derivative
Suggests joining together cubic pieces with appropriate constraints on the pieces so that the 1st and 2nd derivatives are continuous
Many very slightly different approaches:
  cubic regression splines (cubic smoothing splines)
  thin plate splines

© 2011 Dept. Statistics (Iowa State University), Stat 511 section 31, 1/26

[Figure: log C-peptide concentration vs. age of diagnosis, with fitted penalized regression splines for smoothing parameters ~0, 100, and 5.7]

We'll talk about thin plate splines because they provide an easy-to-implement way to fit multiple X's:
  $E\,y = f(x_1, x_2)$ as well as $E\,y = f(x_1) + f(x_2)$
The degree 3 thin plate spline with knots at $(k_1, k_2, \ldots, k_K)$:
  $f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \sum_{i=1}^{K} u_i |x - k_i|^5$

How much to smooth? i.e. what $\lambda^2$? or what $u_i$'s?
  reminder: 0 ⇒ no smoothing (linear or quadratic in tps)
            large ⇒ close fit to data points
We'll talk about three approaches:
  1. Cross validation
  2. Generalized cross validation
  3. Mixed models

Cross validation
General method to estimate "out of sample" prediction error
Concept: develop a model, want to assess how well it predicts
Might use rMSEP, $\sum (y_i - \hat{y}_i)^2$, as a criterion.
Problem: data used twice, once to develop the model and again to assess prediction accuracy
  rMSEP systematically underestimates $\sum (y_i^* - \hat{y}_i^*)^2$, where $y^*$ are new observations, not used in model development
Training/test set approach: split data in two parts
  Training data: used to develop model, usually 50%, 80% or 90% of data set
  Test set: used to assess prediction accuracy
Want a large training data set (to get a good model) and a large test set (to get a precise estimate of rMSEP)

Cross validation gets the best of both.
  leave-one-out cv: fit model without obs $i$, use that model to compute $\hat{y}_i$
  10-fold cv: same idea, blocks of $N/10$ observations
Can be used to choose a smoothing parameter
  Find $\lambda^2$ that minimizes cv prediction error
    $CV(\lambda^2) = \sum_{i=1}^{n} \left( y_i - \hat{f}_{-i}(x_i; \lambda^2) \right)^2$,
  where $\hat{f}_{-i}(x_i; \lambda^2)$ is the predicted value of $y_i$ using a penalized linear spline function estimated with smoothing parameter $\lambda^2$ from the data set that excludes the $i$th observation.
Find the $\lambda^2$ value that minimizes $CV(\lambda^2)$. Perhaps compute $CV(\lambda^2)$ for a grid of $\lambda^2$ values
Requires a LOT of computing (each obs, many $\lambda^2$)

Approximation to CV($\lambda^2$...
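
A minimal sketch of the brute-force leave-one-out search over a grid of $\lambda^2$ values, for a penalized linear spline. Several things here are assumptions, not taken from the slides: the truncated-line basis $(x - k)_+$, the ridge-type criterion $\|y - C b\|^2 + \lambda^2 \|u\|^2$ (only knot coefficients penalized), the simulated data, the knot locations, and all function names. The direction in which $\lambda^2$ controls smoothness depends on this assumed parameterization.

```python
import numpy as np

def linear_spline_design(x, knots):
    """Design matrix with columns 1, x, and (x - k)_+ for each knot k (assumed basis)."""
    x = np.asarray(x, dtype=float)
    fixed = np.column_stack([np.ones_like(x), x])
    trunc = np.maximum(x[:, None] - np.asarray(knots, dtype=float)[None, :], 0.0)
    return np.hstack([fixed, trunc])

def fit_penalized(C, y, lam2, n_fixed=2):
    """Minimize ||y - C b||^2 + lam2 * ||u||^2; the first n_fixed coefficients are unpenalized."""
    D = np.zeros(C.shape[1])
    D[n_fixed:] = 1.0
    return np.linalg.solve(C.T @ C + lam2 * np.diag(D), C.T @ y)

def loo_cv(C, y, lam2):
    """CV(lam2) = sum_i (y_i - fhat_{-i}(x_i; lam2))^2, refitting once per observation."""
    n = len(y)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i            # drop observation i
        coef = fit_penalized(C[keep], y[keep], lam2)
        total += (y[i] - C[i] @ coef) ** 2  # predict the held-out point
    return total

# Simulated data (hypothetical; the C-peptide data are not reproduced here)
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 60))
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, 60)

knots = np.linspace(0.05, 0.95, 15)         # assumed knot locations
C = linear_spline_design(x, knots)

grid = 10.0 ** np.arange(-4, 3)             # grid of lambda^2 values
cv = np.array([loo_cv(C, y, lam2) for lam2 in grid])
best_lam2 = grid[np.argmin(cv)]

# In-sample sum of squares at the chosen lambda^2, for comparison with CV
coef_full = fit_penalized(C, y, best_lam2)
in_sample_rss = np.sum((y - C @ coef_full) ** 2)

print("lambda^2 minimizing CV:", best_lam2)
print("CV at minimum:", cv.min(), " in-sample RSS:", in_sample_rss)
```

Each grid value costs n refits, which is the "LOT of computing" noted above. Comparing the in-sample sum of squares with the CV value at the same $\lambda^2$ shows the optimism of reusing the data for both fitting and assessment.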
