5601 Notes: Smoothing

Charles J. Geyer

April 8, 2006

Contents

1 Web Pages
2 The General Smoothing Problem
3 Some Smoothers
   3.1 Running Mean Smoother
   3.2 General Kernel Smoothing
   3.3 Local Polynomial Smoothing
   3.4 Smoothing Splines
4 Some Theory
   4.1 Linear Smoothers
   4.2 Distribution Theory
      4.2.1 Assumptions
      4.2.2 Bias
      4.2.3 Variance
      4.2.4 Variance Estimate
      4.2.5 Degrees of Freedom
   4.3 Performance Criteria
      4.3.1 Mean Squared Error
      4.3.2 Mallows's $C_p$
      4.3.3 Cross Validation
      4.3.4 Leave One Out
      4.3.5 Cross Validation Revisited
      4.3.6 Generalized Cross Validation
5 The Bias-Variance Trade-off
1 Web Pages

This handout accompanies the web pages

http://www.stat.umn.edu/geyer/5601/examp/smoo.html
http://www.stat.umn.edu/geyer/5601/examp/smootoo.html

2 The General Smoothing Problem

In simple linear regression, the standard assumptions are that the data are of the form $(x_i, y_i)$, $i = 1, \ldots, n$. We are interested in being able to predict $y_i$ values given the corresponding $x_i$ values. For this reason we treat the $x_i$ as non-random. If the $x_i$ are actually random, we say we are conditioning on their observed values, which is the same thing as treating them as non-random. The conditional distribution of the $y_i$ given the $x_i$ is determined by

\[ y_i = \alpha + \beta x_i + e_i \tag{1} \]

where $\alpha$ and $\beta$ are unknown parameters (non-random but unknown constants) and the $e_i$ are IID mean-zero normal random variables.

More generally, using multiple linear regression, we can generalize the model (1) to

\[ y_i = \alpha + \beta_1 g_1(x_i) + \cdots + \beta_k g_k(x_i) + e_i \tag{2} \]

where $g_1, \ldots, g_k$ are any known functions and the errors $e_i$ are as before. For example, polynomial regression is the case where the $g_j$ are the monomials

\[ g_j(x) = x^j, \qquad j = 1, \ldots, k. \]

But multiple regression works with any functions $g_j$ so long as they are known, not estimated; that is, they are chosen by the data analyst without looking at the data rather than somehow estimated from the data (only the regression parameters $\alpha, \beta_1, \ldots, \beta_k$ are estimated from the data).

Even more generally (using we don't yet know what) we can generalize the model (2) to

\[ y_i = g(x_i) + e_i \tag{3} \]

where $g$ is an unknown function and the errors $e_i$ are as before. Unlike the jump from (1) to (2), which involves only a quantitative change from 2 to $k + 1$ regression coefficients, the jump from (2) to (3) involves a qualitative change from the $k + 1$ real parameters $\alpha, \beta_1, \ldots, \beta_k$ to an unknown "parameter" that is a whole function $g$.
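As a concrete illustration of models (1) and (2), here is a minimal sketch in R (the language of the example web pages above) fit to simulated data. The true regression function, sample size, and variable names are invented for illustration and are not part of the notes.

    # simulated data: x_i treated as fixed, y_i = g(x_i) + e_i with IID normal errors
    set.seed(42)
    n <- 100
    x <- sort(runif(n, 0, 3))
    y <- sin(2 * x) + rnorm(n, sd = 0.25)

    # model (1): simple linear regression, unknown parameters alpha and beta
    fit1 <- lm(y ~ x)

    # model (2): multiple regression with known basis functions chosen in advance,
    # here the monomials g_j(x) = x^j for j = 1, 2, 3
    fit2 <- lm(y ~ x + I(x^2) + I(x^3))

    coef(fit1)  # estimates of alpha and beta
    coef(fit2)  # estimates of alpha, beta_1, beta_2, beta_3

In both fits the basis functions are fixed before looking at the data; only the finitely many coefficients are estimated. Model (3) has no such finite list of coefficients, which is the point taken up next.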
Theoretical statisticians often call such a $g$ an infinite-dimensional parameter because no finite-dimensional parameter vector $\theta$ can parameterize all possible functions; that is, we cannot write the function $g(x)$ as $g_\theta(x)$ for some finite-dimensional parameter $\theta$. In particular, we cannot write

\[ g_\theta(x) = \alpha + \beta_1 g_1(x) + \cdots + \beta_k g_k(x), \qquad \theta = (\alpha, \beta_1, \ldots, \beta_k), \]

where $g_1, \ldots, g_k$ are known functions. If we could do that, this would reduce (3) to a special case of (2). But we can't do that, so (3) is not a special case of (2).
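Since no finite set of basis functions can be handed to lm, fitting model (3) calls for a smoother. A hedged sketch continuing the simulated data above, using a smoothing spline (the smoother of Section 3.4); R's smooth.spline chooses the amount of smoothing by generalized cross validation (Section 4.3.6) by default.

    # model (3): estimate the whole function g without assuming a parametric form
    fit3 <- smooth.spline(x, y)          # smoothing parameter chosen by GCV by default

    plot(x, y)
    lines(x, fitted(fit2), lty = 2)               # parametric fit from model (2), for contrast
    lines(predict(fit3, x), lty = 1)              # nonparametric estimate of g
    curve(sin(2 * x), add = TRUE, col = "gray")   # the true g used in the simulation

Here nothing about the form of $g$ is specified by the analyst; the data determine the shape of the fitted curve, which is exactly the qualitative change described above.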