Week6Student2009

# Week6Student2009 - Week 6 Lecture 11 Kernel regression and...


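Week 6 Lecture 11: Kernel regression and minimax rates

**Model:** Observe $Y_i = f(X_i) + \xi_i$, where $\xi_i$, $i = 1, 2, \dots, n$, are i.i.d. with $E\xi_i = 0$. We often assume that the $X_i$ are i.i.d. or that $X_i = i/n$.

**Nadaraya–Watson estimator**

Let $w_i(x) = K\left(\frac{X_i - x}{h}\right)$. The Nadaraya-Watson estimator is

$$
\hat f_n^{NW}(x) =
\begin{cases}
\dfrac{\sum_{i=1}^n Y_i w_i}{\sum_{i=1}^n w_i}, & \sum_{i=1}^n w_i \neq 0, \\
0, & \text{otherwise.}
\end{cases}
$$

**Intuition:** Let $p(x, y)$ be the joint density of $(X, Y)$ and $p(x)$ be the marginal density of $X$. Then

$$
\begin{aligned}
f(x) = E(Y_1 \mid X_1 = x)
&= \frac{\int y\, p(x, y)\, dy}{p(x)}
\approx \frac{\int y\, \hat p_h(x, y)\, dy}{\hat p_h(x)} \\
&= \frac{\frac{1}{nh^2} \sum_{i=1}^n K\left(\frac{X_i - x}{h}\right) \int y\, K\left(\frac{Y_i - y}{h}\right) dy}{\frac{1}{nh} \sum_{i=1}^n K\left(\frac{X_i - x}{h}\right)} \\
&= \frac{\sum_{i=1}^n K\left(\frac{X_i - x}{h}\right) \left[ \int \frac{Y_i}{h} K\left(\frac{Y_i - y}{h}\right) dy - \int \frac{Y_i - y}{h} K\left(\frac{Y_i - y}{h}\right) dy \right]}{\sum_{i=1}^n K\left(\frac{X_i - x}{h}\right)} \\
&= \frac{\sum_{i=1}^n Y_i K\left(\frac{X_i - x}{h}\right)}{\sum_{i=1}^n K\left(\frac{X_i - x}{h}\right)},
\end{aligned}
$$

where the last identity follows from the fact that $\int K(y)\, dy = 1$ and $\int y K(y)\, dy = 0$.

**Remark:** The Nadaraya-Watson estimator is a solution of the following minimization problem:

$$
\hat f_n^{NW}(x) = \arg\min_{\theta} \sum_{i=1}^n (Y_i - \theta)^2 w_i(x).
$$

Starting from this equation we may introduce the local polynomial method.

**Rate of convergence**

Assume that $\mathcal{F} = \left\{ f : \sup_x \left| f^{(m)}(x) \right| \le M \right\}$, $0 < \epsilon \le p(x) \le M_1$ and $E\xi_i^2 < \infty$. We may show

$$
\inf_{\hat f_n} \sup_{f \in \mathcal{F}} \int E\left( \hat f_n(x) - f(x) \right)^2 dx \asymp C n^{-2m/(2m+1)}.
$$

**Upper bound:** To simplify our calculations we may assume that the $X_i$ are i.i.d. $U(0, 1)$. Let

$$
\hat f_n^{NW}(x) = \frac{1}{nh} \sum_{i=1}^n Y_i K\left(\frac{X_i - x}{h}\right).
$$

Bias:

$$
\left| E \hat f_n^{NW}(x) - f(x) \right| = \left| \frac{1}{h} \int f(t) K\left(\frac{t - x}{h}\right) dt - f(x) \right| \le C_1 h^m.
$$

Variance:

$$
\begin{aligned}
\mathrm{Var}\left( \hat f_n^{NW}(x) \right)
&= \frac{1}{n} \mathrm{Var}\left( \frac{1}{h} Y_i K\left(\frac{X_i - x}{h}\right) \right)
\le \frac{1}{n} E\left[ \frac{1}{h} Y_i K\left(\frac{X_i - x}{h}\right) \right]^2 \\
&\le \frac{2}{nh} E\left[ \frac{1}{h} \left( f^2(X_i) + \xi_i^2 \right) K^2\left(\frac{X_i - x}{h}\right) \right].
\end{aligned}
$$

When the last expectation is bounded by a constant (for instance, for bounded $f$), the variance is of order $1/(nh)$, and taking $h \asymp n^{-1/(2m+1)}$ balances the squared bias $h^{2m}$ against $1/(nh)$. Then

$$
\sup_{f \in \mathcal{F}} \int E\left( \hat f_n^{NW}(x) - f(x) \right)^2 dx \le C_1 n^{-2m/(2m+1)}.
$$

**Questions:**

- What about the lower bound? It is handled similarly to the density estimation case.
- Adaptive estimation?
- Estimation under sup-norm?
- Estimation at a point? Adaptive estimation?
- ......

Very similar results to density estimation!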
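The Nadaraya-Watson estimator and the weighted least-squares remark can be checked numerically. The following is a minimal sketch rather than anything taken from the lecture: the Gaussian kernel, the test function $\sin(2\pi t)$, the noise level 0.3, the constant `c` in the bandwidth, and all function names are illustrative assumptions. Only the form of the estimator, the zero-denominator convention, and the bandwidth order $n^{-1/(2m+1)}$ follow the notes.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density: integrates to 1 and has first moment 0."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def nadaraya_watson(x, X, Y, h, kernel=gaussian_kernel):
    """Nadaraya-Watson estimate at each point of x (hypothetical helper)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # w[j, i] = K((X_i - x_j) / h)
    w = kernel((X[None, :] - x[:, None]) / h)
    num = w @ Y
    den = w.sum(axis=1)
    # Convention: the estimate is 0 wherever all weights vanish.
    out = np.zeros_like(den)
    np.divide(num, den, out=out, where=den != 0)
    return out

def rate_bandwidth(n, m, c=1.0):
    """Bandwidth of order n^(-1/(2m+1)); the constant c is an assumption."""
    return c * n ** (-1.0 / (2 * m + 1))

# Simulated data with X_i i.i.d. U(0, 1), as in the upper-bound calculation.
rng = np.random.default_rng(0)
n, m = 500, 2
X = rng.uniform(0.0, 1.0, size=n)
f = lambda t: np.sin(2.0 * np.pi * t)          # assumed regression function
Y = f(X) + 0.3 * rng.standard_normal(n)        # assumed noise level

h = rate_bandwidth(n, m, c=0.2)
grid = np.linspace(0.1, 0.9, 9)
fit = nadaraya_watson(grid, X, Y, h)

# Remark check: at a fixed x0 the estimate minimises
# sum_i (Y_i - theta)^2 w_i(x0) over theta (brute-force grid search).
x0 = 0.5
w0 = gaussian_kernel((X - x0) / h)
thetas = np.linspace(-2.0, 2.0, 4001)
crit = ((Y[None, :] - thetas[:, None]) ** 2 * w0[None, :]).sum(axis=1)
theta_star = thetas[np.argmin(crit)]
nw_at_x0 = nadaraya_watson(x0, X, Y, h)[0]
```

Here `theta_star` and `nw_at_x0` should agree up to the resolution of the `thetas` grid, since the weighted criterion is a quadratic in $\theta$ whose minimiser is the weighted average of the $Y_i$. In practice the constant `c` would be chosen by a data-driven rule such as cross-validation; the rate calculation only fixes the order of $h$.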
**Asymptotic equivalence**

General theory: Le Cam (1986). Too hard to read!?

- Density estimation $E_n$: $Y_1, Y_2, \dots, Y_n$ i.i.d. with density $f(y)$, $y \in [0, 1]$.
- Poisson $F_n$: $Y_1, Y_2, \dots, Y_N$ i.i.d. with density $f(y)$, $N \sim \mathrm{Poisson}(n)$, $y \in [0, 1]$.
- Gaussian white noise $G_n$: $dy(t) = f(t)\, dt + \epsilon\, dW(t)$, $\epsilon = n^{-1/2}$.
- Gaussian regression $H_n$: $y_i = f(i/n) + z_i$, $z_i \sim N(0, 1)$.
- Spectral density estimation $I_n$: $Y_1, Y_2, \dots, Y_n$, a stationary centered Gaussian sequence with spectral density $f$.

And more models: exponential family regression, general location models.

1. Brown and Low (1996): $G_n$ and $H_n$ are asymptotically equivalent under the assumption $f \in \text{Hölder}(\alpha, M)$ with $\alpha > 1/2$....

## This note was uploaded on 11/06/2009 for the course STAT 680 at Yale.
