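Week 6 Lecture 11

Kernel regression and minimax rates

Model: Observe $Y_i = f(X_i) + \xi_i$, $i = 1, 2, \ldots, n$, where the $\xi_i$ are i.i.d. with $E\xi_i = 0$. We often assume the $X_i$ are i.i.d. or $X_i = i/n$.

Nadaraya–Watson estimator

Let $w_i(x) = K\left(\frac{X_i - x}{h}\right)$. The Nadaraya–Watson estimator is
\[
\hat f_n^{NW}(x) =
\begin{cases}
\dfrac{\sum_{i=1}^n Y_i w_i(x)}{\sum_{i=1}^n w_i(x)}, & \sum_{i=1}^n w_i(x) \ne 0, \\
0, & \text{otherwise.}
\end{cases}
\]

Intuition: Let $p(x,y)$ be the joint density of $(X,Y)$ and $p(x)$ be the marginal density of $X$. Then
\[
f(x) = E(Y_1 \mid X_1 = x) = \frac{\int y\, p(x,y)\, dy}{p(x)} \approx \frac{\int y\, \hat p_h(x,y)\, dy}{\hat p_h(x)}
= \frac{\frac{1}{nh^2} \sum_{i=1}^n K\left(\frac{X_i - x}{h}\right) \int y\, K\left(\frac{Y_i - y}{h}\right) dy}{\frac{1}{nh} \sum_{i=1}^n K\left(\frac{X_i - x}{h}\right)}
\]
\[
= \frac{\sum_{i=1}^n K\left(\frac{X_i - x}{h}\right) \left[ \int \frac{Y_i}{h} K\left(\frac{Y_i - y}{h}\right) dy - \int \frac{Y_i - y}{h} K\left(\frac{Y_i - y}{h}\right) dy \right]}{\sum_{i=1}^n K\left(\frac{X_i - x}{h}\right)}
= \frac{\sum_{i=1}^n Y_i K\left(\frac{X_i - x}{h}\right)}{\sum_{i=1}^n K\left(\frac{X_i - x}{h}\right)},
\]
where the last identity follows from the fact that $\int K(y)\, dy = 1$ and $\int y K(y)\, dy = 0$.

Remark: The Nadaraya–Watson estimator is a solution of the following minimization problem:
\[
\hat f_n^{NW}(x) = \arg\min_{\theta} \sum_{i=1}^n (Y_i - \theta)^2 w_i(x).
\]
Starting from this equation we may introduce the local polynomial method.

Rate of convergence

Assume that $\mathcal{F} = \{ f : \sup_x |f^{(m)}(x)| \le M \}$, that the design density $p$ is bounded away from zero with $p(x) \le M_1$, and that $E\xi_i^2 < \infty$. We may show that the minimax risk is of order $n^{-2m/(2m+1)}$:
\[
\inf_{\hat f_n} \sup_{f \in \mathcal{F}} \int E\left( \hat f_n(x) - f(x) \right)^2 dx \asymp C n^{-2m/(2m+1)}.
\]

Upper bound: To simplify our calculations we may assume the $X_i$ are i.i.d. $U(0,1)$. Let
\[
\hat f_n^{NW}(x) = \frac{1}{nh} \sum_{i=1}^n Y_i K\left(\frac{X_i - x}{h}\right).
\]

Bias:
\[
\left| E \hat f_n^{NW}(x) - f(x) \right| = \left| \frac{1}{h} \int f(t) K\left(\frac{t - x}{h}\right) dt - f(x) \right| \le C_1 h^m.
\]

Variance:
\[
Var\left( \hat f_n^{NW}(x) \right) = \frac{1}{n} Var\left( \frac{1}{h} Y_i K\left(\frac{X_i - x}{h}\right) \right)
\le \frac{1}{n} E\left[ \frac{1}{h} Y_i K\left(\frac{X_i - x}{h}\right) \right]^2
\le \frac{2}{nh} E\left[ \frac{1}{h} \left( f^2(X_i) + \xi_i^2 \right) K^2\left(\frac{X_i - x}{h}\right) \right].
\]

The last expectation is bounded when $f$ is bounded and $\int K^2 < \infty$, so the variance is at most $C_2/(nh)$ for some constant $C_2$. The integrated risk is then at most $C_1^2 h^{2m} + C_2/(nh)$, and the two terms balance at $h \asymp n^{-1/(2m+1)}$. Then
\[
\sup_{f \in \mathcal{F}} \int E\left( \hat f_n^{NW}(x) - f(x) \right)^2 dx \le C n^{-2m/(2m+1)}.
\]

Questions:
• What about the lower bound? Similar to the density estimation case.
• Adaptive estimation?
• Estimation under the sup-norm?
• Estimation at a point? Adaptive estimation?
• ......

Very similar results to density estimation!

Asymptotic equivalence

General theory: Le Cam (1986).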
Too hard to read!?

Density estimation $E_n$: $Y_1, Y_2, \ldots, Y_n$ i.i.d. with density $f(y)$, $y \in [0,1]$.

Poisson $F_n$: $Y_1, Y_2, \ldots, Y_N$ i.i.d. with density $f(y)$, $N \sim \mathrm{Poisson}(n)$, $y \in [0,1]$.

Gaussian white noise $G_n$: $dy(t) = f(t)\, dt + n^{-1/2}\, dW(t)$.

Gaussian regression $H_n$: $y_i = f(i/n) + z_i$, $z_i \sim N(0,1)$.

Spectral density estimation $I_n$: $Y_1, Y_2, \ldots, Y_n$, a stationary centered Gaussian sequence with spectral density $f$.

And more models: exponential family regression, general location models.

1. Brown and Low (1996): $G_n$ and $H_n$ are asymptotically equivalent under the assumption $f \in$ Hölder$(\alpha, M)$ with $\alpha > 1/2$....
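To make the Nadaraya–Watson estimator from the kernel regression part of these notes concrete, here is a minimal numerical sketch. It is not taken from the lecture. The Gaussian kernel, the test function $\sin(2\pi t)$, the noise level 0.3, and the names `gaussian_kernel`, `nadaraya_watson` and `simulate` are all assumptions chosen for illustration. The bandwidth follows the order $h \asymp n^{-1/(2m+1)}$ with $m = 2$ and constant 1, which is only one possible choice.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density: integrates to 1 and has zero first moment."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def nadaraya_watson(x_eval, X, Y, h, kernel=gaussian_kernel):
    """Evaluate sum_i Y_i w_i(x) / sum_i w_i(x) with w_i(x) = K((X_i - x) / h).

    Returns 0 at any evaluation point where the weights sum to zero,
    matching the convention in the definition of the estimator.
    """
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # W[j, i] = K((X_i - x_j) / h); one row per evaluation point.
    W = kernel((X[None, :] - x_eval[:, None]) / h)
    denom = W.sum(axis=1)
    numer = W @ Y
    out = np.zeros_like(denom, dtype=float)
    np.divide(numer, denom, out=out, where=denom != 0)
    return out


def simulate(n, m=2, seed=0):
    """Illustrative experiment (all settings are assumptions, not from the notes).

    Draws X_i ~ U(0, 1), Y_i = sin(2 pi X_i) + 0.3 * N(0, 1) noise, uses
    h = n^(-1/(2m+1)), and returns the average squared error on an interior grid.
    """
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=n)

    def f(t):
        return np.sin(2.0 * np.pi * t)

    Y = f(X) + 0.3 * rng.standard_normal(n)
    h = n ** (-1.0 / (2 * m + 1))
    grid = np.linspace(0.2, 0.8, 61)
    f_hat = nadaraya_watson(grid, X, Y, h)
    return float(np.mean((f_hat - f(grid)) ** 2))


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, simulate(n))
```

Running the script prints the average squared error for a few sample sizes. The estimate at each point is a weighted average of the $Y_i$, so it also satisfies the first-order condition $\sum_i w_i(x)(Y_i - \theta) = 0$ of the weighted least squares problem in the Remark.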
This note was uploaded on 11/06/2009 for the course STAT 680 at Yale.
 '09
 Harrison H. Zhou
