Least Squares Estimation:

Recall that the projection of $y$ onto $C(X)$, the set of all vectors of the form $Xb$ for $b \in \mathbb{R}^{k+1}$, yields the closest point in $C(X)$ to $y$. That is, $p(y \mid C(X))$ yields the minimizer of

$$Q(\beta) = \|y - X\beta\|^2 \quad \text{(the least squares criterion)}.$$

This leads to the estimator $\hat{\beta}$ given by the solution of

$$X^T X \beta = X^T y \quad \text{(the normal equations)},$$

or $\hat{\beta} = (X^T X)^{-1} X^T y$. All of this has already been established back when we studied projections (see pp. 30–31).

Alternatively, we could use calculus. To find a stationary point (maximum, minimum, or saddle point) of $Q(\beta)$, we set the partial derivative of $Q(\beta)$ equal to zero and solve:

$$\frac{\partial}{\partial \beta} Q(\beta)
= \frac{\partial}{\partial \beta} (y - X\beta)^T (y - X\beta)
= \frac{\partial}{\partial \beta} \left( y^T y - 2 y^T X \beta + \beta^T (X^T X) \beta \right)
= -2 X^T y + 2 X^T X \beta.$$

Here we have used the vector differentiation formulas $\frac{\partial}{\partial z} c^T z = c$ and $\frac{\partial}{\partial z} z^T A z = 2 A z$ for symmetric $A$ (see §2.14 of our text). Setting this result equal to zero, we obtain the normal equations, which have solution $\hat{\beta} = (X^T X)^{-1} X^T y$. That this is a minimum rather than a maximum or saddle point can be verified by checking the second derivative matrix of $Q(\beta)$:

$$\frac{\partial^2 Q(\beta)}{\partial \beta \, \partial \beta^T} = 2 X^T X,$$

which is positive definite (Result 7, p. 54); therefore $\hat{\beta}$ is a minimum.

Example — Simple Linear Regression

Consider the case $k = 1$:

$$y_i = \beta_0 + \beta_1 x_i + e_i, \quad i = 1, \ldots, n,$$

where $e_1, \ldots, e_n$ are i.i.d., each with mean 0 and variance $\sigma^2$. Then the model equation becomes

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
= \underbrace{\begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}}_{=\,X}
\underbrace{\begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}}_{=\,\beta}
+ \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}.$$

It follows that

$$X^T X = \begin{pmatrix} n & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{pmatrix}, \qquad
X^T y = \begin{pmatrix} \sum_i y_i \\ \sum_i x_i y_i \end{pmatrix},$$

$$(X^T X)^{-1} = \frac{1}{n \sum_i x_i^2 - \left(\sum_i x_i\right)^2}
\begin{pmatrix} \sum_i x_i^2 & -\sum_i x_i \\ -\sum_i x_i & n \end{pmatrix}.$$

Therefore, $\hat{\beta} = (X^T X)^{-1} X^T y$ yields

$$\hat{\beta} = \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{pmatrix}
= \frac{1}{n \sum_i x_i^2 - \left(\sum_i x_i\right)^2}
\begin{pmatrix} \left(\sum_i x_i^2\right)\left(\sum_i y_i\right) - \left(\sum_i x_i\right)\left(\sum_i x_i y_i\right) \\
 -\left(\sum_i x_i\right)\left(\sum_i y_i\right) + n \sum_i x_i y_i \end{pmatrix}.$$
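The closed-form entries above can be checked numerically. The following is a minimal sketch (the data values are made up for illustration): it solves the normal equations $X^T X \beta = X^T y$ directly and compares the result with the explicit $k = 1$ formulas just derived.

```python
import numpy as np

# Illustrative data (arbitrary values chosen for this sketch).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)

# Design matrix X = [1, x] for simple linear regression (k = 1).
X = np.column_stack([np.ones(n), x])

# Solve the normal equations X^T X beta = X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Closed-form entries of (X^T X)^{-1} X^T y derived above.
Sx, Sy = x.sum(), y.sum()
Sxx, Sxy = (x**2).sum(), (x * y).sum()
d = n * Sxx - Sx**2  # determinant of X^T X
b0 = (Sxx * Sy - Sx * Sxy) / d
b1 = (-Sx * Sy + n * Sxy) / d

print(beta_hat)   # should match (b0, b1)
```

The two routes agree because `np.linalg.solve` applied to the normal equations computes exactly the $2 \times 2$ inverse written out above.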
After a bit of algebra, these estimators simplify to

$$\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}}
\qquad \text{and} \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.$$

In the case that $X$ is of full rank, $\hat{\beta}$ and $\hat{\mu}$ are given by

$$\hat{\beta} = (X^T X)^{-1} X^T y, \qquad
\hat{\mu} = X \hat{\beta} = X (X^T X)^{-1} X^T y = P_{C(X)}\, y.$$

• Notice that both $\hat{\beta}$ and $\hat{\mu}$ are linear functions of $y$. That is, in each case the estimator is given by some matrix times $y$. Note also that

$$\hat{\beta} = (X^T X)^{-1} X^T y = (X^T X)^{-1} X^T (X\beta + e) = \beta + (X^T X)^{-1} X^T e.$$

From this representation several important properties of the least squares estimator $\hat{\beta}$ follow easily:

1. (Unbiasedness):

$$\mathrm{E}(\hat{\beta}) = \mathrm{E}\left(\beta + (X^T X)^{-1} X^T e\right)
= \beta + (X^T X)^{-1} X^T \underbrace{\mathrm{E}(e)}_{=\,0} = \beta.$$
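The projection interpretation above can also be verified numerically. This sketch (with a randomly generated full-rank design, purely for illustration) builds $P_{C(X)} = X(X^T X)^{-1}X^T$ and checks the defining properties of a projection matrix, plus the orthogonality of the residual to $C(X)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 20, 2
# Full-rank design: intercept column plus k random predictors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
P = X @ XtX_inv @ X.T          # projection matrix onto C(X)
mu_hat = P @ y                  # fitted values: the projection of y onto C(X)

# A projection matrix is symmetric and idempotent.
print(np.allclose(P, P.T), np.allclose(P @ P, P))
# The residual y - mu_hat is orthogonal to every column of X,
# which is exactly the normal equations X^T(y - X beta_hat) = 0.
print(np.allclose(X.T @ (y - mu_hat), 0))
```

Both estimators are visibly linear in $y$ here: $\hat{\beta}$ is `XtX_inv @ X.T` times $y$, and $\hat{\mu}$ is `P` times $y$.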
This note was uploaded on 11/13/2011 for the course STAT 8260 taught by Professor Hall during the Summer '10 term at UGA.