Least Squares Estimation: Recall that the projection of $y$ onto $C(X)$, the set of all vectors of the form $Xb$ for $b \in \mathbb{R}^{k+1}$, yields the closest point in $C(X)$ to $y$. That is, $p(y \mid C(X))$ yields the minimizer of
$$Q(\beta) = \|y - X\beta\|^2 \quad \text{(the least squares criterion)}.$$
This leads to the estimator $\hat\beta$ given by the solution of
$$X^T X \beta = X^T y \quad \text{(the normal equations)},$$
or $\hat\beta = (X^T X)^{-1} X^T y$. All of this has already been established back when we studied projections (see pp. 30–31).

Alternatively, we could use calculus: to find a stationary point (maximum, minimum, or saddle point) of $Q(\beta)$, we set the partial derivative of $Q(\beta)$ equal to zero and solve:
$$\frac{\partial}{\partial \beta} Q(\beta)
= \frac{\partial}{\partial \beta}\,(y - X\beta)^T (y - X\beta)
= \frac{\partial}{\partial \beta}\left(y^T y - 2 y^T X \beta + \beta^T (X^T X) \beta\right)
= -2 X^T y + 2 X^T X \beta.$$
Here we've used the vector differentiation formulas $\frac{\partial}{\partial z} c^T z = c$ and $\frac{\partial}{\partial z} z^T A z = 2 A z$ (for symmetric $A$; see §2.14 of our text). Setting this result equal to zero, we obtain the normal equations, which have solution $\hat\beta = (X^T X)^{-1} X^T y$. That this is a minimum, rather than a maximum or saddle point, can be verified by checking the second derivative matrix of $Q(\beta)$:
$$\frac{\partial^2}{\partial \beta \, \partial \beta^T} Q(\beta) = 2 X^T X,$$
which is positive definite (result 7, p. 54); therefore $\hat\beta$ is a minimum.

Example — Simple Linear Regression: Consider the case $k = 1$:
$$y_i = \beta_0 + \beta_1 x_i + e_i, \qquad i = 1, \ldots, n,$$
where $e_1, \ldots, e_n$ are i.i.d., each with mean 0 and variance $\sigma^2$. Then the model equation becomes
$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
= \underbrace{\begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}}_{=\,X}
\underbrace{\begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}}_{=\,\beta}
+ \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}.$$
It follows that
$$X^T X = \begin{pmatrix} n & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{pmatrix},
\qquad
X^T y = \begin{pmatrix} \sum_i y_i \\ \sum_i x_i y_i \end{pmatrix},$$
$$(X^T X)^{-1} = \frac{1}{n \sum_i x_i^2 - \left(\sum_i x_i\right)^2}
\begin{pmatrix} \sum_i x_i^2 & -\sum_i x_i \\ -\sum_i x_i & n \end{pmatrix}.$$
Therefore, $\hat\beta = (X^T X)^{-1} X^T y$ yields
$$\hat\beta = \begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \end{pmatrix}
= \frac{1}{n \sum_i x_i^2 - \left(\sum_i x_i\right)^2}
\begin{pmatrix}
\left(\sum_i x_i^2\right)\left(\sum_i y_i\right) - \left(\sum_i x_i\right)\left(\sum_i x_i y_i\right) \\[4pt]
-\left(\sum_i x_i\right)\left(\sum_i y_i\right) + n \sum_i x_i y_i
\end{pmatrix}.$$
After a bit of algebra, these estimators simplify to
$$\hat\beta_1 = \frac{\sum_i (x_i - \bar x)(y_i - \bar y)}{\sum_i (x_i - \bar x)^2} = \frac{S_{xy}}{S_{xx}}
\qquad \text{and} \qquad
\hat\beta_0 = \bar y - \hat\beta_1 \bar x.$$

In the case that $X$ is of full rank, $\hat\beta$ and $\hat y$ are given by
$$\hat\beta = (X^T X)^{-1} X^T y,
\qquad
\hat y = X \hat\beta = X (X^T X)^{-1} X^T y = P_{C(X)}\, y.$$
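The three routes to the estimates above (solving the normal equations, the simplified $S_{xy}/S_{xx}$ formulas, and projecting $y$ onto $C(X)$) can be checked against each other numerically. A minimal sketch in NumPy; the data-generating values are illustrative, not from the notes:

```python
import numpy as np

# Simulate a simple linear regression (illustrative values, not from the notes).
rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=n)   # true beta0 = 2.0, beta1 = 0.5

# Matrix form: solve the normal equations X^T X beta = X^T y.
X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Scalar form after the algebra: beta1 = S_xy / S_xx, beta0 = ybar - beta1 * xbar.
xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)
Sxy = np.sum((x - xbar) * (y - ybar))
b1 = Sxy / Sxx
b0 = ybar - b1 * xbar
assert np.allclose(beta_hat, [b0, b1])          # the two derivations agree

# Projection form: y_hat = P_{C(X)} y with P = X (X^T X)^{-1} X^T.
P = X @ np.linalg.inv(X.T @ X) @ X.T
assert np.allclose(P @ y, X @ beta_hat)         # fitted values match the projection
```

Solving the normal equations with `np.linalg.solve` (rather than explicitly inverting $X^T X$) is the numerically preferable route in practice; the explicit inverse appears here only to mirror the projection-matrix formula in the notes.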
Notice that both $\hat\beta$ and $\hat y$ are linear functions of $y$; that is, in each case the estimator is given by some matrix times $y$. Note also that
$$\hat\beta = (X^T X)^{-1} X^T y = (X^T X)^{-1} X^T (X\beta + e) = \beta + (X^T X)^{-1} X^T e.$$
From this representation several important properties of the least squares estimator follow easily:

1. (unbiasedness):
$$\mathrm{E}(\hat\beta) = \mathrm{E}\left(\beta + (X^T X)^{-1} X^T e\right)
= \beta + (X^T X)^{-1} X^T \underbrace{\mathrm{E}(e)}_{=\,0} = \beta.$$
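The unbiasedness property $\mathrm{E}(\hat\beta) = \beta$ can be illustrated by Monte Carlo: average $\hat\beta$ over many replications with a fixed design and mean-zero errors, and the average should be close to the true $\beta$. A hedged sketch with made-up parameter values:

```python
import numpy as np

# Monte Carlo check of E(beta_hat) = beta (illustrative values, not from the notes).
rng = np.random.default_rng(1)
n, reps = 30, 5000
beta = np.array([1.0, -2.0])                   # true coefficients
x = rng.uniform(-1, 1, size=n)
X = np.column_stack([np.ones(n), x])           # fixed design across replications

# (X^T X)^{-1} X^T, computed once since X does not change.
XtX_inv_Xt = np.linalg.solve(X.T @ X, X.T)

estimates = np.empty((reps, 2))
for r in range(reps):
    e = rng.normal(0, 1, size=n)               # i.i.d. mean-zero errors
    y = X @ beta + e
    estimates[r] = XtX_inv_Xt @ y              # beta_hat = (X^T X)^{-1} X^T y

# The replication average of beta_hat should be close to beta.
print(estimates.mean(axis=0))
```

Each replication draws fresh errors $e$ while $X$ and $\beta$ stay fixed, matching the representation $\hat\beta = \beta + (X^T X)^{-1} X^T e$: averaging over replications averages the noise term toward $(X^T X)^{-1} X^T \mathrm{E}(e) = 0$.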