lec2 - Least-Squares Estimation

Least-Squares Estimation: Recall that the projection of $y$ onto $C(X)$, the set of all vectors of the form $Xb$ for $b \in \mathbb{R}^{k+1}$, yields the closest point in $C(X)$ to $y$. That is, $p(y \mid C(X))$ yields the minimizer of

$$Q(\beta) = \|y - X\beta\|^2 \quad \text{(the least squares criterion).}$$

This leads to the estimator $\hat{\beta}$ given by the solution of

$$X^T X \beta = X^T y \quad \text{(the normal equations)},$$

or $\hat{\beta} = (X^T X)^{-1} X^T y$. All of this has already been established back when we studied projections (see pp. 30-31).

Alternatively, we could use calculus. To find a stationary point (maximum, minimum, or saddle point) of $Q(\beta)$, we set the partial derivative of $Q(\beta)$ equal to zero and solve:

$$\frac{\partial}{\partial \beta} Q(\beta) = \frac{\partial}{\partial \beta} (y - X\beta)^T (y - X\beta) = \frac{\partial}{\partial \beta} \left( y^T y - 2 y^T X \beta + \beta^T (X^T X) \beta \right) = -2 X^T y + 2 X^T X \beta.$$

Here we've used the vector differentiation formulas $\frac{\partial}{\partial z} c^T z = c$ and $\frac{\partial}{\partial z} z^T A z = 2 A z$ (see §2.14 of our text). Setting this result equal to zero, we obtain the normal equations, which have solution $\hat{\beta} = (X^T X)^{-1} X^T y$. That this is a minimum, rather than a maximum or saddle point, can be verified by checking the second derivative matrix of $Q(\beta)$:

$$\frac{\partial^2}{\partial \beta \, \partial \beta^T} Q(\beta) = 2 X^T X,$$

which is positive definite (Result 7, p. 54); therefore $\hat{\beta}$ is a minimum.

Example: Simple Linear Regression

Consider the case $k = 1$:

$$y_i = \beta_0 + \beta_1 x_i + e_i, \qquad i = 1, \ldots, n,$$

where $e_1, \ldots, e_n$ are i.i.d., each with mean 0 and variance $\sigma^2$. Then the model equation becomes

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \underbrace{\begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}}_{= X} \underbrace{\begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}}_{= \beta} + \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}.$$

It follows that

$$X^T X = \begin{pmatrix} n & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{pmatrix}, \qquad X^T y = \begin{pmatrix} \sum_i y_i \\ \sum_i x_i y_i \end{pmatrix},$$

$$(X^T X)^{-1} = \frac{1}{n \sum_i x_i^2 - \left( \sum_i x_i \right)^2} \begin{pmatrix} \sum_i x_i^2 & -\sum_i x_i \\ -\sum_i x_i & n \end{pmatrix}.$$

Therefore, $\hat{\beta} = (X^T X)^{-1} X^T y$ yields

$$\hat{\beta} = \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{pmatrix} = \frac{1}{n \sum_i x_i^2 - \left( \sum_i x_i \right)^2} \begin{pmatrix} \left( \sum_i x_i^2 \right)\left( \sum_i y_i \right) - \left( \sum_i x_i \right)\left( \sum_i x_i y_i \right) \\ -\left( \sum_i x_i \right)\left( \sum_i y_i \right) + n \sum_i x_i y_i \end{pmatrix}.$$

After a bit of algebra, these estimators simplify to

$$\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}} \qquad \text{and} \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.$$

In the case that $X$ is of full rank, $\hat{\beta}$ and $\hat{\mu}$ are given by

$$\hat{\beta} = (X^T X)^{-1} X^T y, \qquad \hat{\mu} = X \hat{\beta} = X (X^T X)^{-1} X^T y = P_{C(X)} y.$$

Notice that both $\hat{\beta}$ and $\hat{\mu}$ are linear functions of $y$. That is, in each case the estimator is given by some matrix times $y$. Note also that

$$\hat{\beta} = (X^T X)^{-1} X^T y = (X^T X)^{-1} X^T (X\beta + e) = \beta + (X^T X)^{-1} X^T e.$$

From this representation several important properties of the least squares estimator follow easily:

1. (unbiasedness): $E(\hat{\beta}) = E\left( \beta + (X^T X)^{-1} X^T e \right) = \beta + (X^T X)^{-1} X^T \underbrace{E(e)}_{= 0} = \beta.$
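As a quick numerical check (not part of the original notes), here is a short Python sketch that simulates a simple linear regression data set and verifies that the characterizations of $\hat{\beta}$ above agree: solving the normal equations, the closed-form $S_{xy}/S_{xx}$ and $\bar{y} - \hat{\beta}_1 \bar{x}$ formulas, and the projection $P_{C(X)} y = X\hat{\beta}$. The sample size, true coefficients, and error variance are arbitrary choices made only for illustration.

```python
import numpy as np

# Simulate a simple linear regression data set (k = 1).
rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0.0, 10.0, size=n)
e = rng.normal(0.0, 2.0, size=n)      # i.i.d. errors: mean 0, variance 4
y = 1.0 + 0.5 * x + e                 # true beta_0 = 1, beta_1 = 0.5

# Design matrix with a column of ones (k + 1 = 2 columns).
X = np.column_stack([np.ones(n), x])

# (1) Solve the normal equations  X^T X beta = X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# (2) Closed-form simple linear regression estimates.
xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)
Sxy = np.sum((x - xbar) * (y - ybar))
beta1 = Sxy / Sxx
beta0 = ybar - beta1 * xbar

# (3) Projection matrix onto C(X): P = X (X^T X)^{-1} X^T.
P = X @ np.linalg.inv(X.T @ X) @ X.T

print(beta_hat)                              # roughly [1.0, 0.5]
print(np.allclose(beta_hat, [beta0, beta1])) # True: same estimator
print(np.allclose(X @ beta_hat, P @ y))      # True: X beta_hat = P_{C(X)} y
```

In practice one would call `np.linalg.lstsq(X, y)` rather than forming $(X^T X)^{-1}$ explicitly, but the direct computation here mirrors the derivation in the notes.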