The assumptions on the errors in this model can also be written in vector form. We write $\boldsymbol{\epsilon} \sim N(\mathbf{0}, \sigma^2 I)$, a multivariate normal distribution with mean vector $E(\boldsymbol{\epsilon}) = \mathbf{0}$ and covariance matrix $V(\boldsymbol{\epsilon}) = \sigma^2 I$. Similarly, we write $\mathbf{y} \sim N(X\boldsymbol{\beta}, \sigma^2 I)$, a multivariate normal distribution with mean vector $E(\mathbf{y}) = X\boldsymbol{\beta}$ and covariance matrix $V(\mathbf{y}) = \sigma^2 I$.

## 4.2 Estimation of the Model

We now consider the estimation of the unknown parameters: the $(p+1)$ regression parameters $\boldsymbol{\beta}$, and the variance of the errors $\sigma^2$. Since the $y_i \sim N(\mu_i, \sigma^2)$ with $\mu_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}$ are independent, it is straightforward to write down the joint probability density $p(y_1, \ldots, y_n \mid \boldsymbol{\beta}, \sigma^2)$. Treating this, for given data $\mathbf{y}$, as a function of the parameters leads to the likelihood function

$$L(\boldsymbol{\beta}, \sigma^2 \mid y_1, \ldots, y_n) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} \exp\left\{-\sum_{i=1}^{n} (y_i - \mu_i)^2 / 2\sigma^2\right\} \tag{4.8}$$

Maximizing the likelihood function $L$ with respect to $\boldsymbol{\beta}$ is equivalent to minimizing $S(\boldsymbol{\beta}) = \sum_{i=1}^{n} (y_i - \mu_i)^2$ with respect to $\boldsymbol{\beta}$. This is because the exponent in Eq. (4.8) is the only term containing $\boldsymbol{\beta}$. The sum of squares $S(\boldsymbol{\beta})$ can be written in vector notation,

$$S(\boldsymbol{\beta}) = (\mathbf{y} - \boldsymbol{\mu})'(\mathbf{y} - \boldsymbol{\mu}) = (\mathbf{y} - X\boldsymbol{\beta})'(\mathbf{y} - X\boldsymbol{\beta}), \quad \text{since } \boldsymbol{\mu} = X\boldsymbol{\beta} \tag{4.9}$$
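As a numerical sketch of this equivalence (not from the text; the synthetic data and NumPy usage are illustrative assumptions), the following compares $S(\boldsymbol{\beta})$ and the log of the likelihood in Eq. (4.8) at two candidate parameter vectors: for fixed $\sigma^2$, the vector with the smaller sum of squares has the larger likelihood.

```python
import numpy as np

# Hypothetical data: a design matrix with an intercept column and p = 2 regressors.
rng = np.random.default_rng(0)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(size=n)

def S(beta):
    """Sum of squares S(beta) = (y - X beta)'(y - X beta), as in Eq. (4.9)."""
    r = y - X @ beta
    return r @ r

def log_likelihood(beta, sigma2):
    """Log of the likelihood in Eq. (4.8), for fixed sigma^2."""
    return -n / 2 * np.log(2 * np.pi * sigma2) - S(beta) / (2 * sigma2)

# For fixed sigma^2, smaller S(beta) means larger (log-)likelihood,
# since beta enters only through the exponent.
b1, b2 = beta_true, beta_true + 0.3
assert S(b1) < S(b2)
assert log_likelihood(b1, 1.0) > log_likelihood(b2, 1.0)
```

Because $\boldsymbol{\beta}$ enters the likelihood only through $-S(\boldsymbol{\beta})/2\sigma^2$, ordering by likelihood reverses the ordering by sum of squares, which is what the assertions check.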
*Abraham, C04, November 8, 2004, 1:29 — 4.2 Estimation of the Model, p. 91*

The minimization of $S(\boldsymbol{\beta})$ with respect to $\boldsymbol{\beta}$ is known as **least squares estimation**, and for normal errors it is equivalent to maximum likelihood estimation. We determine the least squares estimates by obtaining the first derivatives of $S(\boldsymbol{\beta})$ with respect to the parameters $\beta_0, \beta_1, \ldots, \beta_p$, and by setting these $(p+1)$ derivatives equal to zero. The appendix shows that this leads to the $(p+1)$ equations

$$X'X\hat{\boldsymbol{\beta}} = X'\mathbf{y} \tag{4.10}$$

These equations are referred to as the **normal equations**. The matrix $X$ is assumed to have full column rank $p+1$. Hence, the $(p+1) \times (p+1)$ matrix $X'X$ is nonsingular and the solution of Eq. (4.10) is given by

$$\hat{\boldsymbol{\beta}} = (X'X)^{-1}X'\mathbf{y} \tag{4.11}$$

The estimate $\hat{\boldsymbol{\beta}}$ in Eq. (4.11) minimizes $S(\boldsymbol{\beta})$, and is known as the **least squares estimate (LSE)** of $\boldsymbol{\beta}$.

### 4.2.1 A Geometric Interpretation of Least Squares

The model in Eq. (4.7) can be written as

$$\mathbf{y} = \beta_0 \mathbf{1} + \beta_1 \mathbf{x}_1 + \cdots + \beta_p \mathbf{x}_p + \boldsymbol{\epsilon} = \boldsymbol{\mu} + \boldsymbol{\epsilon} \tag{4.12}$$

where the $(n \times 1)$ vectors $\mathbf{y}$ and $\boldsymbol{\epsilon}$ are as defined before, and the $(n \times 1)$ vectors $\mathbf{1} = (1, 1, \ldots, 1)'$ and $\mathbf{x}_j = (x_{1j}, x_{2j}, \ldots, x_{nj})'$, for $j = 1, 2, \ldots, p$, represent the columns of the matrix $X$. Thus, $X = (\mathbf{1}, \mathbf{x}_1, \ldots, \mathbf{x}_p)$ and $\boldsymbol{\mu} = X\boldsymbol{\beta} = \beta_0 \mathbf{1} + \beta_1 \mathbf{x}_1 + \cdots + \beta_p \mathbf{x}_p$.

The representation in Eq. (4.12) shows that the deterministic component $\boldsymbol{\mu}$ is a linear combination of the vectors $\mathbf{1}, \mathbf{x}_1, \ldots, \mathbf{x}_p$. Let $L(\mathbf{1}, \mathbf{x}_1, \ldots, \mathbf{x}_p)$ be the set of all linear combinations of these vectors. If we assume that these vectors are not linearly dependent, $L(X) = L(\mathbf{1}, \mathbf{x}_1, \ldots, \mathbf{x}_p)$ is a subspace of $R^n$ of dimension $p+1$. Note that the assumption that $\mathbf{1}, \mathbf{x}_1, \ldots, \mathbf{x}_p$ are not linearly dependent is the same as saying that $X$ has rank $p+1$.

We want to explain these concepts slowly because they are essential for understanding the geometric interpretation that follows. First, note that the dimension of the regressor vectors $\mathbf{1}, \mathbf{x}_1, \ldots, \mathbf{x}_p$ is $n$, the number of cases.
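A minimal sketch of Eqs. (4.10) and (4.11) in code (the data here are synthetic, made up for illustration): solve the normal equations $X'X\hat{\boldsymbol{\beta}} = X'\mathbf{y}$ directly, and cross-check against NumPy's built-in least squares routine.

```python
import numpy as np

# Synthetic example: n = 100 cases, p = 3 regressors plus an intercept,
# so X has full column rank p + 1 = 4.
rng = np.random.default_rng(1)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta = np.array([0.5, 1.0, -2.0, 3.0])
y = X @ beta + rng.normal(size=n)

# Solve the normal equations X'X beta_hat = X'y, Eq. (4.10).
# Solving the linear system is numerically preferable to forming (X'X)^{-1}.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's least squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)
```

Note the design choice: although Eq. (4.11) writes the solution with the explicit inverse $(X'X)^{-1}$, in computation one solves the linear system instead, which is faster and more stable when $X'X$ is nearly singular.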
When we display the $(p+1)$ regressor vectors, we do so in $n$-dimensional Euclidean space $R^n$.
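The geometric picture sketched above can be checked numerically (again with made-up data; this is an illustration, not the text's own example): the fitted vector $\hat{\boldsymbol{\mu}} = X\hat{\boldsymbol{\beta}}$ lies in the subspace $L(X)$, and the residual $\mathbf{y} - \hat{\boldsymbol{\mu}}$ is orthogonal to every column of $X$, and hence to all of $L(X)$.

```python
import numpy as np

# Synthetic data: vectors 1, x_1, x_2 live in R^n with n = 30 cases.
rng = np.random.default_rng(2)
n, p = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, -1.0, 2.0]) + rng.normal(size=n)

# Least squares fit via the normal equations.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
mu_hat = X @ beta_hat          # the point of L(X) closest to y
residual = y - mu_hat

# Orthogonality: the residual is perpendicular to 1, x_1, ..., x_p,
# which is exactly the content of the normal equations X'(y - X beta_hat) = 0.
assert np.allclose(X.T @ residual, 0.0)
assert abs(residual @ mu_hat) < 1e-8
```

The second assertion restates the first: since $\hat{\boldsymbol{\mu}}$ is a linear combination of the columns of $X$, orthogonality to each column implies orthogonality to $\hat{\boldsymbol{\mu}}$ itself.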