30 CHAPTER 3 Least Squares

where u = y − Xd − zc is the vector of residuals when y is regressed on X and z. Note that unless X'z = 0, d will not equal b = (X'X)⁻¹X'y. (See Section 8.2.1.) Moreover, unless c = 0, u will not equal e = y − Xb. Now we have shown in Corollary 3.3.1 that c

y = Xβ + ε = X₁β₁ + X₂β₂ + ε. What is the algebraic solution for b₂? The normal equations are

    (1)  [ X₁'X₁   X₁'X₂ ] [ b₁ ]     [ X₁'y ]
    (2)  [ X₂'X₁   X₂'X₂ ] [ b₂ ]  =  [ X₂'y ] .        (3-17)

A solution can be obtained by using the partitioned inverse matrix of (A-74). Alternatively, (1) and (2) in (3-17)
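The partitioned solution can be sketched numerically. The following NumPy example (synthetic data; all names are illustrative) solves the full normal equations and then recovers b₂ by sweeping X₁ out of X₂ and y before regressing:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])  # constant + one regressor
X2 = rng.normal(size=(n, 2))                            # two more regressors
X = np.hstack([X1, X2])
y = X @ np.array([1.0, 0.5, -0.3, 2.0]) + rng.normal(size=n)

# Full solution of the normal equations X'Xb = X'y
b = np.linalg.solve(X.T @ X, X.T @ y)

# Partitioned solution: residuals of X2 and y after regression on X1
M1 = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)  # residual maker for X1
X2s, ys = M1 @ X2, M1 @ y
b2 = np.linalg.solve(X2s.T @ X2s, X2s.T @ ys)

assert np.allclose(b2, b[2:])   # same coefficients on X2 as in the full regression
```

The two routes agree exactly, which is the content of the partitioned-regression result developed below.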

Then

q = v'v = Σᵢ₌₁ⁿ vᵢ²,  where v = Xc.

Unless every element of v is zero, q is positive. But if v could be zero, then v would be a linear combination of the columns of X that equals 0, which contradicts the assumption that X has full rank. Since c is arbitrary, q is positive for every non
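An illustrative NumPy check of the argument (synthetic full-rank X): q = v'v is positive for every nonzero c, equivalently X'X is positive definite.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))   # full column rank with probability one
A = X.T @ X

# q = c'X'Xc = ||Xc||^2 is positive for any nonzero c
for _ in range(100):
    c = rng.normal(size=3)
    q = c @ A @ c
    assert q > 0

# Equivalently, every eigenvalue of X'X is positive
assert np.all(np.linalg.eigvalsh(A) > 0)
```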

a set of two equations:

b₂ Σᵢ (Tᵢ − T̄)² + b₃ Σᵢ (Tᵢ − T̄)(Gᵢ − Ḡ) = Σᵢ (Tᵢ − T̄)(Yᵢ − Ȳ),
b₂ Σᵢ (Tᵢ − T̄)(Gᵢ − Ḡ) + b₃ Σᵢ (Gᵢ − Ḡ)² = Σᵢ (Gᵢ − Ḡ)(Yᵢ − Ȳ).        (3-8)

This result shows the nature of the solution for the slopes, which can be computed from the sums of squares and cross products of the data.
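A numerical sketch of (3-8) with synthetic data for T, G, and Y (NumPy; names are illustrative), checking that the two-equation solution from sums of squares and cross products matches the slopes of the full regression:

```python
import numpy as np

rng = np.random.default_rng(2)
T = rng.normal(size=30)
G = rng.normal(size=30)
Y = 1.0 + 0.8 * T - 0.5 * G + rng.normal(size=30)

t, g, yd = T - T.mean(), G - G.mean(), Y - Y.mean()   # deviations from means

# The 2x2 system (3-8) in sums of squares and cross products
A = np.array([[t @ t, t @ g],
              [t @ g, g @ g]])
r = np.array([t @ yd, g @ yd])
b2, b3 = np.linalg.solve(A, r)

# Agrees with the slopes from regressing Y on a constant, T, and G
X = np.column_stack([np.ones(30), T, G])
b = np.linalg.solve(X.T @ X, X.T @ Y)
assert np.allclose([b2, b3], b[1:])
```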

This result comes at a cost, however. The parameter estimates become progressively less precise as we do so. We will pursue this result in Chapter 4.

4This measure is sometimes advocated on the basis of the unbiasedness of the two quantities in the fraction

LEAST SQUARES REGRESSION

The unknown parameters of the stochastic relation yᵢ = xᵢ'β + εᵢ are the objects of estimation. It is necessary to distinguish between population quantities, such as β and εᵢ, and sample estimates of them, denoted b and eᵢ. The population regression is E[yᵢ | xᵢ] = xᵢ'β

THEOREM 3.7  Change in R̄² When a Variable Is Added to a Regression
In a multiple regression, the adjusted R² (R̄²) will fall (rise) when the variable x is deleted from the regression if the square of the t ratio associated with this variable is greater (less) than 1.
We have shown that R2 will never
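Theorem 3.7 concerns the fit measure adjusted for degrees of freedom; the unadjusted R² can never rise when a variable is deleted. A numerical check on synthetic data (NumPy; adj_r2 and t_ratio are illustrative helper names, not from the text):

```python
import numpy as np

def adj_r2(X, y):
    """Adjusted R-squared of the regression of y on X."""
    n, K = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    yd = y - y.mean()
    return 1 - (e @ e / (n - K)) / (yd @ yd / (n - 1))

def t_ratio(X, y, k):
    """t ratio on column k of X."""
    n, K = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    s2 = e @ e / (n - K)
    return b[k] / np.sqrt(s2 * np.linalg.inv(X.T @ X)[k, k])

rng = np.random.default_rng(3)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.7, 0.1]) + rng.normal(size=n)

t2 = t_ratio(X, y, 2)                        # t ratio on the last variable
fell = adj_r2(X[:, :2], y) < adj_r2(X, y)    # did adjusted R^2 fall on deletion?
assert fell == (t2**2 > 1)                   # the theorem's condition
```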

FIGURE 3.1  Population and Sample Regression.
[Figure: the population regression line E(y|x) = α + βx and a fitted sample line y = a + bx, with a residual e marked at one observation; x on the horizontal axis, y on the vertical.]
Although numerous candidates have been suggested, the one used most frequently is least squares.1

3.2.1 THE LEAST SQUARES COEFFICIENT VECTOR

The least squares

noting is the signs of the coefficients. The signs of the partial correlation coefficients are the same as the signs of the respective regression coefficients, three of which are negative. All the simple correlation coefficients are positive because of the latent

easily show that M is both symmetric (M = M') and idempotent (M = M²). In view of (3-13), we can interpret M as a matrix that produces the vector of least squares residuals in the regression of y on X when it premultiplies any vector y. (It will be convenient later on to refer
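A small NumPy sketch (synthetic data) of these properties of the residual maker M = I − X(X'X)⁻¹X':

```python
import numpy as np

rng = np.random.default_rng(4)
X = np.column_stack([np.ones(25), rng.normal(size=(25, 2))])
y = rng.normal(size=25)

M = np.eye(25) - X @ np.linalg.solve(X.T @ X, X.T)   # residual maker

assert np.allclose(M, M.T)      # symmetric
assert np.allclose(M, M @ M)    # idempotent

# My is the vector of least squares residuals from regressing y on X
b = np.linalg.solve(X.T @ X, X.T @ y)
assert np.allclose(M @ y, y - X @ b)
assert np.allclose(M @ X, 0)    # M annihilates the columns of X
```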

in ln y. The latter R² will typically be larger, but this does not imply that the loglinear model is a better fit in some absolute sense. It is worth emphasizing that R² is a measure of linear association between x and y. For example, the third panel of Figure

COROLLARY 3.3.2  Regression with a Constant Term
The slopes in a multiple regression that contains a constant term are obtained by transforming the data to deviations from their means and then regressing the variable y in deviation form on the explanatory variables

controlling for the effect of age, is obtained as follows:
1. y* = the residuals in a regression of income on a constant and age.
2. z* = the residuals in a regression of education on a constant and age.
3. The partial correlation r
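The three steps can be sketched as follows (NumPy, with synthetic income, education, and age data; all names are illustrative). The partial correlation is the simple correlation between the two residual series:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
age = rng.uniform(20, 60, size=n)
education = 12 + 0.05 * age + rng.normal(size=n)
income = 10 + 0.5 * education + 0.2 * age + rng.normal(size=n)

def resid(v, Z):
    """Residuals from regressing v on Z."""
    return v - Z @ np.linalg.solve(Z.T @ Z, Z.T @ v)

Z = np.column_stack([np.ones(n), age])
y_star = resid(income, Z)      # step 1
z_star = resid(education, Z)   # step 2
# step 3: simple correlation between the residuals
r = (z_star @ y_star) / np.sqrt((z_star @ z_star) * (y_star @ y_star))
assert np.allclose(r, np.corrcoef(z_star, y_star)[0, 1])
```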

the vector y into the column space of X. (See Sections A3.5 and A3.7.) By multiplying it out, you will find that, like M, P is symmetric and idempotent. Given the earlier results, it also follows that M and P are orthogonal; PM = MP = 0. Finally, as

or

S(b₀) = y'y − 2y'Xb₀ + b₀'X'Xb₀.

The necessary condition for a minimum is

∂S(b₀)/∂b₀ = −2X'y + 2X'Xb₀ = 0.        (3-4)

1We shall have to establish that the practical approach of fitting the line as closely as possible to the data by least squares leads to estimates with good statistical
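Setting the gradient in (3-4) to zero yields the normal equations X'Xb = X'y. A brief NumPy sketch (synthetic data) comparing the direct solution with a dedicated least squares routine, and confirming that perturbing b raises the sum of squares:

```python
import numpy as np

rng = np.random.default_rng(6)
X = np.column_stack([np.ones(30), rng.normal(size=(30, 2))])
y = rng.normal(size=30)

# Solve the normal equations X'Xb = X'y directly
b = np.linalg.solve(X.T @ X, X.T @ y)

# The same minimizer, computed by a numerically stabler routine
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(b, b_lstsq)

# b minimizes S(b0): any perturbation cannot lower the sum of squares
S = lambda b0: (y - X @ b0) @ (y - X @ b0)
assert S(b) <= S(b + 0.01 * rng.normal(size=3))
```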

u'u = e'e − (z*'y*)²/(z*'z*) = e'e (1 − r*²_yz),        (3-28)

where r*_yz is the partial correlation between y and z, controlling for X. Now divide through both sides of the equality by y'M⁰y. From (3-26), u'u/y'M⁰y is (1 − R²_Xz) for the regression on X and z, and e'e/y'M⁰y is (1 − R²_X). Rearrangin
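Rearranging gives the identity 1 − R²_Xz = (1 − R²_X)(1 − r*²_yz), which holds exactly when X contains a constant term. A NumPy check on synthetic data (illustrative names):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
z = rng.normal(size=n)
y = X @ np.array([1.0, 0.5, -0.2]) + 0.4 * z + rng.normal(size=n)

def resid(v, W):
    return v - W @ np.linalg.solve(W.T @ W, W.T @ v)

def r2(W, y):
    e = resid(y, W)
    yd = y - y.mean()
    return 1 - e @ e / (yd @ yd)

R2_X = r2(X, y)
R2_Xz = r2(np.column_stack([X, z]), y)

# Partial correlation of y and z, controlling for X
ys, zs = resid(y, X), resid(z, X)
r_yz = (zs @ ys) / np.sqrt((zs @ zs) * (ys @ ys))

assert np.allclose(1 - R2_Xz, (1 - R2_X) * (1 - r_yz**2))
```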

or

SST = SSR + SSE.        (3-25)

(Note that this is precisely the partitioning that appears at the end of Section 3.2.4.) We can now obtain a measure of how well the regression line fits the data by using the coefficient of determination:

SSR/SST

0s and add a variable to the model that takes the value 1 for that one observation and 0 for all other observations. Show that this strategy is equivalent to discarding the observation as regards the computation of b but it does have an effect on R². Consider

b'X'M⁰Xb = ŷ'M⁰ŷ,

but ŷ = Xb, y = ŷ + e, M⁰e = e, and X'e = 0, so ŷ'M⁰y = ŷ'M⁰ŷ. Multiply R² = ŷ'M⁰ŷ/y'M⁰y = ŷ'M⁰y/y'M⁰y by 1 = ŷ'M⁰y/ŷ'M⁰ŷ to obtain

R² = [Σᵢ (yᵢ − ȳ)(ŷᵢ − ȳ)]² / [Σᵢ (yᵢ − ȳ)² Σᵢ (ŷᵢ − ȳ)²],        (3-27)

which is the squared correlation between the observed value
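A quick NumPy verification of (3-27) on synthetic data: with a constant term in the regression, R² equals the squared correlation between y and the fitted values.

```python
import numpy as np

rng = np.random.default_rng(8)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=50)

b = np.linalg.solve(X.T @ X, X.T @ y)
yhat = X @ b
e = y - yhat

yd = y - y.mean()
R2 = 1 - e @ e / (yd @ yd)

# (3-27): R^2 is the squared correlation between y and yhat
r = np.corrcoef(y, yhat)[0, 1]
assert np.allclose(R2, r**2)
```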

The solution is b = (X'X)⁻¹X'y = (0.50907, 0.01658, 0.67038, 0.002326, 0.00009401).

3.2.3 ALGEBRAIC ASPECTS OF THE LEAST SQUARES SOLUTION

The normal equations are

X'Xb − X'y = −X'(y − Xb) = −X'e = 0.        (3-12)

Hence, for every column x_k of X, x_k'e = 0. If the first column of X is a column

bypass these difficulties by reporting a third R², the squared sample correlation between the actual values of y and the fitted values from the regression. This approach could be deceptive. If the regression contains a constant term

THEOREM 3.1  Orthogonal Regression
If the variables in a multiple regression are not correlated (i.e., are orthogonal), then the multiple regression slopes are the same as the slopes in the individual simple regressions.

In practice, you will never actually comp

Now divide both the numerator and denominator in the expression for b₃ by Σᵢ tᵢ² Σᵢ gᵢ². By manipulating it a bit and using the definition of the sample correlation between G and T, r²_gt = (Σᵢ gᵢtᵢ)²/(Σᵢ gᵢ² Σᵢ tᵢ²), and defining b_yt

3.6 SUMMARY AND CONCLUSIONS

This chapter has described the purely algebraic exercise of fitting a line (hyperplane) to a set of points using the method of least squares. We considered the primary problem first, using a data set of n observations on K variables. We then examined several

TABLE 3.4  Analysis of Variance for the Investment Equation

Source        Sum of Squares    Degrees of Freedom    Mean Square
Regression    0.0159025          4                    0.003976
Residual      0.0004508         10                    0.00004508
Total         0.016353          14                    0.0011681

R² = 0.0159025/0.016353 = 0.97245.
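The arithmetic behind the table can be verified directly (values copied from the table; the tolerances allow for the printed rounding):

```python
# Sums of squares and degrees of freedom as printed in Table 3.4
SSR, df_reg = 0.0159025, 4
SSE, df_res = 0.0004508, 10
SST, df_tot = 0.016353, 14

assert abs(SSR + SSE - SST) < 1e-6           # SST = SSR + SSE, up to rounding
assert abs(SSR / df_reg - 0.003976) < 5e-7   # mean square, regression
assert abs(SSE / df_res - 0.00004508) < 1e-9 # mean square, residual
assert abs(SST / df_tot - 0.0011681) < 5e-8  # mean square, total
assert abs(SSR / SST - 0.97245) < 5e-6       # R^2
```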
3.5.1 THE ADJUSTED R-SQUARED AND

R²_1,2 = R²_1 + (1 − R²_1) r²_y2·1.

[This is the multivariate counterpart to (3-29).] Therefore, it is possible to push R² as high as desired just by adding regressors. This possibility motivates the use of the adjusted R-squared in (3-30), instead of R² as a
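A small NumPy experiment illustrating the point: regressing pure noise y on progressively more unrelated regressors, R² never falls as variables are added.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 30
y = rng.normal(size=n)   # pure noise; no regressor belongs in the model

def r2(W, y):
    e = y - W @ np.linalg.lstsq(W, y, rcond=None)[0]
    yd = y - y.mean()
    return 1 - e @ e / (yd @ yd)

X = np.ones((n, 1))
r2_seq = []
for _ in range(10):
    X = np.hstack([X, rng.normal(size=(n, 1))])   # add an unrelated regressor
    r2_seq.append(r2(X, y))

# R^2 is nondecreasing in the number of regressors, even for pure noise
assert all(b >= a - 1e-12 for a, b in zip(r2_seq, r2_seq[1:]))
```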

In terms of Example 2.2, we could obtain the coefficient on education in the multiple regression by first regressing earnings and education on age (or age and age squared) and then using the residuals from these regressions in a simple regression. In a classic application

FIGURE 3.4  Decomposition of yᵢ.
[Figure: the deviation yᵢ − ȳ at x = xᵢ is decomposed into the fitted part b(xᵢ − x̄) and the residual, measured from the point (x̄, ȳ).]
In terms of the regression equation, we may write the full set of observations as

y = Xb + e = ŷ + e.        (3-24)

For an individual observation, we have yᵢ = ŷᵢ + eᵢ = xᵢ'b + eᵢ. If the regression contains

There are other candidates for estimating the parameters. In a two-variable case, for example, we might use the intercept, a, and slope, b, of the line between the points with the largest and smallest values of x. Alternatively, we might find the a and b that minimize the sum of absolute values of the residuals

3.5.3 COMPARING MODELS

The value of R² we obtained for the consumption function in Example 3.2 seems high in an absolute sense. Is it? Unfortunately, there is no absolute basis for comparison. In fact, in using aggregate time-series data, coefficients of determination

only transform X?
5. Residual makers. What is the result of the matrix product M₁M where M₁ is defined in (3-19) and M is defined in (3-14)?
6. Adding an observation. A data set consists of n observations on Xₙ and yₙ. The least squares estimator based on these n observati

Thus, M₁X₂ is a matrix of residuals; each column of M₁X₂ is a vector of residuals in the regression of the corresponding column of X₂ on the variables in X₁. By exploiting the fact that M₁, like M, is idempotent, we can rewrite (3-19) as

b₂ = (X₂*'X₂*)⁻¹X₂*'y*,        (3-20)

where

[I − X(X'X)⁻¹X']z is the vector of residuals when z is regressed on X. Returning to our derivation, we note that e'e = y*'y* and c²(z*'z*) = (z*'y*)²/(z*'z*). Therefore,

u'u/(y*'y*) = 1 − r*²_yz,

and we have our result.

Example 3.1  Partial Correlations
For the data in the applica

in computing R2 for the loglinear model. Only in the case of least squares applied to a linear equation
with a constant term can R2 be interpreted as the proportion of variation in y explained by variation in x.
An analogous com