Populism in America
In recent years, populism has been a growing global phenomenon, and concerns have been raised that democracy might be threatened even in the most celebrated democracy of all: the United States of America. Since the 2008 financial crisis, populism in

in ln y. The latter $R^2$ will typically be larger, but this does not imply that the loglinear model is a better fit in some absolute sense. It is worth emphasizing that $R^2$ is a measure of linear association between x and y. For example, the third panel of Figure

COROLLARY 3.3.2 Regression with a Constant Term
The slopes in a multiple regression that contains a constant term are obtained by transforming the data to deviations from their means and then regressing the variable y in deviation form on the explanatory variables, also in deviation form.

CHAPTER 3 Least Squares
controlling for the effect of age, is obtained as follows:
1. $y^*$ = the residuals in a regression of income on a constant and age.
2. $z^*$ = the residuals in a regression of education on a constant and age.
3. The partial correlation $r_{yz}^*$ is the simple correlation between $y^*$ and $z^*$.
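The three-step recipe above can be sketched numerically. A minimal NumPy illustration with simulated income, education, and age data (all variable names and coefficient values here are hypothetical, not from the text):

```python
import numpy as np

# Hypothetical data standing in for income, education, and age.
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(20, 60, n)
education = 12 + 0.05 * age + rng.normal(0, 2, n)
income = 10 + 0.8 * education + 0.3 * age + rng.normal(0, 3, n)

def residuals(y, X):
    """Residuals from an OLS regression of y on X (X includes the constant)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ b

Z = np.column_stack([np.ones(n), age])    # constant and age
y_star = residuals(income, Z)             # step 1
z_star = residuals(education, Z)          # step 2
r_yz = np.corrcoef(y_star, z_star)[0, 1]  # step 3: the partial correlation
```

The same quantity can be obtained from the three simple correlations, which makes the residual-based recipe easy to cross-check.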

the vector y into the column space of X. (See Sections A3.5 and A3.7.) By multiplying it out, you will find that, like M, P is symmetric and idempotent. Given the earlier results, it also follows that M and P are orthogonal; PM = MP = 0. Finally, as
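A quick numerical check of these claims about P and M. This is only a sketch with a simulated X; the matrix values are illustrative:

```python
import numpy as np

# P projects y onto the column space of X; M = I - P produces the
# least squares residuals. Both should be symmetric and idempotent,
# and they should be mutually orthogonal: PM = MP = 0.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
P = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(50) - P
```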

or
$$S(b_0) = y'y - 2y'Xb_0 + b_0'X'Xb_0.$$
The necessary condition for a minimum is
$$\frac{\partial S(b_0)}{\partial b_0} = -2X'y + 2X'Xb_0 = 0. \tag{3-4}$$
1 We shall have to establish that the practical approach of fitting the line as closely as possible to the data by least squares leads to estimates with good statistical properties.

$$u'u = e'e - \frac{(z^{*\prime}y)^2}{z^{*\prime}z^*} = e'e\,(1 - r_{yz}^{*2}), \tag{3-28}$$
where $r_{yz}^*$ is the partial correlation between y and z, controlling for X. Now divide both sides of the equality by $y'M^0y$. From (3-26), $u'u/y'M^0y$ is $(1 - R_{Xz}^2)$ for the regression on X and z, and $e'e/y'M^0y$ is $(1 - R_X^2)$. Rearranging
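The implied relation, $(1 - R_{Xz}^2) = (1 - R_X^2)(1 - r_{yz}^{*2})$, can be verified numerically. A hedged sketch with simulated X, z, and y (values illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 120
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
z = rng.normal(size=n)
y = X @ np.array([1.0, 0.5, -0.3]) + 0.4 * z + rng.normal(size=n)

def r2(y, W):
    """R-squared of an OLS regression of y on W (W includes a constant)."""
    b, *_ = np.linalg.lstsq(W, y, rcond=None)
    e = y - W @ b
    return 1 - (e @ e) / np.sum((y - y.mean()) ** 2)

def resid(v, W):
    b, *_ = np.linalg.lstsq(W, v, rcond=None)
    return v - W @ b

R2_X = r2(y, X)                              # regression on X alone
R2_Xz = r2(y, np.column_stack([X, z]))       # regression on X and z
# Partial correlation of y and z, controlling for X.
r_star = np.corrcoef(resid(y, X), resid(z, X))[0, 1]
```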

or
$$\text{SST} = \text{SSR} + \text{SSE}. \tag{3-25}$$
(Note that this is precisely the partitioning that appears at the end of Section 3.2.4.) We can now obtain a measure of how well the regression line fits the data by using the coefficient of determination:
$$R^2 = \frac{\text{SSR}}{\text{SST}}$$
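The decomposition (3-25) and the resulting $R^2$ are easy to confirm on simulated data (the design and coefficients below are illustrative only); note that the identity relies on the regression containing a constant term:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([2.0, 1.0, -1.0]) + rng.normal(size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ b
e = y - yhat
SST = np.sum((y - y.mean()) ** 2)      # total sum of squares
SSR = np.sum((yhat - y.mean()) ** 2)   # regression (explained) sum of squares
SSE = e @ e                            # error (residual) sum of squares
R2 = SSR / SST
```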

0s and add a variable to the model that takes the value 1 for that one observation and 0 for all other observations. Show that this strategy is equivalent to discarding the observation as regards the computation of b, but that it does have an effect on $R^2$. Consider
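The equivalence asserted in this exercise can be checked numerically. A sketch with a hypothetical data set: adding a dummy that marks observation 0 reproduces the coefficients obtained by simply dropping that observation, and fits it exactly.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.7, -0.2]) + rng.normal(size=n)

d = np.zeros(n)
d[0] = 1.0  # dummy equal to 1 for observation 0 only

# Regression with the observation-specific dummy included.
b_dummy, *_ = np.linalg.lstsq(np.column_stack([X, d]), y, rcond=None)
# Regression with observation 0 discarded.
b_drop, *_ = np.linalg.lstsq(X[1:], y[1:], rcond=None)
```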

$$b'X'M^0Xb = \hat{y}'M^0\hat{y},$$
but $\hat{y} = Xb$, $y = \hat{y} + e$, $M^0e = e$, and $X'e = 0$, so $\hat{y}'M^0y = \hat{y}'M^0\hat{y}$. Multiply $R^2 = \hat{y}'M^0\hat{y}/y'M^0y = \hat{y}'M^0y/y'M^0y$ by $1 = \hat{y}'M^0y/\hat{y}'M^0\hat{y}$ to obtain
$$R^2 = \frac{\left[\sum_i (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})\right]^2}{\left[\sum_i (y_i - \bar{y})^2\right]\left[\sum_i (\hat{y}_i - \bar{\hat{y}})^2\right]}, \tag{3-27}$$
which is the squared correlation between the observed values
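A short numerical confirmation of (3-27), i.e., that $R^2$ equals the squared correlation between y and the fitted values (simulated data, constant term included):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 70
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.5, 1.2, 0.8]) + rng.normal(size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ b
R2 = 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)
corr2 = np.corrcoef(y, yhat)[0, 1] ** 2  # squared correlation of y and y-hat
```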

easily show that M is both symmetric (M = M') and idempotent (M = M²). In view of (3-13), we can interpret M as a matrix that produces the vector of least squares residuals in the regression of y on X when it premultiplies any vector y. (It will be convenient later on to refer

noting is the signs of the coefficients. The signs of the partial correlation coefficients are the same as the signs of the respective regression coefficients, three of which are negative. All the simple correlation coefficients are positive because of the latent

FIGURE 3.1 Population and Sample Regression. [Figure: the population regression E(y) and a fitted line a + bx plotted against x, with e denoting a residual.]
Although numerous candidates have been suggested, the one used most frequently is least squares.1
3.2.1 THE LEAST SQUARES COEFFICIENT VECTOR
The least squares

Mikhail Khodorkovsky, once the richest man in Russia with an estimated net worth of fifteen billion dollars, now lives in Switzerland after serving a 10-year prison sentence, his fortune mostly gone. In 1996 Khodorkovsky acquired the oil conglomerate Y

where
$$u = y - Xd - zc$$
is the vector of residuals when y is regressed on X and z. Note that unless $X'z = 0$, d will not equal $b = (X'X)^{-1}X'y$. (See Section 8.2.1.) Moreover, unless $c = 0$, u will not equal $e = y - Xb$. Now we have shown in Corollary 3.3.1 that c

$$y = X\beta + \varepsilon = X_1\beta_1 + X_2\beta_2 + \varepsilon.$$
What is the algebraic solution for $b_2$? The normal equations are (with the two block rows labeled (1) and (2))
$$\begin{pmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{pmatrix}\begin{pmatrix} b_1 \\ b_2 \end{pmatrix} = \begin{pmatrix} X_1'y \\ X_2'y \end{pmatrix}. \tag{3-17}$$
A solution can be obtained by using the partitioned inverse matrix of (A-74). Alternatively, (1) and (2) in (3-17)

Then
$$q = v'v = \sum_{i=1}^{n} v_i^2, \quad \text{where } v = Xc.$$
Unless every element of v is zero, q is positive. But if v could be zero, then v would be a linear combination of the columns of X that equals 0, which contradicts the assumption that X has full rank. Since c is arbitrary, q is positive for every nonzero c

a set of two equations:
$$b_2 \sum_i (T_i - \bar{T})^2 + b_3 \sum_i (T_i - \bar{T})(G_i - \bar{G}) = \sum_i (T_i - \bar{T})(Y_i - \bar{Y}),$$
$$b_2 \sum_i (T_i - \bar{T})(G_i - \bar{G}) + b_3 \sum_i (G_i - \bar{G})^2 = \sum_i (G_i - \bar{G})(Y_i - \bar{Y}). \tag{3-8}$$
This result shows the nature of the solution for the slopes, which can be computed from the sums of squares and cross products
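Equations (3-8) can be solved directly from the sums of squares and cross products in deviation form. A sketch with simulated T, G, and Y (coefficient values are illustrative), cross-checked against a regression that includes the constant:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 50
T = rng.normal(size=n)
G = rng.normal(size=n)
Y = 1.0 + 2.0 * T - 1.5 * G + rng.normal(size=n)

# Deviations from means.
t, g, yv = T - T.mean(), G - G.mean(), Y - Y.mean()

# The 2x2 system (3-8): sums of squares and cross products.
A = np.array([[t @ t, t @ g],
              [t @ g, g @ g]])
rhs = np.array([t @ yv, g @ yv])
b2, b3 = np.linalg.solve(A, rhs)
```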

This result comes at a cost, however: the parameter estimates become progressively less precise as we do so. We will pursue this result in Chapter 4.
4 This measure is sometimes advocated on the basis of the unbiasedness of the two quantities in the fraction

LEAST SQUARES REGRESSION
The unknown parameters of the stochastic relation $y_i = x_i'\beta + \varepsilon_i$ are the objects of estimation. It is necessary to distinguish between population quantities, such as $\beta$ and $\varepsilon_i$, and sample estimates of them, denoted b and $e_i$. The population regression is $E[y_i \mid x_i]$

THEOREM 3.7 Change in $\bar{R}^2$ When a Variable Is Added to a Regression
In a multiple regression, $\bar{R}^2$ will fall (rise) when the variable x is deleted from the regression if the square of the t ratio associated with this variable is greater (less) than 1.
We have shown that $R^2$ will never
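The theorem can be verified numerically by comparing $\bar{R}^2$ with and without an added variable against that variable's t ratio. A sketch on simulated data (values illustrative; the threshold at $t^2 = 1$ is exact):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])
z = rng.normal(size=n)
y = X @ np.array([1.0, 0.5]) + 0.2 * z + rng.normal(size=n)

def fit(W):
    """OLS fit: returns coefficients, standard errors, and adjusted R-squared."""
    b = np.linalg.solve(W.T @ W, W.T @ y)
    e = y - W @ b
    n_, K = W.shape
    R2 = 1 - (e @ e) / np.sum((y - y.mean()) ** 2)
    R2bar = 1 - (1 - R2) * (n_ - 1) / (n_ - K)
    s2 = (e @ e) / (n_ - K)
    se = np.sqrt(s2 * np.diag(np.linalg.inv(W.T @ W)))
    return b, se, R2bar

b_small, _, R2bar_small = fit(X)
b_big, se_big, R2bar_big = fit(np.column_stack([X, z]))
t_z = b_big[-1] / se_big[-1]  # t ratio of the added variable z
```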

The solution is $b = (X'X)^{-1}X'y = (0.50907, 0.01658, 0.67038, 0.002326, 0.00009401)$.
3.2.3 ALGEBRAIC ASPECTS OF THE LEAST SQUARES SOLUTION
The normal equations are
$$X'Xb - X'y = -X'(y - Xb) = -X'e = 0. \tag{3-12}$$
Hence, for every column $x_k$ of X, $x_k'e = 0$. If the first column of X is a column

bypass these difficulties by reporting a third $R^2$, the squared sample correlation between the actual values of y and the fitted values from the regression. This approach could be deceptive. If the regression contains a constant term

THEOREM 3.1 Orthogonal Regression
If the variables in a multiple regression are not correlated (i.e., are orthogonal), then the multiple regression slopes are the same as the slopes in the individual simple regressions.
In practice, you will never actually compute
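A numerical illustration of the theorem, constructing exactly orthogonal, mean-zero regressors via a QR decomposition (simulated data; the construction is illustrative):

```python
import numpy as np

rng = np.random.default_rng(10)
n = 60
raw = rng.normal(size=(n, 2))
# Demeaning then QR gives mean-zero, mutually orthogonal columns,
# so the regressors are uncorrelated with each other and the constant.
Q, _ = np.linalg.qr(raw - raw.mean(axis=0))
y = 1.0 + Q @ np.array([2.0, -1.0]) + rng.normal(size=n)

# Multiple regression on both columns at once.
Xm = np.column_stack([np.ones(n), Q])
b_multi, *_ = np.linalg.lstsq(Xm, y, rcond=None)

# Simple regressions on each column individually.
b_simple = [np.linalg.lstsq(np.column_stack([np.ones(n), Q[:, j]]), y,
                            rcond=None)[0][1]
            for j in range(2)]
```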

d. Prove that these two values uniquely minimize the sum of squares by showing that the diagonal elements of the second derivatives matrix of the sum of squares with respect to the parameters are both positive and that the determinant is 4n[

the subject of GMM estimation.
4.2.2 MINIMUM MEAN SQUARED ERROR PREDICTOR
As an alternative approach, consider the problem of finding an optimal linear predictor for y. Once again, ignore Assumption A6 and, in addition, drop Assumption A1 that the conditional mean function

Key Terms and Concepts
Adjusted R-squared, Analysis of variance, Bivariate regression, Coefficient of determination, Disturbance, Fitting criterion, Frisch-Waugh theorem, Goodness of fit, Least squares, Least squares normal equations, Moment matrix, Multiple correlation

$$E_s = \alpha_s + \beta_s Y + \gamma_{sd} P_d + \gamma_{sn} P_n + \gamma_{ss} P_s + \varepsilon_s.$$
Prove that if all equations are estimated by ordinary least squares, then the sum of the expenditure coefficients will be 1 and the four other column sums in the preceding model will be zero.
9. Change in adjusted $R^2$. Prove that

CHAPTER 4 Finite-Sample Properties of the Least Squares Estimator
TABLE 4.1 Assumptions of the Classical Linear Regression Model
A1. Linearity: $y_i = x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{iK}\beta_K + \varepsilon_i$.
A2. Full rank: The $n \times K$ sample data matrix X has full column rank.
A3. Exogeneity of