Introduction to Econometrics
Technical Notes
Lecture 6: Multiple Regression - Goodness of Fit
Measuring the Goodness of Fit - $R^2$
For a multiple regression, measures of goodness of fit based on the correlation coefficient between any individual regressor Z and Y,

$$\mathrm{Corr}(Z,Y) = \frac{\mathrm{Cov}(Z,Y)}{\sqrt{\mathrm{Var}(Z)\,\mathrm{Var}(Y)}}$$

are not very useful. However, we can still decompose Var(Y) into explained and unexplained components.[1]
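In the two-variable case

$$Y = \alpha^e + \beta^e X + \theta^e Z + U^e$$

$$\begin{aligned}
\mathrm{Var}(Y) &= \mathrm{Var}(\alpha^e + \beta^e X + \theta^e Z + U^e) \\
&= (\beta^e)^2\,\mathrm{Var}(X) + 2\beta^e\theta^e\,\mathrm{Cov}(X,Z) + 2\beta^e\,\mathrm{Cov}(X,U^e) \\
&\quad + (\theta^e)^2\,\mathrm{Var}(Z) + 2\theta^e\,\mathrm{Cov}(Z,U^e) + \mathrm{Var}(U^e)
\end{aligned}$$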
since $\alpha^e$ is a constant. In addition, $\mathrm{Cov}(X,U^e)$ and $\mathrm{Cov}(Z,U^e)$ are both zero (this was shown in Lecture 5), so this expression reduces to
$$\mathrm{Var}(Y) = \left\{(\beta^e)^2\,\mathrm{Var}(X) + 2\beta^e\theta^e\,\mathrm{Cov}(X,Z) + (\theta^e)^2\,\mathrm{Var}(Z)\right\} + \mathrm{Var}(U^e)$$
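The first three terms (in {}) are the Explained Sum of Squares SSE, the last the Residual Sum of Squares SSR (each divided by N-1).[2] Writing SST for the Total Sum of Squares, so that Var(Y) = SST/(N-1), we have

$$R^2 = \frac{\mathrm{Var}(Y) - \mathrm{Var}(U^e)}{\mathrm{Var}(Y)} = \frac{SST - SSR}{SST} = 1 - \frac{SSR}{SST}$$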
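The decomposition can be checked numerically. The sketch below assumes NumPy and uses simulated data; the seed, sample size, and coefficient values are illustrative choices, not taken from the lecture. It fits the two-variable regression by least squares and confirms that the explained and unexplained parts add up to Var(Y), and that the two expressions for $R^2$ agree.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed; simulated data (an assumption)
N = 200
X = rng.normal(size=N)
Z = 0.5 * X + rng.normal(size=N)          # regressors deliberately correlated
Y = 1.0 + 2.0 * X - 1.5 * Z + rng.normal(size=N)

# OLS with a constant: columns are [1, X, Z]
W = np.column_stack([np.ones(N), X, Z])
coef, *_ = np.linalg.lstsq(W, Y, rcond=None)
alpha_e, beta_e, theta_e = coef
U_e = Y - W @ coef                        # estimated residuals

def var(a):
    return np.var(a, ddof=1)              # sample variance, divisor N-1

def cov(a, b):
    return np.cov(a, b, ddof=1)[0, 1]     # sample covariance, divisor N-1

# The three terms in braces, and the residual term
explained = (beta_e**2 * var(X)
             + 2 * beta_e * theta_e * cov(X, Z)
             + theta_e**2 * var(Z))
unexplained = var(U_e)

# R^2 two ways: from variances and from sums of squares
r2_from_variances = (var(Y) - var(U_e)) / var(Y)
SST = np.sum((Y - Y.mean())**2)
SSR = np.sum(U_e**2)
r2_from_sums = 1 - SSR / SST

print("Var(Y)                  :", var(Y))
print("explained + unexplained :", explained + unexplained)
print("Cov(X,U_e), Cov(Z,U_e)  :", cov(X, U_e), cov(Z, U_e))
print("R^2 (both forms)        :", r2_from_variances, r2_from_sums)
```

The two covariances with the residual should print as zero up to floating-point error, which is why the cross terms drop out of the decomposition.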
This decomposition is exactly the same as in the case of simple regression; similarly, the ANalysis Of VAriance (ANOVA) table (set out in Lecture 4, repeated below) is equally useful for a multiple regression.
The value of $R^2$ depends on the precise form of Y (since Var(Y) is the variance around mean(Y)), so there is a vital rule to remember:

Never use $R^2$ to compare the fit of regressions with different dependent variables.
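A small illustration of why this rule matters, as a sketch under stated assumptions (NumPy, simulated data, and a hypothetical helper `ols_r2_and_residuals` written for this example): regressing Y on X and regressing Y - X on X give identical residuals, yet very different values of $R^2$, because only the variance of the dependent variable has changed.

```python
import numpy as np

def ols_r2_and_residuals(y, x):
    """Regress y on a constant and x; return (R^2, residuals)."""
    W = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(W, y, rcond=None)
    resid = y - W @ coef
    r2 = 1 - np.sum(resid**2) / np.sum((y - y.mean())**2)
    return r2, resid

rng = np.random.default_rng(1)            # fixed seed; simulated data (an assumption)
N = 500
X = rng.normal(size=N)
Y = 1.0 + 1.2 * X + rng.normal(size=N)

# Same regressor, two different dependent variables
r2_y, resid_y = ols_r2_and_residuals(Y, X)
r2_d, resid_d = ols_r2_and_residuals(Y - X, X)

print("R^2 with Y as dependent variable    :", r2_y)
print("R^2 with Y - X as dependent variable:", r2_d)
print("Residuals identical:", np.allclose(resid_y, resid_d))
```

Both regressions fit the data equally well in the sense that matters (the residuals are the same), so the gap between the two $R^2$ values says nothing about fit.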
In addition, the statistical significance associated with the value of $R^2$ is a function of the number of observations and the number of explanatory variables, so a second rule is:

Do not use $R^2$ (even informally) to compare regressions with a different number of observations.
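To see how the same $R^2$ can carry very different statistical weight, the sketch below assumes the usual F test of overall significance, F = {R²/(K-1)} / {(1-R²)/(N-K)}, with K counting the constant. That test is not set out in this section, so treat it as an assumption of the example; the function name and the numbers are illustrative.

```python
def overall_f_statistic(r2, n_obs, k_params):
    """F statistic for the null that all slope coefficients are zero.

    k_params counts every estimated coefficient, including the constant.
    """
    return (r2 / (k_params - 1)) / ((1 - r2) / (n_obs - k_params))

# Identical R^2 and number of regressors, different sample sizes
f_small = overall_f_statistic(0.30, n_obs=10, k_params=3)
f_large = overall_f_statistic(0.30, n_obs=100, k_params=3)

print("F with N = 10 :", f_small)
print("F with N = 100:", f_large)
```

An $R^2$ of 0.30 yields a much larger F statistic with 100 observations than with 10, so the raw $R^2$ values are not comparable across samples of different size.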
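$R^2$ never decreases when more regressors are added (and in practice almost always increases), so if you want to compare two regressions with the same dependent variable but different numbers of regressors you should use adjusted $R^2$ (R-bar squared, $\bar{R}^2$) instead. Adjusted $R^2$ is defined as

$$\bar{R}^2 = R^2 - \frac{K-1}{N-K}\,(1 - R^2)$$

where N is the number of observations and K the number of estimated coefficients (including the constant), so it can either increase or decrease if more regressors are added.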
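The definition translates directly into code. In this sketch the function names and the $R^2$ values are hypothetical, chosen only to show the arithmetic; the second function is the algebraically equivalent form 1 - (1 - R²)(N-1)/(N-K), included as a cross-check.

```python
def adjusted_r2(r2, n_obs, k_params):
    """Adjusted R^2 as defined above; k_params includes the constant."""
    return r2 - (k_params - 1) / (n_obs - k_params) * (1 - r2)

def adjusted_r2_alt(r2, n_obs, k_params):
    """Equivalent form: 1 - (1 - R^2)(N - 1)/(N - K)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - k_params)

# Hypothetical case: adding a fourth coefficient raises R^2 only slightly
base = adjusted_r2(0.500, n_obs=30, k_params=3)
bigger = adjusted_r2(0.505, n_obs=30, k_params=4)

print("adjusted R^2, K = 3:", base)
print("adjusted R^2, K = 4:", bigger)
```

Here $R^2$ rises from 0.500 to 0.505 when the extra regressor is added, but adjusted $R^2$ falls, because the penalty term grows faster than the fit improves.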
It is often better to compare two regressions using the square root of the estimated residual variance (the standard error of the regression),

$$s = \sqrt{\frac{\sum_i (u^e_i)^2}{N-K}}$$
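A minimal sketch of the calculation, assuming NumPy. The function name and the residual values are hypothetical and serve only to show the arithmetic; in practice the residuals would come from a fitted regression with K estimated coefficients.

```python
import numpy as np

def regression_standard_error(residuals, k_params):
    """Square root of the estimated residual variance, divisor N - K."""
    residuals = np.asarray(residuals, dtype=float)
    n_obs = residuals.size
    return np.sqrt(np.sum(residuals**2) / (n_obs - k_params))

# Hypothetical residuals: N = 5 observations, K = 3 estimated coefficients
example_residuals = [1.0, -1.0, 2.0, -2.0, 0.0]
s = regression_standard_error(example_residuals, k_params=3)

print("s =", s)
```

Because s is measured in the units of the dependent variable, it can be compared across regressions that share that dependent variable without the scaling by Var(Y) that enters $R^2$.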
[1] The logic behind this decomposition for simple regression was discussed in Lecture 4.
[2] Remember the warning (in Lecture 4) about the possible alternative meanings for the abbreviations SSE and SSR.
