APPLIED STATISTICS
Model Diagnostics for Linear Regression II
Dr Tao Zou
Research School of Finance, Actuarial Studies & Statistics
The Australian National University
Last Updated: Fri Aug 25 09:13:00 2017
Overview
R-Squared and Adjusted R-Squared
Graphical Tools for Model Diagnostics
4. Leverage plot.
5. Standardized (Studentized) residuals versus fitted values plot.
6. Cook’s distance plot.
Weighted Regression
References
1. F.L. Ramsey and D.W. Schafer (2012). Chapter 8, Chapter 10 and Chapter 11 of The Statistical Sleuth.
2. ANU STAT2008 Lecture Notes.
3. W.H. Greene (2012). Econometric Analysis.
4. The slides are made by R Markdown.
Review: Sum of Squared Errors (SSE)
The sum of squared errors (SSE) for a MLR
$$\mu\{Y \mid X\} = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k,$$
where $X = (X_1, \cdots, X_k)$, is defined by
$$\mathrm{SSE} = \sum_{i=1}^{n} \mathrm{res}_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2.$$
In Tutorial 2, we have shown for SLR, the sample variance of the residuals is
$$s^2_{\mathrm{res}} = \frac{1}{n-1} \sum_{i=1}^{n} (\mathrm{res}_i - \overline{\mathrm{res}})^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \frac{1}{n-1} \mathrm{SSE},$$
which measures the variation in the residuals, where
$$\overline{\mathrm{res}} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{res}_i = 0.$$
This result is also true for MLR. Thus, SSE also measures the variation in the residuals.
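As a minimal numerical sketch of the identities above, the following fits an SLR by the usual closed-form least-squares estimates and checks that the residuals average to zero and that the sample variance of the residuals equals SSE/(n - 1). The data values and all variable names are invented for illustration and do not come from the lecture or Tutorial 2.

```python
# Toy data, invented purely for illustration (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(y)

# Closed-form least-squares estimates for SLR: mu{Y|X} = b0 + b1 * X.
x_bar = sum(x) / n
y_bar = sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Fitted values and residuals res_i = Y_i - Yhat_i.
fitted = [b0 + b1 * xi for xi in x]
res = [yi - fi for yi, fi in zip(y, fitted)]

# SSE, the mean residual, and the sample variance of the residuals.
sse = sum(r ** 2 for r in res)
res_bar = sum(res) / n
s2_res = sum((r - res_bar) ** 2 for r in res) / (n - 1)

print("mean residual:", res_bar)        # zero up to rounding error
print("SSE:", sse)
print("s2_res:", s2_res, "SSE/(n-1):", sse / (n - 1))
```

The mean residual is only zero up to floating-point rounding, so any comparison should use a small tolerance rather than exact equality.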
Total Sum of Squares (SST)
In any dataset, there will be variation in the values of the response variable. This variation can be measured by the sample variance of the response values
$$s^2_Y = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2, \quad \text{where } \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i.$$
We call the total sum of squares (SST) of the response
$$\mathrm{SST} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2,$$
which measures the variation in the values of the response variable too.
Sum of Squares due to Regression (SSR)
In Tutorial 2, we have shown for SLR, the sample variance of the fitted values is
$$s^2_{\hat{Y}} = \frac{1}{n-1} \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2,$$
which measures the variation in the fitted values. This result is also true for MLR. We call the sum of squares due to regression (SSR)
$$\mathrm{SSR} = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2,$$
which measures the variation in the fitted values too.
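The formula for the sample variance of the fitted values centres on the mean of the responses rather than on the mean of the fitted values. This works because, for a least-squares fit with an intercept, the two means coincide. The sketch below checks this numerically and compares SST and SSR with (n - 1) times the corresponding sample variances from Python's `statistics` module. The data and variable names are illustrative assumptions, not taken from the lecture.

```python
from statistics import mean, variance

# Toy data, invented purely for illustration (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(y)

# Closed-form least-squares estimates for SLR with an intercept.
x_bar, y_bar = mean(x), mean(y)
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

# SST: variation of the responses about their mean.
sst = sum((yi - y_bar) ** 2 for yi in y)
# SSR: variation of the fitted values about the same mean.
ssr = sum((fi - y_bar) ** 2 for fi in fitted)

# statistics.variance uses the n - 1 divisor (sample variance).
print("mean of fitted:", mean(fitted), "mean of Y:", y_bar)
print("SST:", sst, "(n-1)*s2_Y:", (n - 1) * variance(y))
print("SSR:", ssr, "(n-1)*s2_Yhat:", (n - 1) * variance(fitted))
```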
Partitioning Variability
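Intuitively,
$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} \{(Y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{Y})\}^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 + \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2,$$
namely
$$\mathrm{SST} = \mathrm{SSE} + \mathrm{SSR}.$$
The second equality holds because the cross term $2 \sum_{i=1}^{n} (Y_i - \hat{Y}_i)(\hat{Y}_i - \bar{Y})$ is zero for a least-squares fit with an intercept.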
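The partition relies on the cross term from expanding the square being zero. A short sketch of why follows, assuming the model is fitted by least squares with an intercept. The notation $\hat{\beta}_j$ for the fitted coefficients and $X_{ij}$ for the value of $X_j$ on observation $i$ is introduced here for illustration.

```latex
\begin{align*}
\sum_{i=1}^{n} \{(Y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{Y})\}^2
  &= \mathrm{SSE} + \mathrm{SSR}
     + 2 \sum_{i=1}^{n} \mathrm{res}_i \, (\hat{Y}_i - \bar{Y}). \\
\intertext{The least-squares normal equations give
$\sum_{i} \mathrm{res}_i = 0$ and $\sum_{i} \mathrm{res}_i X_{ij} = 0$
for $j = 1, \dots, k$, so}
\sum_{i=1}^{n} \mathrm{res}_i \, \hat{Y}_i
  &= \hat{\beta}_0 \sum_{i=1}^{n} \mathrm{res}_i
     + \sum_{j=1}^{k} \hat{\beta}_j \sum_{i=1}^{n} \mathrm{res}_i X_{ij} = 0, \\
\sum_{i=1}^{n} \mathrm{res}_i \, (\hat{Y}_i - \bar{Y})
  &= \sum_{i=1}^{n} \mathrm{res}_i \, \hat{Y}_i
     - \bar{Y} \sum_{i=1}^{n} \mathrm{res}_i = 0.
\end{align*}
```

Without an intercept in the model, $\sum_{i} \mathrm{res}_i = 0$ need not hold, and the partition can fail.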
One important interpretation of a regression model is that it explains variation in the values of the response variable (SST).
The variation in the values of the response variable (SST) can be split into the following two parts: the variation in the residuals (SSE) + the variation in the fitted values (SSR).
The variation in the fitted values (SSR) is explained by the regression model.
The variation in the residuals (SSE) is the unexplained variation.
R-Squared
R-squared for MLR
$$\mu\{Y \mid X\} = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k,$$
where $X = (X_1, \cdots, X_k)$, is the percentage of the total response variation explained by the regression model
$$R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}.$$
$$0\% \le R^2 \le 100\%.$$
Hence, if $R^2$ is close to 100%, the regression model explains most of the variation in the response, and in this sense the regression model is “good”.
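To make the two expressions for R-squared concrete, the sketch below computes SSE, SSR and SST for a small SLR fit and evaluates both SSR/SST and 1 - SSE/SST. The data and the helper name `r_squared_slr` are hypothetical, chosen only for illustration. The two values agree only because SST = SSE + SSR holds for a least-squares fit with an intercept.

```python
def r_squared_slr(x, y):
    """Return (SSR/SST, 1 - SSE/SST) for a least-squares SLR with intercept.

    Hypothetical helper written for this illustration only.
    """
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]

    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)           # explained
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total
    return ssr / sst, 1 - sse / sst


# Toy data, invented purely for illustration (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r2_from_ssr, r2_from_sse = r_squared_slr(x, y)
print(f"R^2 via SSR/SST:     {100 * r2_from_ssr:.2f}%")
print(f"R^2 via 1 - SSE/SST: {100 * r2_from_sse:.2f}%")
```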