To summarize:
R’s lm(Y ~ X) function
- finds the coefficients $b_0$ and $b_1$ characterizing the “least squares” line $\hat{Y} = b_0 + b_1 X$.
- That is, it minimizes $\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} e_i^2$.
The least squares formulas are
$b_1 = r_{xy} \frac{s_y}{s_x}$ and $b_0 = \bar{Y} - b_1 \bar{X}$.
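As a quick check of the formulas above, the sketch below compares them with what lm() returns. The data are simulated stand-ins: x, y, the seed, and the generating line are assumptions made for illustration, not the housing data from the lecture.

```r
# Simulated stand-in data (hypothetical; not the lecture's housing data)
set.seed(1)
x <- runif(50, min = 1, max = 3.5)
y <- 40 + 35 * x + rnorm(50, sd = 14)

# Least squares fit via lm()
fit <- lm(y ~ x)

# The same coefficients from the closed-form formulas
b1 <- cor(x, y) * sd(y) / sd(x)
b0 <- mean(y) - b1 * mean(x)

# Side-by-side comparison: the two columns should agree
print(cbind(lm = coef(fit), formula = c(b0, b1)))
```

Both columns should match up to floating-point error, whatever data are used.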
Exercise (on your own):
Take the partial derivatives of the sum of squares of the residuals w.r.t. $\beta_0$ and $\beta_1$, set these equal to zero, and solve the resulting system of equations to obtain the least squares estimates.
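One possible outline of the exercise, for checking your own work. The name $S$ for the sum of squared residuals is introduced here for convenience and is not from the slides.

```latex
\begin{align*}
S(\beta_0, \beta_1) &= \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2 \\
\frac{\partial S}{\partial \beta_0} &= -2 \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i) = 0 \\
\frac{\partial S}{\partial \beta_1} &= -2 \sum_{i=1}^{n} X_i (Y_i - \beta_0 - \beta_1 X_i) = 0 \\
\text{First equation:}\quad b_0 &= \bar{Y} - b_1 \bar{X} \\
\text{Substituting into the second:}\quad
b_1 &= \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
     = r_{xy} \frac{s_y}{s_x}
\end{align*}
```

At the solution, the two first-order conditions say that the residuals sum to zero and that $\sum_{i=1}^{n} X_i e_i = 0$.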
Properties of the least squares fit
Developing techniques for model validation and criticism
requires a deeper understanding of the least squares line.
The fitted values ($\hat{Y}_i$) and “residuals” ($e_i$) obtained from the least squares line have some special properties.
- From now on “obtained from the least squares line” will be implied (and therefore not repeated) whenever we talk about $\hat{Y}_i$ and $e_i$.
Let's look at the housing data analysis to figure out what some of these properties are ...
The fitted values are perfectly correlated with the inputs.
> plot(size, reg$fitted, pch=20, xlab="X",
+      ylab="Fitted Values")
> text(x=3, y=80, col=2, cex=1.5,
+      paste("corr(y.hat, x) =", cor(size, reg$fitted)))
[Figure: Fitted Values plotted against X. Annotation: corr(y.hat, x) = 1]
The residuals are “stripped of all linearity”.
> # reg$fitted-price is the negative of e; its correlation with size and its mean are still 0
> plot(size, reg$fitted-price, pch=20, xlab="X", ylab="Residuals")
> text(x=3.1, y=26, col=2, cex=1.5,
+      paste("corr(e, x) =", round(cor(size, reg$fitted-price),2)))
> text(x=3.1, y=19, col=4, cex=1.5,
+      paste("mean(e) =", round(mean(reg$fitted-price),0)))
> abline(h=0, col=8, lty=2)
[Figure: Residuals plotted against X, with a dashed horizontal line at 0. Annotations: corr(e, x) = 0, mean(e) = 0]
What is the intuition for the relationship between $\hat{Y}$, $e$, and $X$?
- Let's consider some “crazy” alternative line:
[Figure: Y plotted against X with two lines drawn. LS line: 38.9 + 35.4 X. Crazy line: 10 + 50 X]
This is a bad fit!
We are underestimating the value of small
houses and overestimating the value of big houses.
[Figure: Crazy Residuals plotted against X. Annotations: corr(e, x) = -0.7, mean(e) = 1.8]
- Clearly, we have left some predictive ability on the table!
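To see this numerically, the sketch below compares residuals from the least squares line with residuals from the crazy line 10 + 50 X. The data are simulated stand-ins (x, y, and the seed are assumptions, not the lecture's housing data), so the numbers will differ from the figure.

```r
# Simulated stand-in data (hypothetical; not the lecture's housing data)
set.seed(1)
x <- runif(50, min = 1, max = 3.5)
y <- 40 + 35 * x + rnorm(50, sd = 14)

# Residuals from the least squares line
e_ls <- resid(lm(y ~ x))

# Residuals from the "crazy" line 10 + 50 X
e_crazy <- y - (10 + 50 * x)

# Compare: correlation with x, mean, and sum of squared residuals
summary_table <- rbind(
  ls    = c(corr = cor(x, e_ls),    mean = mean(e_ls),    sse = sum(e_ls^2)),
  crazy = c(corr = cor(x, e_crazy), mean = mean(e_crazy), sse = sum(e_crazy^2))
)
print(round(summary_table, 3))
```

The least squares row should show zero correlation and zero mean (up to rounding), while the crazy row shows residuals that still move with x and a larger sum of squares.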
As long as the correlation between $e$ and $X$ is nonzero, we could always adjust our prediction rule to do better.
We need to exploit all of the predictive power in the $X$ values and put this into $\hat{Y}$,
- leaving no “Xness” in the residuals.
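One way to make the "adjust our prediction rule" idea concrete is sketched below: start from the crazy line 10 + 50 X, regress its residuals on X, and fold the result back into the rule. The data are again simulated stand-ins, and the names crazy, adj, and improved are hypothetical.

```r
# Simulated stand-in data (hypothetical; not the lecture's housing data)
set.seed(1)
x <- runif(50, min = 1, max = 3.5)
y <- 40 + 35 * x + rnorm(50, sd = 14)

# Start from the "crazy" line (intercept 10, slope 50) and compute its residuals
crazy <- c(10, 50)
e_crazy <- y - (crazy[1] + crazy[2] * x)

# The residuals still contain "Xness": regress them on x to extract it
adj <- coef(lm(e_crazy ~ x))

# Fold the adjustment back into the prediction rule
improved <- crazy + unname(adj)

# The improved rule coincides with the least squares line
print(rbind(improved = improved, ls = unname(coef(lm(y ~ x)))))
```

Once the adjustment is made, the new residuals have nothing linear in x left to extract, which is exactly the least squares property.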
In Summary: $Y = \hat{Y} + e$ where:
- $\hat{Y}$ is “made from $X$”; corr($X$, $\hat{Y}$) = 1;
- $e$ is unrelated to $X$; corr($X$, $e$) = 0.