12.1 Simple Linear Regression Returns!
Def’n: The regression line predicts the value of the response variable y as a straight-line function of the value x of the explanatory variable.

Equation for the regression line: ŷ = a + bx

a is the intercept: the height of the line at x = 0.

b is the slope: the amount by which y increases when x increases by 1 unit.

ŷ (“y-hat”) denotes the predicted value of y (or the mean of y for a given value of x).
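The roles of a and b can be sketched in a few lines of code; the intercept and slope values below are made up purely for illustration.

```python
# A minimal sketch of prediction with a regression line ŷ = a + bx.
# The values of a and b here are hypothetical, not fitted from data.

def predict(a, b, x):
    """Predicted value ŷ = a + b*x."""
    return a + b * x

a, b = 2.0, 0.5  # hypothetical intercept and slope

print(predict(a, b, 0))    # at x = 0 the prediction equals the intercept: 2.0
print(predict(a, b, 10))   # 2.0 + 0.5*10 = 7.0
# Increasing x by 1 unit increases ŷ by the slope b:
print(predict(a, b, 11) - predict(a, b, 10))  # 0.5
```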
Simple Linear Regression (SLR) model:

Now we introduce the population regression equation:

yᵢ = α + βxᵢ + εᵢ,    μ(yᵢ | xᵢ) = α + βxᵢ,    i = 1, …, n
• (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ) are the observed data.
• ε₁, …, εₙ are unobserved “errors”, assumed to be a random sample from N(0, σ).
• α, β, σ are unknown parameters.
o α is the “population” intercept.
o β is the average change in y associated with a 1-unit increase in x.
o σ determines the extent to which points deviate from the line.
• y₁, …, yₙ are random variables; α, β, and the xᵢ are all fixed.
o The conditional distribution of yᵢ given xᵢ is N(α + βxᵢ, σ).
• Basic Assumptions of the SLR Model
o The distribution of ε at any particular x has a mean of zero (μ_ε = 0).
o The std. dev. of ε is the same at any particular x (i.e., it is constant).
o The distribution of ε at any particular x is normal.
o The random deviations ε₁, ε₂, …, εₙ associated with different observations are independent of one another.
• The equation merely approximates the relation; it is a model.
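The model above can be sketched as a simulation: fix the xᵢ, draw each εᵢ from N(0, σ), and form yᵢ = α + βxᵢ + εᵢ. The parameter values and sample size below are hypothetical.

```python
import random

# A sketch of generating data from the SLR model y_i = α + β*x_i + ε_i,
# with errors ε_i drawn i.i.d. from N(0, σ).
# alpha, beta, sigma, and n are hypothetical choices for illustration.
alpha, beta, sigma = 1.0, 2.0, 0.5
n = 100
random.seed(0)  # reproducible draws

xs = [i / 10 for i in range(n)]  # the x_i are fixed, not random
ys = [alpha + beta * x + random.gauss(0, sigma)  # α + β*x_i plus random error
      for x in xs]
```

Each yᵢ is random only through its error term, which is exactly the sense in which y₁, …, yₙ are random variables while α, β, and the xᵢ are fixed.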
Least squares estimation of α and β:

Def’n: A residual is the difference between an observed value and its estimated value.

Since ŷ denotes the estimated value of y, at an observed value of x, say xᵢ, the residual is defined as

yᵢ – ŷᵢ = yᵢ – (a + bxᵢ)
The residual represents the vertical deviation of the point from the line.

We want to choose (a, b) to minimize the sum of squared deviations (hence “least squares”):

∑ᵢ₌₁ⁿ (yᵢ – (a + bxᵢ))²
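The minimizing pair has the standard closed form b = Sxy / Sxx and a = ȳ – b·x̄, where Sxy = ∑(xᵢ – x̄)(yᵢ – ȳ) and Sxx = ∑(xᵢ – x̄)². A sketch of that computation (the example data are made up so the fit is exact):

```python
# A sketch of least squares estimation via the closed-form solution
# b = Sxy / Sxx and a = ȳ - b*x̄, which minimize Σ (y_i - (a + b*x_i))².

def least_squares(xs, ys):
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sxy / sxx        # slope estimate
    a = ybar - b * xbar  # intercept estimate
    return a, b

# Hypothetical data lying exactly on y = 1 + 2x, so every residual is 0
# and the fitted line recovers the true coefficients:
a, b = least_squares([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```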