September 9, 2010
The assumptions of linear regression
Introduction
Multiple regression is used to estimate relationships between a response (dependent) variable and one or more
explanatory (independent) variables. The multiple regression model with k predictor variables
$$y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + \varepsilon_i, \qquad \varepsilon_i \sim NID(0, \sigma^2),$$
is summarized in matrix notation as
$$y = X\beta + \varepsilon, \qquad \varepsilon \sim NID(0, \sigma^2)$$
where $y$ is an (n x 1) vector for the response variable, $X$ is an (n x (k + 1)) matrix of k predictor variables and a column of 1’s for the intercept, $\beta$ is a ((k + 1) x 1) vector of unknown population parameters, and $\varepsilon$ is an (n x 1) vector of population errors, or departures of data points from the prediction equation. The matrices are as follows:
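$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
X = \begin{bmatrix} 1 & X_{11} & X_{21} & \cdots & X_{k1} \\ 1 & X_{12} & X_{22} & \cdots & X_{k2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{1n} & X_{2n} & \cdots & X_{kn} \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad
\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}.$$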
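The dimensions of these matrices can be checked with a minimal NumPy sketch. The values of n, k, the parameter vector and the error standard deviation below are hypothetical, chosen only for illustration; none of them come from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed (an assumption, for reproducibility)

n, k = 200, 3                             # hypothetical number of cases and predictors
beta = np.array([2.0, 0.5, -1.0, 3.0])    # hypothetical ((k + 1) x 1) vector: beta_0 ... beta_k
sigma = 1.5                               # hypothetical error standard deviation

predictors = rng.normal(size=(n, k))              # n x k predictor values
X = np.column_stack([np.ones(n), predictors])     # n x (k + 1): column of 1's, then predictors
eps = rng.normal(0.0, sigma, size=n)              # n errors drawn independently from N(0, sigma^2)
y = X @ beta + eps                                # y = X beta + eps

print(X.shape, beta.shape, eps.shape, y.shape)
```

The first column of `X` is the column of 1's that carries the intercept, so `X @ beta` adds `beta[0]` to every row.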
If we were to regress the y vector against the X’s for the whole population, we would obtain the population
prediction equation
$$\hat{y}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki}$$
where $\beta_0, \beta_1, \beta_2, \dots, \beta_k$ are all population parameters. The parameter $\beta_1$ in the context of the equation $\hat{y}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki}$ is shorthand notation for the relationship between y and $X_1$, controlling for $X_2, \dots, X_k$.
In brief:
$$\beta_1 = \beta_{yX_1 \cdot X_2 X_3 \ldots X_k} = \beta_{y1 \cdot 2,3,\ldots,k}$$
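One way to see what "controlling for" means is to compare the coefficient on X_1 from the full regression with the slope obtained after the other predictors have been removed from both y and X_1. The sketch below uses simulated data with hypothetical coefficients, and treats the simulated cases as if they were the whole population. It relies on the standard partial-regression result for least squares, which these notes do not state explicitly.

```python
import numpy as np

rng = np.random.default_rng(1)            # fixed seed (an assumption, for reproducibility)
n = 500                                   # hypothetical number of cases

X1 = rng.normal(size=n)
X2 = 0.6 * X1 + rng.normal(size=n)        # deliberately correlated with X1
X3 = rng.normal(size=n)
y = 1.0 + 2.0 * X1 - 1.0 * X2 + 0.5 * X3 + rng.normal(size=n)

def ols(design, response):
    """Least-squares coefficients for response regressed on the design matrix."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef

ones = np.ones(n)
full = ols(np.column_stack([ones, X1, X2, X3]), y)   # coefficients for intercept, X1, X2, X3

# Remove the part of y and of X1 that is explained by X2 and X3 (plus intercept).
controls = np.column_stack([ones, X2, X3])
y_resid = y - controls @ ols(controls, y)
x1_resid = X1 - controls @ ols(controls, X1)

# Slope of the residualized y on the residualized X1.
partial_slope = (x1_resid @ y_resid) / (x1_resid @ x1_resid)

print(full[1], partial_slope)             # the two values agree up to rounding error
```

The agreement of the two numbers is what the subscript notation summarizes: the coefficient describes y and X_1 after X_2 and X_3 have been held fixed.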
The vector of errors, $\varepsilon$, is the difference between the observed value of $y_i$ and the value of y predicted by the equation ($\hat{y}_i$); that is, $\varepsilon_i = y_i - \hat{y}_i$.
One characteristic of population parameters is that the values $\beta_0, \beta_1, \beta_2, \dots, \beta_k$ are constant for a specific population (i.e., a closed block of people, communities or organizations at a fixed point in time).
This is not true
of samples. If we select a sample from the population, the resulting regression prediction equation is of the form:
$$\hat{y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki}$$
or equivalently in matrix notation,
$$\hat{y} = Xb$$
where $\hat{y}$ is an (n x 1) vector of predicted values of $y$ and the vector of errors is $e_i = (y_i - \hat{y}_i)$.
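The sample quantities can be computed in a few lines. This is a minimal sketch that assumes b is obtained by ordinary least squares, which this excerpt does not state explicitly; the sample size and the data-generating values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)            # fixed seed (an assumption, for reproducibility)
n, k = 100, 2                             # hypothetical sample size and number of predictors

# Hypothetical sample: column of 1's plus k predictors, and a response.
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None) # sample statistics b_0 ... b_k (least squares)
y_hat = X @ b                             # (n x 1) vector of predicted values, y_hat = X b
e = y - y_hat                             # vector of sample errors, e_i = y_i - y_hat_i

print(b)
print(e[:5])
```

With a least-squares fit, the errors `e` are orthogonal to every column of `X`, so `X.T @ e` is zero up to rounding error.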
But this time, because we have a sample from the population rather than the whole population, the elements of the
vector
$b^T = [b_0 \; b_1 \; b_2 \; \dots \; b_k]$ are statistics used to estimate the parameters $\beta^T = [\beta_0, \beta_1, \beta_2, \dots, \beta_k]$.
Similarly, $e_i = (y_i - \hat{y}_i)$ is a vector of errors based on the sample statistics $b_0, b_1, b_2, \dots, b_k$ rather than the population parameters $\beta_0, \beta_1, \beta_2, \dots, \beta_k$. The relationship between the statistics ($b_0, b_1, b_2, \dots, b_k$) and the parameters ($\beta_0, \beta_1, \beta_2, \dots, \beta_k$) is important. If we returned to the population and selected a second sample, we would get a second, different prediction equation
$$\hat{y}'_i = b'_0 + b'_1 X_{1i} + b'_2 X_{2i} + \dots + b'_k X_{ki}$$
with a second, different vector of errors,
$e'_i = (y_i - \hat{y}'_i)$. If we drew a 3rd
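The contrast between fixed parameters and varying statistics can be illustrated by simulation. The sketch below builds a hypothetical finite population, regresses y on the X's for the whole population to get the parameters, and then fits the same equation to two separate random samples. The population size, sample size, coefficients and the use of least squares are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)            # fixed seed (an assumption, for reproducibility)
N = 10_000                                # hypothetical population size

# Hypothetical closed population: intercept column plus two predictors.
X_pop = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y_pop = X_pop @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=N)

def fit(X, y):
    """Least-squares coefficients of y regressed on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def sample_fit(size=100):
    """Draw a simple random sample without replacement and fit the regression to it."""
    idx = rng.choice(N, size=size, replace=False)
    return fit(X_pop[idx], y_pop[idx])

beta_pop = fit(X_pop, y_pop)              # parameters: constant for this fixed population
b_first = sample_fit()                    # statistics from a first sample
b_second = sample_fit()                   # statistics from a second sample

print(beta_pop)
print(b_first)
print(b_second)
```

Refitting on the whole population always returns the same `beta_pop`, while `b_first` and `b_second` differ from each other and from `beta_pop` because each is computed from a different set of cases.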