STA 302 H1F / 1001 HF – Fall 2009
Test
October 22, 2009
LAST NAME: SOLUTIONS
FIRST NAME:
STUDENT NUMBER:
ENROLLED IN: (circle one) STA 302 / STA 1001
INSTRUCTIONS:
• Time: 90 minutes
• Aids allowed: calculator.
• A table of values from the $t$ distribution is on the last page (page 10).
• Total points: 50
Some formulae:
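$$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{\sum X_i Y_i - n\bar{X}\bar{Y}}{\sum X_i^2 - n\bar{X}^2}$$
$$b_0 = \bar{Y} - b_1 \bar{X}$$
$$\mathrm{Var}(b_1) = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}$$
$$\mathrm{Var}(b_0) = \sigma^2 \left( \frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2} \right)$$
$$\mathrm{Cov}(b_0, b_1) = -\frac{\sigma^2 \bar{X}}{\sum (X_i - \bar{X})^2}$$
$$\mathrm{SSTO} = \sum (Y_i - \bar{Y})^2$$
$$\mathrm{SSE} = \sum (Y_i - \hat{Y}_i)^2$$
$$\mathrm{SSR} = b_1^2 \sum (X_i - \bar{X})^2 = \sum (\hat{Y}_i - \bar{Y})^2$$
$$\sigma^2\{\hat{Y}_h\} = \mathrm{Var}(\hat{Y}_h) = \sigma^2 \left( \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right)$$
$$\sigma^2\{\mathrm{pred}\} = \mathrm{Var}(Y_h - \hat{Y}_h) = \sigma^2 \left( 1 + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right)$$
$$r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}$$
$$S_{XX} = \sum (X_i - \bar{X})^2 = \sum X_i^2 - n\bar{X}^2$$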
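As a rough illustration of how the formulae for the slope, intercept, and sums of squares fit together, the sketch below evaluates them on a small data set. The data values and the function name `fit_simple_linear` are made up for this example and do not come from the test.

```python
# Hypothetical data, chosen only so the arithmetic is easy to follow.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]


def fit_simple_linear(xs, ys):
    """Least squares estimates and sums of squares for Y = b0 + b1*X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # S_XX = sum of (X_i - X_bar)^2
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    # Numerator of b1: sum of (X_i - X_bar)(Y_i - Y_bar)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]
    ssto = sum((y - y_bar) ** 2 for y in ys)
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ssr = b1 ** 2 * s_xx
    return {"b0": b0, "b1": b1, "SXX": s_xx,
            "SSTO": ssto, "SSE": sse, "SSR": ssr}


result = fit_simple_linear(xs, ys)
print(result)
```

For this data the decomposition SSTO = SSR + SSE can be checked directly from the returned values.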
Question parts: 1abc | 1de | 2a | 2bcd | 2efgh | 3ab | 3c | 3de
1. The method of least squares is used to fit a simple linear regression model
$Y = \beta_0 + \beta_1 X + \epsilon$ to $n$ observations $(X_i, Y_i)$.
The values of the $X_i$'s are not realizations of random variables
but are fixed in advance by the researcher.
Assume the following: the form of the model
is appropriate, the Gauss-Markov conditions hold, and the distribution of the error terms is
Normal. In your answers, you may use any of the formulae on the front page.
(a) (4 marks) Which of the assumptions are needed to fit the model using least squares?
How would you assess the necessary assumptions?
For least squares, the only assumption needed is the form of the model.
To check this, look at plots of $Y$ versus $X$ and the residuals versus either $X$ or the
predicted values. Look for outliers / influential points and curvature.
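To make the residual check concrete, here is a minimal sketch that fits a straight line to data that are actually curved and lists the residuals. The data (Y equal to X squared) and the helper name `line_residuals` are invented for illustration; in practice the residuals would be plotted rather than printed.

```python
# Hypothetical curved data: Y = X^2, so a straight line is the wrong form.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [x ** 2 for x in xs]


def line_residuals(xs, ys):
    """Residuals Y_i - Yhat_i from the least squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


residuals = line_residuals(xs, ys)
for x, e in zip(xs, residuals):
    print(f"X = {x:.0f}  residual = {e:+.1f}")
```

The residuals are positive at both ends of the range and negative in the middle. A systematic pattern like this in a residual plot is the kind of curvature that suggests the linear form is not appropriate.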
(b) (3 marks) What is E($Y$)? Which of the assumptions did you use to determine your
answer?
$E(Y) = \beta_0 + \beta_1 X$
Assumptions used: form of model and $E(\epsilon) = 0$
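The steps behind this answer can be written out as follows. This is a sketch under the assumptions stated in the question, with $\epsilon$ denoting the error term of the model and $X$ treated as a fixed constant.

```latex
\begin{align*}
E(Y) &= E(\beta_0 + \beta_1 X + \epsilon)
        && \text{(form of the model)} \\
     &= \beta_0 + \beta_1 X + E(\epsilon)
        && \text{($\beta_0$, $\beta_1$, and $X$ are constants)} \\
     &= \beta_0 + \beta_1 X
        && \text{(since $E(\epsilon) = 0$)}
\end{align*}
```

Neither the constant-variance condition nor Normality of the errors is used in this calculation.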
(c) (2 marks) Suppose the researcher is interested in the relationship between $X$ and $Y$
on a certain range of $X$'s. She uses the smallest value in the range of $X$ for half of the
observations and the largest value in the range of $X$ for the other half of the observations
and fits the simple linear regression model to the resulting data. What is the advantage
of fixing the $X$'s to be these values? What is the disadvantage?
Advantage: This choice of $X$'s will make $S_{XX}$ as large as possible, giving more precise
estimates (smaller variance) of the model parameters.
Disadvantage: Won't be able to determine if the form of the relationship really is linear
without observations on more values of $X$.
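The advantage can be seen numerically by comparing the sum of squared deviations of X about its mean for two designs on the same range. The range 0 to 9, the sample size of 10, and the evenly spaced comparison design are assumptions made for this sketch only.

```python
def s_xx(xs):
    """Sum of squared deviations of the X values about their mean."""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs)


# Hypothetical range [0, 9] with n = 10 observations.
endpoint_design = [0.0] * 5 + [9.0] * 5       # half at each end of the range
even_design = [float(x) for x in range(10)]   # one observation at 0, 1, ..., 9

sxx_end = s_xx(endpoint_design)
sxx_even = s_xx(even_design)

# Var(b1) is sigma^2 divided by S_XX, so this ratio is how many times larger
# the variance of the slope estimate is under the evenly spaced design.
variance_ratio = sxx_end / sxx_even

print(sxx_end, sxx_even, round(variance_ratio, 2))
```

The endpoint design gives the larger value, and hence the smaller variance for the slope estimate. The cost is that with only two distinct X values, a straight line passes exactly through the two group means, so the data carry no information about curvature.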