Chapter 6 Multiple Regression
Timothy Hanson
Department of Statistics, University of South Carolina
Stat 704: Data Analysis I
6.7 CI for mean response and PI for new response
Let's construct a CI for the mean response corresponding to a set of values

$x_h = (1, x_{h1}, x_{h2}, \ldots, x_{hk})'$.
We want to make inferences about

$E(Y_h) = x_h' \beta = \beta_0 + \beta_1 x_{h1} + \cdots + \beta_k x_{hk}$.
Some math...

A point estimate is $\hat{Y}_h = \widehat{E}(Y_h) = x_h' b$.
Then $E(\hat{Y}_h) = E(x_h' b) = x_h' E(b) = x_h' \beta$.
Also $\mathrm{var}(\hat{Y}_h) = \mathrm{cov}(x_h' b) = x_h' \, \mathrm{cov}(b) \, x_h = \sigma^2 \, x_h' (X'X)^{-1} x_h$.
So...

A $100(1-\alpha)\%$ CI for $E(Y_h)$ is

$\hat{Y}_h \pm t_{n-p}(1-\alpha/2) \sqrt{\mathrm{MSE} \; x_h'(X'X)^{-1} x_h}$.
A $100(1-\alpha)\%$ prediction interval for a new response $Y_h = x_h'\beta + \epsilon_h$ is

$\hat{Y}_h \pm t_{n-p}(1-\alpha/2) \sqrt{\mathrm{MSE} \, [1 + x_h'(X'X)^{-1} x_h]}$.
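The two interval formulas above translate directly into code. The following is a sketch in Python/numpy (not part of the course's SAS workflow); the function name `mean_ci_and_pi` and the toy data in the example are my own illustration, not from the slides.

```python
import numpy as np
from scipy import stats

def mean_ci_and_pi(X, y, x_h, alpha=0.05):
    """CI for E(Y_h) and PI for a new Y_h at covariate vector x_h.

    X : (n, p) design matrix whose first column is all ones.
    y : length-n response vector.
    x_h : length-p vector (1, x_h1, ..., x_hk).
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                       # least-squares estimate
    y_hat = float(x_h @ b)                      # point estimate x_h' b
    resid = y - X @ b
    mse = float(resid @ resid) / (n - p)        # MSE with n - p df
    t = stats.t.ppf(1 - alpha / 2, n - p)       # t_{n-p}(1 - alpha/2)
    h = float(x_h @ XtX_inv @ x_h)              # x_h'(X'X)^{-1} x_h
    ci = (y_hat - t * np.sqrt(mse * h), y_hat + t * np.sqrt(mse * h))
    pi = (y_hat - t * np.sqrt(mse * (1 + h)), y_hat + t * np.sqrt(mse * (1 + h)))
    return y_hat, ci, pi

# Hypothetical usage with made-up data:
X = np.column_stack([np.ones(6), np.arange(1.0, 7.0)])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 5.8])
y_hat, ci, pi = mean_ci_and_pi(X, y, np.array([1.0, 3.5]))
```

Note that the PI differs from the CI only by the extra "1 +" inside the square root, which accounts for the variance of the new observation's error term; the PI is therefore always wider.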
Dwayne Studios
Say we want to estimate mean sales in cities with $x_1 = 65.4$ thousand people 16 or younger and per capita disposable income of $x_2 = 17.6$ thousand dollars. Now say we want a prediction interval for a new city with these covariates. We can add these covariates to the data step, with a missing value "." for sales, and ask SAS for the CI and PI.
data studio;
input people16 income sales @@;
label people16='16 & under (1000s)' income='Per cap. disp. income ($1000)'
      sales='Sales ($1000)';
datalines;
68.5 16.7 174.4   45.2 16.8 164.4   91.3 18.2 244.2
47.8 16.3 154.6   46.9 17.3 181.6   66.1 18.2 207.5
49.5 15.9 152.8   52.0 17.2 163.2   48.9 16.6 145.4
38.4 16.0 137.2   87.9 18.3 241.9   72.8 17.1 191.1
88.4 17.4 232.0   42.9 15.8 145.3   52.5 17.8 161.1
85.7 18.4 209.7   41.3 16.5 146.4   51.7 16.3 144.0
89.6 18.1 232.6   82.7 19.1 224.1   52.3 16.0 166.5
65.4 17.6 .
;
proc reg data=studio;
model sales=people16 income / clm cli alpha=0.05;
run;
Output Statistics

         Dependent   Predicted   Std Error
  Obs    Variable    Value       Mean Predict   95% CL Mean             95% CL Predict          Residual
    1    174.4000    187.1841    3.8409         179.1146   195.2536     162.6910   211.6772     12.7841
   21    166.5000    157.0644    4.0792         148.4944   165.6344     132.4018   181.7270      9.4356
  ...et cetera...
   22    .           191.1039    2.7668         185.2911   196.9168     167.2589   214.9490      .
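As a cross-check outside SAS, the CI and PI for observation 22 (the new city) can be reproduced directly from the formulas using the data on the slide. This is a numpy sketch, not part of the course code:

```python
import numpy as np
from scipy import stats

# Dwayne Studios data from the slide: people16, income, sales (21 cities)
data = np.array([
    [68.5, 16.7, 174.4], [45.2, 16.8, 164.4], [91.3, 18.2, 244.2],
    [47.8, 16.3, 154.6], [46.9, 17.3, 181.6], [66.1, 18.2, 207.5],
    [49.5, 15.9, 152.8], [52.0, 17.2, 163.2], [48.9, 16.6, 145.4],
    [38.4, 16.0, 137.2], [87.9, 18.3, 241.9], [72.8, 17.1, 191.1],
    [88.4, 17.4, 232.0], [42.9, 15.8, 145.3], [52.5, 17.8, 161.1],
    [85.7, 18.4, 209.7], [41.3, 16.5, 146.4], [51.7, 16.3, 144.0],
    [89.6, 18.1, 232.6], [82.7, 19.1, 224.1], [52.3, 16.0, 166.5],
])
X = np.column_stack([np.ones(len(data)), data[:, 0], data[:, 1]])
y = data[:, 2]
n, p = X.shape                                  # n = 21, p = 3

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                           # least-squares coefficients
resid = y - X @ b
mse = float(resid @ resid) / (n - p)
t = stats.t.ppf(0.975, n - p)                   # t_18(0.975)

x_h = np.array([1.0, 65.4, 17.6])               # the new city's covariates
y_hat = float(x_h @ b)                          # about 191.10, matching SAS
h = float(x_h @ XtX_inv @ x_h)
ci = (y_hat - t * np.sqrt(mse * h), y_hat + t * np.sqrt(mse * h))
pi = (y_hat - t * np.sqrt(mse * (1 + h)), y_hat + t * np.sqrt(mse * (1 + h)))
```

The resulting point estimate and interval endpoints agree with the SAS output for observation 22 above.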
6.8 Checking model assumptions
The general linear model assumes the following:

1. A linear relationship between $E(Y)$ and the associated predictors $x_1, \ldots, x_k$.
2. The errors have constant variance.
3. The errors are normally distributed.
4. The errors are independent.
We estimate the unknown $\epsilon_1, \ldots, \epsilon_n$ with the residuals $e_1, \ldots, e_n$.

Assumptions can be checked informally using plots and formally using tests.
Note: We can't check $E(\epsilon_i) = 0$ because $e_1 + \cdots + e_n = 0$, i.e. $\bar{e} = 0$, by construction.
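The note above is easy to verify numerically: whenever the design matrix contains an intercept column, the fitted residuals sum to (machine) zero regardless of what the true errors were. A small numpy sketch on made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # intercept + 2 predictors
beta = np.array([2.0, 1.0, -0.5])
y = X @ beta + rng.normal(size=n)                            # true errors are N(0, 1)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b
total = e.sum()        # essentially zero, up to floating-point error
```

This is why the residuals carry no information about whether $E(\epsilon_i) = 0$ actually holds; the least-squares fit forces it.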
Assumption 1: Linear mean
Scatterplots of $\{(x_{ij}, Y_i)\}_{i=1}^{n}$ for each predictor $j = 1, \ldots, k$.
This note was uploaded on 12/14/2011 for the course STAT 704 taught by Professor Staff during the Fall '11 term at South Carolina.