Stat 371 Assignment 3 Solution
1.
The purpose of this question is to review some of the tools used to assess the fit of a model and to look for outliers in the data set. To do so we use an artificial example with 50 observations on a response variate $y$ and two explanatory variates $x_1$ and $x_2$. The data are stored in the file ass3newq1.txt, available on the course web page. The goal of the investigation is to predict $y$ when $x_1 = 15$, $x_2 = 15$. Start by fitting the model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + r$.
a) Use plots of the estimated residuals and qq plots of the standardized residuals to determine if other terms (e.g. squares and products) or a transformation is needed. To create a nice format for your plots, you might like to use the R code par(mfrow=c(n,m)), where you select integers n and m. This creates an n x m array for the next nm plots you create.
Below are plots of the estimated residuals versus the fitted values, the qq plot of the estimated residuals, the leverages and the studentized residuals from fitting the model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + r$.
The qq plot suggests that there are too many large and small estimated residuals for
the sample to be from a normal distribution. We also see some very large and small
studentized residuals. We need a different model.
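The four diagnostic plots described above could be produced along the following lines. This is a minimal sketch, not the code used for the solution: ass3newq1.txt is not reproduced here, so the data frame `dat` is simulated as a hypothetical stand-in, and the names `dat`, `fit1`, and `dx` are assumptions.

```r
# Hypothetical stand-in for ass3newq1.txt (simulated; NOT the course data).
# With the real file one would presumably read it instead, e.g.
#   dat <- read.table("ass3newq1.txt", header = TRUE)   # header = TRUE is an assumption
set.seed(371)
n <- 50
dat <- data.frame(x1 = runif(n, 5, 25), x2 = runif(n, 5, 25))
dat$y <- 20 + dat$x1 + 2 * dat$x2 + 0.25 * dat$x1 * dat$x2 + rnorm(n, sd = 1.5)

# First-order model y = b0 + b1*x1 + b2*x2 + r
fit1 <- lm(y ~ x1 + x2, data = dat)

# Quantities behind the four plots
dx <- data.frame(
  fitted   = fitted(fit1),
  resid    = resid(fit1),      # estimated residuals
  leverage = hatvalues(fit1),  # h_ii, diagonal of the hat matrix
  student  = rstudent(fit1)    # studentized residuals
)

# 2 x 2 array of plots, using par(mfrow = ...) as suggested in the question
op <- par(mfrow = c(2, 2))
plot(dx$fitted, dx$resid, xlab = "Fitted values", ylab = "Estimated residuals")
abline(h = 0, lty = 2)
qqnorm(dx$resid)
qqline(dx$resid)
plot(dx$leverage, type = "h", xlab = "Observation", ylab = "Leverage")
plot(dx$student, xlab = "Observation", ylab = "Studentized residuals")
par(op)  # restore the previous plotting layout
```

Curvature or a funnel shape in the first panel, or heavy tails in the qq plot, would point toward extra terms or a transformation.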
After messing about for a while, I fit the model with added terms for $x_1^2$, $x_2^2$, and $x_1 x_2$.
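One way to add the squared and product terms is to create them as new columns and refit. The sketch below assumes the column names `x11`, `x22`, and `x12` that appear in the solution's lm() call; the data are again a simulated, hypothetical stand-in for ass3newq1.txt, and the partial F-test is an added illustration rather than something the solution reports.

```r
# Hypothetical stand-in for ass3newq1.txt (simulated; NOT the course data)
set.seed(371)
n <- 50
dat <- data.frame(x1 = runif(n, 5, 25), x2 = runif(n, 5, 25))
dat$y <- 20 + dat$x1 + 2 * dat$x2 + 0.25 * dat$x1 * dat$x2 + rnorm(n, sd = 1.5)

# Squares and product, named as in the solution's lm() call
dat$x11 <- dat$x1^2
dat$x22 <- dat$x2^2
dat$x12 <- dat$x1 * dat$x2

fit1 <- lm(y ~ x1 + x2, data = dat)                    # first-order model
fit2 <- lm(y ~ x1 + x2 + x11 + x22 + x12, data = dat)  # with second-order terms

# Partial F-test: do the three added terms improve on the first-order model?
print(anova(fit1, fit2))
print(summary(fit2)$coefficients)
```

With 50 observations and six coefficients, the larger model leaves 44 residual degrees of freedom.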
The four plots described above are
With the exception of one point with large $h_{ii}$ and one (or two) large studentized residuals, the plots suggest a reasonable fit. The summary for the model is
Call:
lm(formula = y ~ x1 + x2 + x11 + x22 + x12)

Residuals:
    Min      1Q  Median      3Q     Max
-3.5698 -0.9150  0.2668  1.1260  3.4817

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 19.23044    3.09586   6.212 1.65e-07 ***
x1           0.86260    0.36600   2.357  0.02295 *
x2           1.97900    0.34593   5.721 8.67e-07 ***
x11          0.03154    0.01170   2.695  0.00992 **
x22          0.01683    0.01182   1.424  0.16141
x12          0.24473    0.01469  16.664  < 2e-16 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 1.58 on 44 degrees of freedom
Multiple R-Squared: 0.9912,     Adjusted R-squared: 0.9902
F-statistic: 990.3 on 5 and 44 DF,  p-value: < 2.2e-16
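The t values and p-values in the coefficient table can be checked from the printed estimates and standard errors: each t value is the estimate divided by its standard error, and the two-sided p-value comes from a t distribution on 44 degrees of freedom. The sketch below works with the magnitudes as printed, so the results agree with the table only up to rounding; the variable names are assumptions.

```r
# Estimates and standard errors copied from the summary (magnitudes as printed)
est <- c("(Intercept)" = 19.23044, x1 = 0.86260, x2 = 1.97900,
         x11 = 0.03154, x22 = 0.01683, x12 = 0.24473)
se <- c(3.09586, 0.36600, 0.34593, 0.01170, 0.01182, 0.01469)

t_val <- est / se                      # t value = estimate / standard error
p_val <- 2 * pt(-abs(t_val), df = 44)  # two-sided p-value on 44 df

print(round(cbind(t_val, p_val), 4))
```

Only the `x22` term has a t value small enough that its p-value exceeds 0.05, which makes it the natural candidate to drop.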
b) Decide on a final form for the model.
Based on the above summary and plots, I decided to drop $x_2^2$ from the model. The resulting summary and plots suggest that $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_1 x_2 + r$ is a reasonable model.
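Since the stated goal is to predict $y$ at $x_1 = 15$, $x_2 = 15$, the final model could be fit and used for prediction roughly as follows. This is a sketch under assumptions: the data are a simulated, hypothetical stand-in for ass3newq1.txt, the names `fit3`, `new`, and `pred` are illustrative, and the 95% level is an assumed choice that the source does not state.

```r
# Hypothetical stand-in for ass3newq1.txt (simulated; NOT the course data)
set.seed(371)
n <- 50
dat <- data.frame(x1 = runif(n, 5, 25), x2 = runif(n, 5, 25))
dat$y <- 20 + dat$x1 + 2 * dat$x2 + 0.25 * dat$x1 * dat$x2 + rnorm(n, sd = 1.5)

# Final model: y = b0 + b1*x1 + b2*x2 + b3*x1^2 + b4*x1*x2 + r
# I(x1^2) and x1:x2 are built inside the formula, so predict() only
# needs x1 and x2 in the new data.
fit3 <- lm(y ~ x1 + x2 + I(x1^2) + x1:x2, data = dat)

# Point prediction and prediction interval at x1 = 15, x2 = 15
new <- data.frame(x1 = 15, x2 = 15)
pred <- predict(fit3, newdata = new, interval = "prediction", level = 0.95)
print(pred)
```

Writing the derived terms inside the formula, rather than as separate columns such as `x11` and `x12`, avoids having to recompute those columns by hand for each new point.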
This note was uploaded on 05/12/2011 for the course STAT 371 taught by Professor Ahmed during the Fall '09 term at Waterloo.