Assignment 2 Stat 371 Solution
1. In the file analytic.txt, you will find overhead costs y for 24 offices (labeled office) of a large organization for two consecutive years. Data are also given on a number of potential cost drivers with the variate names given in italics:
x1 year
x2 size (square footage of office space)
x3 age (of building)
x4 (number of) employees
x5 col (cost of living relative to national average)
x6 (number of) clients
You can download the file from the course web page. The basic objective of this application of PPDAC, as described in Chapter II of the course notes, is to identify offices that have an unexpectedly high or low overhead.
a) In the regression model $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + R$, give a careful interpretation of the objective.
We are looking for single cases that are very different from what we would expect if
the model applied. That is, for each case, after adjusting for the explanatory variates, is the
estimated residual unusually large (positive or negative)?
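The idea of screening cases by their estimated residuals can be sketched in R. This is only an illustration under stated assumptions: the data frame below is synthetic (it is not analytic.txt), the planted unusual case in row 5 is hypothetical, and the cutoff of 2 on the standardized residual scale is a common rule of thumb rather than something the assignment specifies.

```r
# Synthetic stand-in for analytic.txt (ASSUMPTION: not the course data):
# 24 offices observed in each of 2 years, 48 cases in total.
set.seed(371)
n_offices <- 24
dat <- data.frame(
  office    = rep(1:n_offices, times = 2),
  year      = rep(c(1, 2), each = n_offices),
  size      = rep(runif(n_offices, 2000, 10000), times = 2),
  age       = rep(runif(n_offices, 1, 40), times = 2),
  employees = rep(round(runif(n_offices, 5, 50)), times = 2),
  col       = rep(runif(n_offices, 0.8, 1.3), times = 2),
  clients   = rep(round(runif(n_offices, 50, 500)), times = 2)
)
dat$overhead <- 20000 + 10 * dat$size + 1500 * dat$employees +
  60000 * dat$col + rnorm(nrow(dat), sd = 10000)

# Plant one case with unexpectedly high overhead (hypothetical, for illustration).
dat$overhead[5] <- dat$overhead[5] + 80000

# Fit the same model formula as in the question.
fit <- lm(overhead ~ year + size + age + employees + col + clients, data = dat)

# Standardized (internally studentized) residuals: estimated residuals
# scaled by their estimated standard deviations.
dat$std_resid <- rstandard(fit)

# Flag cases whose standardized residual exceeds 2 in absolute value
# (ASSUMPTION: the cutoff 2 is a rule of thumb, not from the assignment).
flagged <- dat[abs(dat$std_resid) > 2,
               c("office", "year", "overhead", "std_resid")]
print(flagged)
```

With the real data, the data frame would instead be read from analytic.txt, whose exact layout is not shown here.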
b) Suppose we let x7 be the office number. Explain why it would not make sense to
include this explanatory variate in the above model.
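x7 is a classification or labeling variate, so its numerical value carries no meaning: a change from 6 to 7 is not the same as a
change from 7 to 8, yet a linear term in x7 would force both changes to have the same effect on overhead. We could arbitrarily interchange the office numbers without
changing the nature of the data.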
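The relabeling argument can be checked numerically. The sketch below uses a synthetic response (an assumption, not the course data) and a random permutation of the 24 office numbers; the names office_relabeled and new_labels are made up for the illustration. If the office number were a meaningful numeric variate, relabeling the same offices should not matter, but the fitted slope and the residual sum of squares both change.

```r
set.seed(1)

# 24 offices, each observed twice; the response is synthetic (ASSUMPTION).
office <- rep(1:24, times = 2)
y <- rnorm(48, mean = 100, sd = 10)

# Arbitrarily interchange the office numbers: office k gets label new_labels[k].
new_labels <- sample(1:24)
office_relabeled <- new_labels[office]

# Treat the office number as a numeric explanatory variate under each labeling.
fit_original  <- lm(y ~ office)
fit_relabeled <- lm(y ~ office_relabeled)

slope_original  <- coef(fit_original)[["office"]]
slope_relabeled <- coef(fit_relabeled)[["office_relabeled"]]

# Residual sums of squares under the two labelings.
rss_original  <- sum(resid(fit_original)^2)
rss_relabeled <- sum(resid(fit_relabeled)^2)

print(c(slope_original = slope_original, slope_relabeled = slope_relabeled))
print(c(rss_original = rss_original, rss_relabeled = rss_relabeled))
```

The data are the same in both fits; only the arbitrary labels differ, so any difference in the fitted slope is an artifact of the labeling.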
c)
Fit the model to the data. Looked at one at a time, which cost drivers contribute
significantly to the variation in overhead?
The output from R is:
lm(formula = overhead ~ year + size + age + employees + col + clients)
Residuals:
      Min         1Q     Median         3Q        Max
-28530.95   -7622.96      82.19    7798.65   27661.10
Coefficients:
              Estimate  Std. Error  t value  Pr(>|t|)
(Intercept) -123849.53    53741.13   -2.305  0.026331 *
year          -8219.51     5642.15   -1.457  0.152787
size             17.01       14.72    1.156  0.254491
age             493.43      373.09    1.323  0.193321
employees      2309.74     2190.33    1.055  0.297820
col          121712.44    50469.01    2.412  0.020444 *
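Each "t value" in a table like this is the estimate divided by its standard error, and the p-value in the last column tests that coefficient one at a time. As a rough sketch, the table can be pulled out of a fitted model with summary() and screened by p-value. The data below are synthetic (an assumption, not analytic.txt), and the 0.05 threshold is an assumed convention; it matches the single-star rows in the output, but the assignment does not state a level.

```r
# Synthetic stand-in data with the same variate names (ASSUMPTION: not the course data).
set.seed(2)
n <- 48
dat <- data.frame(
  year      = rep(c(1, 2), each = 24),
  size      = runif(n, 2000, 10000),
  age       = runif(n, 1, 40),
  employees = round(runif(n, 5, 50)),
  col       = runif(n, 0.8, 1.3),
  clients   = round(runif(n, 50, 500))
)
dat$overhead <- 20000 + 10 * dat$size + 60000 * dat$col + rnorm(n, sd = 10000)

fit <- lm(overhead ~ year + size + age + employees + col + clients, data = dat)

# Matrix with columns "Estimate", "Std. Error", "t value", "Pr(>|t|)".
coef_table <- summary(fit)$coefficients

# The t value is the estimate divided by its standard error.
t_by_hand <- coef_table[, "Estimate"] / coef_table[, "Std. Error"]

# Terms whose individual p-value falls below 0.05 (ASSUMED threshold).
significant <- rownames(coef_table)[coef_table[, "Pr(>|t|)"] < 0.05]
print(significant)

# Same arithmetic applied to the year row reported above.
t_year <- -8219.51 / 5642.15
print(round(t_year, 3))
```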