Assignment 2 Stat 371 Solution

1. In the file analytic.txt, you will find overhead costs y for 24 offices (labeled office) of a large organization for two consecutive years. Data are also given on a number of potential cost drivers, with the variate names given in italics:

   x1  year
   x2  size (square footage of office space)
   x3  age (of building)
   x4  (number of) employees
   x5  col (cost of living relative to national average)
   x6  (number of) clients

You can download the file from the course web page. The basic objective of this application of PPDAC, as described in Chapter II of the course notes, is to identify offices that have an unexpectedly high or low overhead.

a) In the regression model $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + R$, give a careful interpretation of the objective.

We are looking for single cases that are very different from what we would expect if the model applied. That is, after adjusting for the explanatory variates, is the estimated residual for a given case unusually large or small?

b) Suppose we let x7 be the office number. Explain why it would not make sense to include this explanatory variate in the above model.

x7 is a classification or labeling variate: a change from 6 to 7 does not mean the same thing as a change from 7 to 8. We could arbitrarily interchange the office numbers without changing the nature of the data, so a slope coefficient for x7 would have no interpretation.

c) Fit the model to the data. Looked at one at a time, which cost drivers contribute significantly to the variation in overhead?

The output from R is:

   lm(formula = overhead ~ year + size + age + employees + col + clients)

   Residuals:
         Min         1Q     Median         3Q        Max
   -28530.95   -7622.96      82.19    7798.65   27661.10

   Coefficients:
                  Estimate   Std. Error   t value   Pr(>|t|)
   (Intercept)  -123849.53     53741.13    -2.305   0.026331 *
   year           -8219.51      5642.15    -1.457   0.152787
   size              17.01        14.72     1.156   0.254491
   age              493.43       373.09     1.323   0.193321
   employees       2309.74      2190.33     1.055   0.297820
   col           121712.44     50469.01     2.412   0.020444 *
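
As a reading aid for the coefficient rows shown above, each t value is the estimate divided by its standard error, and Pr(>|t|) is the two-sided p-value for testing that the corresponding coefficient is zero with the other variates in the model. The sketch below works this out for two of the visible rows. The subscript j and the hat notation are introduced here for illustration and are not taken from the source.

```latex
t_j = \frac{\hat{\beta}_j}{\mathrm{se}(\hat{\beta}_j)}, \qquad
t_{\text{col}} = \frac{121712.44}{50469.01} \approx 2.412, \qquad
t_{\text{year}} = \frac{-8219.51}{5642.15} \approx -1.457
```

Assuming the conventional 5% level (the threshold used in the full solution is not shown here), col with p = 0.020 would be judged significant, while year, size, age and employees, with p-values between about 0.15 and 0.30, would not. If all 48 cases (24 offices over two years) enter the fit, these t values are referred to a t distribution with 48 - 7 = 41 degrees of freedom.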