Assignment2solution - Assignment 2 Stat 371 Solution 1. In...

Assignment 2 Stat 371 Solution

1. In the file analytic.txt, you will find overhead costs y for 24 offices (labeled office) of a large organization for two consecutive years. Data are also given on a number of potential cost drivers, with the variate names given in italics:

    x1  year
    x2  size (square footage of office space)
    x3  age (of building)
    x4  (number of) employees
    x5  col (cost of living relative to national average)
    x6  (number of) clients

You can download the file from the course web page. The basic objective of this application of PPDAC, as described in Chapter II of the course notes, is to identify offices that have an unexpectedly high or low overhead.

a) In the regression model

    $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + R$

give a careful interpretation of the objective.

We are looking for single cases that are very different from what we would expect if the model applied. That is, after adjusting for the explanatory variates, we ask whether the estimated residual for each case is unusual.

b) Suppose we let x7 be the office number. Explain why it would not make sense to include this explanatory variate in the above model.

x7 is a classification or labeling variate: a change from 6 to 7 is not the same as a change from 7 to 8. We could arbitrarily interchange the office numbers without changing the nature of the data.

c) Fit the model to the data. Looked at one at a time, which cost drivers contribute significantly to the variation in overhead?

The output from R is:

    lm(formula = overhead ~ year + size + age + employees + col + clients)

    Residuals:
          Min        1Q    Median        3Q       Max
    -28530.95  -7622.96     82.19   7798.65  27661.10

    Coefficients:
                   Estimate  Std. Error  t value  Pr(>|t|)
    (Intercept)  -123849.53    53741.13   -2.305  0.026331 *
    year           -8219.51     5642.15   -1.457  0.152787
    size              17.01       14.72    1.156  0.254491
    age              493.43      373.09    1.323  0.193321
    employees       2309.74     2190.33    1.055  0.297820
    col           121712.44    50469.01    2.412  0.020444 *
    clients           56.57       13.36    4.235  0.000126 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 14950 on 41 degrees of freedom
    Multiple R-Squared: 0.9451, Adjusted R-squared: 0.937
    F-statistic: 117.6 on 6 and 41 DF, p-value: < 2.2e-16

The significant drivers (i.e. the explanatory variates for which there is evidence that the betas differ from 0) are the cost of living and the number of clients. Note that each of these tests is based on the assumption that all of the other explanatory variates are included in the model.

d) Fit a model with only the drivers identified as significant in c). Is there any evidence that the remaining explanatory variates are important?
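The numbers in the R summary quoted above can be cross-checked with a few lines of arithmetic. The sketch below is an illustration only, written in Python rather than the R used in the course. It assumes n = 48 observations (24 offices over two years, which matches the 41 residual degrees of freedom with 7 fitted coefficients) and a 5% significance level, matching the `*` code in the output. The names `COEFFICIENTS`, `t_ratio`, `significant_drivers`, `overall_f` and `adjusted_r_squared` are hypothetical helpers, not part of the original solution.

```python
# Values transcribed from the R summary quoted above, rounded as printed there.
# Each entry: (estimate, std_error, reported_t, reported_p)
COEFFICIENTS = {
    "(Intercept)": (-123849.53, 53741.13, -2.305, 0.026331),
    "year": (-8219.51, 5642.15, -1.457, 0.152787),
    "size": (17.01, 14.72, 1.156, 0.254491),
    "age": (493.43, 373.09, 1.323, 0.193321),
    "employees": (2309.74, 2190.33, 1.055, 0.297820),
    "col": (121712.44, 50469.01, 2.412, 0.020444),
    "clients": (56.57, 13.36, 4.235, 0.000126),
}

N_OBS = 48          # assumption: 24 offices x 2 years
N_PREDICTORS = 6    # x1 ... x6
R_SQUARED = 0.9451  # "Multiple R-Squared" from the output


def t_ratio(estimate, std_error):
    """t value for H0: beta = 0 is the estimate divided by its standard error."""
    return estimate / std_error


def significant_drivers(table, alpha=0.05):
    """Names of explanatory variates (intercept excluded) with reported p < alpha."""
    return [
        name
        for name, (_, _, _, p_value) in table.items()
        if name != "(Intercept)" and p_value < alpha
    ]


def overall_f(r_squared, n_obs, n_predictors):
    """Overall F statistic recovered from R^2: (R^2 / p) / ((1 - R^2) / (n - p - 1))."""
    residual_df = n_obs - n_predictors - 1
    return (r_squared / n_predictors) / ((1.0 - r_squared) / residual_df)


def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    residual_df = n_obs - n_predictors - 1
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / residual_df


if __name__ == "__main__":
    for name, (est, se, reported_t, _) in COEFFICIENTS.items():
        print(f"{name:12s} t = {t_ratio(est, se):7.3f} (reported {reported_t:7.3f})")
    print("significant at 5%:", significant_drivers(COEFFICIENTS))
    print("F from R^2:", round(overall_f(R_SQUARED, N_OBS, N_PREDICTORS), 1))
    print("adjusted R^2:", round(adjusted_r_squared(R_SQUARED, N_OBS, N_PREDICTORS), 3))
```

Because the inputs are the rounded values printed by R, a recomputed t ratio can differ from the reported one in the last digit (clients is one such case). The recomputed F statistic and adjusted R-squared agree with the reported 117.6 and 0.937 to the precision shown, which supports the assumed n = 48.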

This note was uploaded on 05/12/2011 for the course STAT 371 taught by Professor Ahmed during the Fall '09 term at Waterloo.

