2010_HW3_sol - STATS 203 - HW #3 SOLUTION COURTESY TO...

STATS 203 - HW #3 SOLUTION
COURTESY TO PATRICK LEAHY

1. R code:

> elec.resid = read.table("http://www-stat.stanford.edu/~nzhang/203_web/Data/ElectricityConsumption.txt", header=T)
> plot(elec.resid, main="Residuals vs X")
> abline(a=0, b=0, lty="dashed")

[Figure: "Residuals vs X"; x-axis: X, y-axis: Residuals]

As was the case with the bacteria data we looked at in class, there is a clear pattern in the residuals, which are positive for more extreme values of X and negative for values closer to the median. A nonlinear transformation such as quadratic or logarithmic might alleviate the problem.

The residuals also appear to be heteroscedastic: their absolute value decreases as X increases. If this is the case, a nonlinear transformation might not be helpful. Instead we may use WLS to fix the problem.

2. (RABE 7.4) R code is given below. Note that, as in the book, we omit Alaska from the data.

> edu.data = read.table("http://www-stat.stanford.edu/~nzhang/203_web/Data/EducationExpenditure.txt", header=T)
> edu.data = edu.data[-49,] # remove Alaska
> attach(edu.data)
> Region = factor(Region)
> edu.lm = lm(Y~X1+X2+X3+Region)
> summary(edu.lm)

Call:
lm(formula = Y ~ X1 + X2 + X3 + Region)

Residuals:
    Min      1Q  Median      3Q     Max
-74.539 -20.940  -2.867  18.556  86.766

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -168.03880  147.90029  -1.136  0.26233
X1             0.04363    0.01413   3.088  0.00357 **
X2             0.65703    0.36647   1.793  0.08020 .
X3             0.04806    0.05278   0.910  0.36779
Region2       -4.15441   16.47796  -0.252  0.80218
Region3      -12.40588   16.51665  -0.751  0.45677
Region4       17.32351   17.50721   0.990  0.32808
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 35.45 on 42 degrees of freedom
Multiple R-squared: 0.5396, Adjusted R-squared: 0.4738
F-statistic: 8.204 on 6 and 42 DF, p-value: 6.709e-06

The fitted OLS regression model for these data is

$$Y = -168.0388 + 0.0436 X_1 + 0.6570 X_2 + 0.0481 X_3 - 4.1544 I_2 - 12.4059 I_3 + 17.3235 I_4,$$

where $I_i = 1$ if the state is in region $i$ and 0 otherwise. The weighted-least-squares model found in Section 7.4 is

$$Y_{WLS} = -316.024 + 0.062 X_1 + 0.874 X_2 - 0.029 X_3.$$

The OLS model with region indicators ($R^2 = 0.5396$, $\hat{\sigma} = 35.45$) has a higher $R^2$ value and a lower residual standard error than WLS ($R^2 = 0.477$, $\hat{\sigma} = 36.52$), so with respect to these indicators it fits the data better.

We can use a nested F-test to test the hypothesis $H_0$: the coefficients of $I_2$, $I_3$, and $I_4$ are all zero, against $H_a$: the regressions vary by region:

> anova(edu.lm, lm(Y~X1+X2+X3))
Analysis of Variance Table

Model 1: Y ~ X1 + X2 + X3 + Region
Model 2: Y ~ X1 + X2 + X3
  Res.Df   RSS Df Sum of Sq      F Pr(>F)
1     42 52782
2     45 57700 -3     -4918 1.3045 0.2856

The test produces an F-statistic of 1.3045 and a corresponding p-value of 0.2856, which is not small enough to reject the null hypothesis at a significance level of even...
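
The nested F-statistic reported by anova can be recomputed by hand as F = ((RSS_reduced - RSS_full) / q) / (RSS_full / df_full), where q is the number of dropped coefficients. The following is a minimal sketch in base R, assuming only the rounded RSS and degrees-of-freedom values printed in the table; the variable names are illustrative and do not come from the original solution.

```r
# Values copied from the anova table (rounded as printed there).
rss_full    <- 52782  # RSS of Y ~ X1 + X2 + X3 + Region
rss_reduced <- 57700  # RSS of Y ~ X1 + X2 + X3
df_full     <- 42     # residual df of the full model
df_reduced  <- 45     # residual df of the reduced model

# Number of coefficients dropped (the three region indicators).
q <- df_reduced - df_full

# Nested F-statistic: extra RSS per dropped coefficient,
# scaled by the full model's residual mean square.
f_stat <- ((rss_reduced - rss_full) / q) / (rss_full / df_full)

# Upper-tail probability under the F(q, df_full) distribution.
p_value <- pf(f_stat, df1 = q, df2 = df_full, lower.tail = FALSE)

print(round(f_stat, 4))
print(round(p_value, 4))
```

Because the RSS values are rounded, the results should agree with the anova output only up to rounding.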

This note was uploaded on 04/25/2010 for the course MATH 158 taught by Professor Karaali during the Spring '09 term at Pomona College.
