STATS 203 HW #3 SOLUTION
COURTESY TO PATRICK LEAHY

1. R code:

> elec.resid = read.table("http://www-stat.stanford.edu/~nzhang/203_web/Data/ElectricityConsumption.txt", header=T)
> plot(elec.resid, main="Residuals vs X")
> abline(a=0, b=0, lty="dashed")

[Figure: "Residuals vs X"; horizontal axis: X, vertical axis: Residuals]

As was the case with the bacteria data we looked at in class, there is a clear pattern in the residuals: they are positive for more extreme values of X and negative for values closer to the median. A nonlinear transformation, such as quadratic or logarithmic, might alleviate the problem.

The residuals also appear to be heteroscedastic: their absolute values decrease as X increases. If this is the case, a nonlinear transformation alone might not be helpful. Instead, we may use weighted least squares (WLS) to fix the problem.

2. (RABE 7.4) R code is given below. Note that, as in the book, we omit Alaska from the data.

> edu.data = read.table("http://www-stat.stanford.edu/~nzhang/203_web/Data/EducationExpenditure.txt", header=T)
> edu.data = edu.data[-49,] # remove Alaska (row 49)
> attach(edu.data)
> Region = factor(Region)
> edu.lm = lm(Y~X1+X2+X3+Region)
> summary(edu.lm)

Call:
lm(formula = Y ~ X1 + X2 + X3 + Region)

Residuals:
    Min      1Q  Median      3Q     Max
-74.539 -20.940  -2.867  18.556  86.766

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -168.03880  147.90029  -1.136  0.26233
X1             0.04363    0.01413   3.088  0.00357 **
X2             0.65703    0.36647   1.793  0.08020 .
X3             0.04806    0.05278   0.910  0.36779
Region2       -4.15441   16.47796  -0.252  0.80218
Region3      -12.40588   16.51665  -0.751  0.45677
Region4       17.32351   17.50721   0.990  0.32808
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 35.45 on 42 degrees of freedom
Multiple R-squared: 0.5396, Adjusted R-squared: 0.4738
F-statistic: 8.204 on 6 and 42 DF, p-value: 6.709e-06

The fitted OLS regression model for these data is

$$Y = -168.0388 + 0.0436 X_1 + 0.6570 X_2 + 0.0481 X_3 - 4.1544 I_2 - 12.4059 I_3 + 17.3235 I_4,$$

where $I_i = 1$ if the state is in region $i$ and 0 otherwise. The weighted-least-squares model found in Section 7.4 is

$$Y_{WLS} = -316.024 + 0.062 X_1 + 0.874 X_2 - 0.029 X_3.$$

The OLS model with region indicators ($R^2 = 0.5396$, $\hat{\sigma} = 35.45$) has a higher $R^2$ value and a lower residual standard error than the WLS model ($R^2 = 0.477$, $\hat{\sigma} = 36.52$), so with respect to these indicators it fits the data better.

We can use a nested F-test to test the hypothesis $H_0$: the coefficients of $I_2$, $I_3$, and $I_4$ are all zero, against $H_a$: the regressions vary by region:

> anova(edu.lm, lm(Y~X1+X2+X3))
Analysis of Variance Table

Model 1: Y ~ X1 + X2 + X3 + Region
Model 2: Y ~ X1 + X2 + X3
  Res.Df   RSS Df Sum of Sq      F Pr(>F)
1     42 52782
2     45 57700 -3     -4918 1.3045 0.2856

The test produces an F-statistic of 1.3045 and a corresponding p-value of 0.2856. This p-value is not small enough to reject the null hypothesis at a significance level of even...
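The nested F-statistic reported by anova() can be checked by hand from the two residual sums of squares. The following is a minimal sketch in R, assuming the rounded RSS values and degrees of freedom printed in the ANOVA table (52782 on 42 df and 57700 on 45 df); the variable names are illustrative and are not part of the original solution.

```r
# Values copied from the anova() table; RSS is rounded to integers as printed,
# so the results agree with the table only up to rounding.
rss_full    <- 52782   # Y ~ X1 + X2 + X3 + Region
rss_reduced <- 57700   # Y ~ X1 + X2 + X3
df_full     <- 42      # residual df of the full model
df_reduced  <- 45      # residual df of the reduced model

# Nested F-test: (drop in RSS per dropped parameter) / (full-model mean squared error)
df_num <- df_reduced - df_full
df_den <- df_full
f_stat <- ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)

# Upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_stat, df_num, df_den, lower.tail = FALSE)

print(round(f_stat, 4))
print(round(p_value, 4))
```

Both printed values should be close to the F and Pr(>F) entries in the table. Working from the fitted objects with anova() remains preferable in practice, since it avoids the rounding in the printed RSS.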
This note was uploaded on 04/25/2010 for the course MATH 158 taught by Professor Karaali during the Spring '09 term at Pomona College.