STAT5044: Regression and ANOVA
The Solution of Homework #4
Inyoung Kim

Problem #1.

(a)

    Call:
    lm(formula = Y ~ X1 + X2 + X3, data = data2)

    Residuals:
        Min      1Q  Median      3Q     Max
    -264.05 -110.73  -22.52   79.29  295.75

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept)  4.150e+03  1.956e+02  21.220  < 2e-16 ***
    X1           7.871e-04  3.646e-04   2.159   0.0359 *
    X2          -1.317e+01  2.309e+01  -0.570   0.5712
    X3           6.236e+02  6.264e+01   9.954 2.94e-13 ***
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 143.3 on 48 degrees of freedom
    Multiple R-squared: 0.6883,  Adjusted R-squared: 0.6689
    F-statistic: 35.34 on 3 and 48 DF,  p-value: 3.316e-12

b1 and b2 give information about the linear relationship between X1 and Y and between X2 and Y, respectively. However, b3 does NOT describe a linear relationship between X3 and Y, because X3 is an indicator variable; instead, b3 estimates the difference in the mean of Y between the two groups (X3 = 1 vs. X3 = 0) when X1 and X2 are held fixed. Since there is evidence that b3 is significantly different from 0 (t = 9.954, p-value = 2.94e-13), the mean Y values of the two groups (X3 = 0 vs. X3 = 1) are significantly different after adjusting for X1 and X2.

(b) The boxplot of the residuals shows that their distribution is nearly symmetric, with no outliers.

(c) In the plot of the residuals against the fitted values, there are two distinct groups: one around fitted values of 4300 and the other around 4900. In the residual plot against X1, the spread of the residuals becomes narrower as X1 increases. In the residual plot against X3, there are two distinct groups because X3 is an indicator variable. In the residual plot against X1X2, there appears to be a polynomial (curved) pattern. The normal Q-Q plot suggests that the distribution of the residuals has heavier tails than the normal distribution.

(d) In the time plot of the residuals, there is no systematic pattern, which is consistent with constant variance and gives no indication that the error terms are correlated over time.

(e) The Brown-Forsythe test at significance level α = 0.01 suggests that the variability of the residuals does not differ significantly between the two groups, which supports constant variance of the residuals. The following is the result of the Brown-Forsythe test, carried out as a Welch two-sample t test on the absolute deviations d1 and d2:
    Welch Two Sample t-test

    data:  d1 and d2
    t = -1.2698, df = 48.775, p-value = 0.2102
    alternative hypothesis: true difference in means is not equal to 0
    99 percent confidence interval:
     -90.57614  32.34565
    sample estimates:
    mean of x mean of y
      95.6889  124.8042

Figure 1: boxplot of the residuals for problem 2.2
Figure 2: scatter plots of the residuals (against fitted(multilm), X1, X2, X3, and X23) and normal Q-Q plot for problem 2.3
Figure 3: time plot of the residuals (against c(1:52)) for problem 2.4

(f) The F test result suggests that there is statistical evidence for rejecting H0: β1 = β2 = β3 = 0 at the significance level α = 0.05 (p-value = 3.316e-12). Based on the t tests in the summary of the multiple linear regression,...
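As a quick arithmetic check on the R summary in (a) and the F test in (f), the sketch below recomputes each t value as estimate / standard error, and the adjusted R-squared and overall F statistic from the reported R-squared, using n = 52 observations (48 residual degrees of freedom plus 4 coefficients) and p = 4 parameters. It works only from the rounded numbers R printed, so agreement is up to rounding (for example, it gives about 0.6688 for adjusted R-squared versus the reported 0.6689). The constant and function names are illustrative, not from the homework.

```python
# Arithmetic check of the lm() summary in (a) and the overall F test in (f).
# All inputs are the rounded values printed by R, so results agree with the
# reported ones only up to rounding.

# name: (estimate, standard error, reported t value)
COEFFICIENTS = {
    "(Intercept)": (4.150e+03, 1.956e+02, 21.220),
    "X1": (7.871e-04, 3.646e-04, 2.159),
    "X2": (-1.317e+01, 2.309e+01, -0.570),
    "X3": (6.236e+02, 6.264e+01, 9.954),
}
N_OBS = 52     # 48 residual degrees of freedom + 4 coefficients
N_PARAMS = 4   # intercept + beta1, beta2, beta3
R2 = 0.6883    # reported Multiple R-squared


def t_value(estimate: float, se: float) -> float:
    """t statistic for H0: beta_k = 0."""
    return estimate / se


def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p)."""
    return 1 - (1 - r2) * (n - 1) / (n - p)


def overall_f(r2: float, n: int, p: int) -> float:
    """F* = MSR/MSE, written in terms of R^2: [R^2/(p - 1)] / [(1 - R^2)/(n - p)]."""
    return (r2 / (p - 1)) / ((1 - r2) / (n - p))


for name, (est, se, reported) in COEFFICIENTS.items():
    print(f"{name:12s} t = {t_value(est, se):8.3f}  (reported {reported})")
print(f"adjusted R^2 = {adjusted_r2(R2, N_OBS, N_PARAMS):.4f}  (reported 0.6689)")
print(f"F = {overall_f(R2, N_OBS, N_PARAMS):.2f} on {N_PARAMS - 1} and "
      f"{N_OBS - N_PARAMS} DF  (reported 35.34)")
```

The F line prints 35.33 rather than 35.34 because R computed it from the unrounded R-squared.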

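The Brown-Forsythe test in (e) compares the spread of the residuals in two groups: each residual is replaced by its absolute deviation from its own group's median (d1 and d2 in the R output), and the mean deviations are compared with a two-sample t test, here R's Welch version. A minimal standard-library sketch of that computation follows. The function name brown_forsythe_welch and the sample residuals are hypothetical, and the way the 52 residuals were split into two groups is not shown in this preview, so the sketch simply takes the two groups as input. It returns the t statistic and Welch-Satterthwaite degrees of freedom but not the p-value, which needs a t distribution CDF (for example from scipy.stats.t).

```python
import math
import statistics


def brown_forsythe_welch(group1, group2):
    """Hypothetical helper: Brown-Forsythe comparison of residual spread.

    Each residual is replaced by its absolute deviation from its own group's
    median; the mean deviations are then compared with a Welch two-sample
    t statistic. Returns (t, df, d1, d2).
    """
    med1 = statistics.median(group1)
    med2 = statistics.median(group2)
    d1 = [abs(e - med1) for e in group1]
    d2 = [abs(e - med2) for e in group2]

    # Squared standard errors of the two sample means (sample variance / n).
    se2_1 = statistics.variance(d1) / len(d1)
    se2_2 = statistics.variance(d2) / len(d2)

    t = (statistics.mean(d1) - statistics.mean(d2)) / math.sqrt(se2_1 + se2_2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_1 + se2_2) ** 2 / (
        se2_1 ** 2 / (len(d1) - 1) + se2_2 ** 2 / (len(d2) - 1)
    )
    return t, df, d1, d2


# Hypothetical residuals for illustration only, not the homework data.
low_group = [1.0, 2.0, 3.0, 4.0, 5.0]
high_group = [0.0, 2.0, 4.0, 6.0, 8.0]
t, df, d1, d2 = brown_forsythe_welch(low_group, high_group)
print(f"t = {t:.4f}, df = {df:.3f}")  # prints: t = -1.4343, df = 5.882
```

Applied to the two residual groups from the homework, the same steps give the kind of t and df values reported in (e); the two-sided p-value is then 2 * P(T > |t|) for a t distribution with df degrees of freedom.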