CHAPTER 6

Note to Instructor: For the computer exercises, the procedure 'Regression' under 'Stat' in Minitab can be used for the regression analysis, except for computing confidence intervals on the regressor variables.

Section 6-2

6-1. a) The regression equation is

Thermal = 0.0249 + 0.129 Density

Predictor       Coef      StDev       T      P
Constant    0.024934   0.001786   13.96  0.000
Density     0.128522   0.007738   16.61  0.000

S = 0.0005852   R-Sq = 98.6%   R-Sq(adj) = 98.2%

Analysis of Variance
Source           DF           SS           MS       F      P
Regression        1  0.000094464  0.000094464  275.84  0.000
Residual Error    4  0.000001370  0.000000342
Total             5  0.000095833

ŷ = 0.0249 + 0.129x

b) Residuals: 0.0005747, -0.0007088, 0.0001486, -0.0004799, -0.0000644, 0.0005298

c) SSE = 0.000001370, σ̂² = 0.000000342

d) se(β̂1) = 0.007738, se(β̂0) = 0.001786

e) SST = 0.000095833, SSR = 0.000094464, SSE = 0.000001370, and SSR + SSE = 0.000095834 ≈ SST. ∴ SST = SSR + SSE (to within rounding).

f) R² = 98.6%. This is interpreted as: 98.6% of the total variability in thermal conductivity can be explained by the fitted regression model.

g) See the Minitab output in part a). Based on the t-tests, we conclude that the slope and intercept are nonzero.

h) See the Minitab output in part a). Based on the analysis of variance, we reject the null hypothesis and conclude that the regression is significant.

i) β0: 0.024934 ± 2.776(0.001786); (0.02, 0.03)
   β1: 0.128522 ± 2.776(0.007738); (0.107, 0.15)

j) [Residual plots (response is Conduct): residuals versus Density, residuals versus the fitted values, and a normal probability plot of the residuals.]

k) r = 0.993, P-value ≈ 0. Therefore, we conclude there is a significant correlation between density and conductivity.

6-2.
a) The regression equation is

Usage = -6.34 + 9.21 Temp

Predictor      Coef    StDev       T      P
Constant     -6.336    1.668   -3.80  0.003
Temp        9.20836  0.03377  272.64  0.000

S = 1.943   R-Sq = 100.0%   R-Sq(adj) = 100.0%

Analysis of Variance
Source           DF      SS      MS         F      P
Regression        1  280583  280583  74334.36  0.000
Residual Error   10      38       4
Total            11  280621

ŷ = -6.34 + 9.21x

b) Residuals: -1.25010, -0.19519, -0.30208, -1.61751, 0.49740, 2.07214, 1.71688, -0.02329, -2.55294, -1.15260, -1.25734, 4.06464

c) SSE = 38, σ̂² = 4

d) se(β̂1) = 0.03377, se(β̂0) = 1.668

e) SST = 280621, SSR = 280583, SSE = 38, and SSR + SSE = 280621. ∴ SST = SSR + SSE.

f) R² = 100.0% (to the precision reported). This is interpreted as: essentially all of the total variability in usage can be explained by the fitted regression model.

g) See the Minitab output in part a). Based on the t-tests, we conclude that the slope and intercept are nonzero.

h) See the Minitab output in part a). Based on the analysis of variance, we reject the null hypothesis and conclude that the regression is significant.

i) β0: -6.34 ± 2.228(1.668); (-10.06, -2.62)
   β1: 9.21 ± 2.228(0.03377); (9.13, 9.29)

j) [Residual plots (response is Usage): residuals versus Temp, residuals versus the fitted values, and a normal probability plot of the residuals.]

k) r = 1.000 (to three decimals), P-value ≈ 0. Therefore, we conclude there is a significant correlation between temperature and usage.

6-3.
a) The regression equation is

Deflect = 0.393 + 0.00333 Temp

Predictor       Coef    SE Coef     T      P
Constant     0.39346    0.04258  9.24  0.000
Temp       0.0033285  0.0005815  5.72  0.000

S = 0.006473   R-Sq = 64.5%   R-Sq(adj) = 62.6%

Analysis of Variance
Source           DF         SS         MS      F      P
Regression        1  0.0013727  0.0013727  32.76  0.000
Residual Error   18  0.0007542  0.0000419
Total            19  0.0021270

ŷ = 0.393 + 0.00333x

b) Residuals: -0.0054488, 0.0072519, 0.0065614, -0.0127685, 0.0069249, -0.0004269, -0.0027627, -0.0044372, 0.0002431, -0.0074196, 0.0015643, 0.0078738, 0.0035833, -0.0077656, -0.0011131, -0.0024386, 0.0105570, -0.0054342, -0.0014357, 0.0068913

c) SSE = 0.0007542, σ̂² = 0.0000419

d) se(β̂1) = 0.0005815, se(β̂0) = 0.04258

e) SST = 0.0021270, SSR = 0.0013727, SSE = 0.0007542, and SSR + SSE = 0.0021269 ≈ SST. ∴ SST = SSR + SSE (to within rounding).

f) R² = 64.5%. This is interpreted as: 64.5% of the total variability in deflection can be explained by the fitted regression model.

g) See the Minitab output in part a). Based on the t-tests, we conclude that the slope and intercept are nonzero.

h) See the Minitab output in part a). Based on the analysis of variance, we reject the null hypothesis and conclude that the regression is significant.

i) β0: 0.39346 ± 2.101(0.04258); (0.304, 0.483)
   β1: 0.0033285 ± 2.101(0.0005815); (0.00211, 0.00455)

j) [Residual plots (response is Deflect): residuals versus Temp, residuals versus the fitted values, and a normal probability plot of the residuals.]

k) r = 0.803, P-value ≈ 0. Therefore, we conclude there is a significant correlation between temperature and deflection.

6-4.
a) The regression equation is

Turbidity = -511 + 26.3 Temperature

Predictor     Coef  StDev      T      P
Constant    -510.7  228.2  -2.24  0.045
Temperat    26.308  9.178   2.87  0.014

S = 67.68   R-Sq = 40.6%   R-Sq(adj) = 35.7%

Analysis of Variance
Source           DF     SS     MS     F      P
Regression        1  37636  37636  8.22  0.014
Residual Error   12  54963   4580
Total            13  92599

ŷ = -511 + 26.3x

b) Residuals: 33.253, -2.686, 11.253, 10.622, -2.607, -88.565, -69.041, -75.934, -91.980, -34.116, 67.389, 110.066, 56.435, 75.912

c) SSE = 54963, σ̂² = 4580

d) se(β̂1) = 9.178, se(β̂0) = 228.2

e) SST = 92599, SSR = 37636, SSE = 54963, and SSR + SSE = 92599. ∴ SST = SSR + SSE.

f) R² = 40.6%. This is interpreted as: 40.6% of the total variability in turbidity can be explained by the fitted regression model.

g) See the Minitab output in part a). Based on the t-tests, we conclude that the slope and intercept are nonzero.

h) See the Minitab output in part a). Based on the analysis of variance, we reject the null hypothesis and conclude that the regression is significant.

i) β0: -510.7 ± 2.179(228.2); (-1007.95, -13.45)
   β1: 26.3 ± 2.179(9.178); (6.30, 46.30)

j) [Residual plots (response is Turbidit): residuals versus Temperat, residuals versus the fitted values, and a normal probability plot of the residuals.]

k) r = 0.638, P-value = 0.014. Therefore, we conclude there is a significant correlation between temperature and turbidity.

6-5.
a) The regression equation is

permeability = 40.6 - 2.12 strength

Predictor      Coef  SE Coef      T      P
Constant    40.5536   0.7509  54.00  0.000
strength    -2.1232   0.2313  -9.18  0.000

S = 1.038   R-Sq = 86.6%   R-Sq(adj) = 85.6%

Analysis of Variance
Source           DF       SS      MS      F      P
Regression        1   90.759  90.759  84.28  0.000
Residual Error   13   13.999   1.077
Total            14  104.757

ŷ = 40.6 - 2.12x

b) Residuals: -0.97179, 0.00066, 1.56517, 0.35430, 0.21735, 0.99417, 0.79921, 0.83762, 0.24199, -1.22252, -0.49351, -0.38411, -0.74715, -2.15947, 0.96808

c) SSE = 13.999, σ̂² = 1.077

d) se(β̂1) = 0.2313, se(β̂0) = 0.7509

e) SST = 104.757, SSR = 90.759, SSE = 13.999, and SSR + SSE = 104.758 ≈ SST. ∴ SST = SSR + SSE (to within rounding).

f) R² = 86.6%. This is interpreted as: 86.6% of the total variability in permeability can be explained by the fitted regression model.

g) See the Minitab output in part a). Based on the t-tests, we conclude that the slope and intercept are nonzero.

h) See the Minitab output in part a). Based on the analysis of variance, we reject the null hypothesis and conclude that the regression is significant.

i) β0: 40.5536 ± 2.160(0.7509); (38.93, 42.18)
   β1: -2.1232 ± 2.160(0.2313); (-2.62, -1.62)

j) [Residual plots (response is permeabi): residuals versus strength, residuals versus the fitted values, and a normal probability plot of the residuals.]

k) r = -0.931, P-value ≈ 0. Therefore, we conclude there is a significant correlation between strength and permeability.

6-6. a) The plot below implies that a simple linear regression model seems reasonable in this situation.
[Scatter plot of y versus x.]

b) The regression equation is

y = -10.1 + 0.174 x

Predictor      Coef    StDev      T      P
Constant    -10.132    1.995  -5.08  0.000
x           0.17429  0.02383   7.31  0.000

S = 1.318   R-Sq = 74.8%   R-Sq(adj) = 73.4%

Analysis of Variance
Source           DF       SS      MS      F      P
Regression        1   92.934  92.934  53.50  0.000
Residual Error   18   31.266   1.737
Total            19  124.200

An estimate of σ² is σ̂² = 1.737.

c) The predicted mean rise in blood pressure level associated with a sound pressure level of 85 decibels is ŷ = -10.1 + 0.174(85) = 4.69 millimeters of mercury.

6-7. a) 0.055137
     b) (0.054460, 0.055813)
     c) (0.053376, 0.056897)
     d) The prediction interval is wider than the confidence interval.

6-8. a) 472.499
     b) (471.183, 473.816)
     c) (467.975, 477.024)
     d) The prediction interval is wider than the confidence interval.

6-9. a) 36.095
     b) (35.059, 37.131)
     c) (32.802, 39.388)
     d) The prediction interval is wider than the confidence interval.

6-10. a) 4.683
      b) (4.055, 5.312)
      c) (1.844, 7.523)
      d) The prediction interval is wider than the confidence interval.

Section 6-3

6-11. a) The regression equation is

y = 351 - 1.27 x1 - 0.154 x2

Predictor      Coef  SE Coef      T      P  VIF
Constant     350.99    74.75   4.70  0.018
x1           -1.272    1.169  -1.09  0.356  2.6
x2         -0.15390  0.08953  -1.72  0.184  2.6

S = 25.50   R-Sq = 86.2%   R-Sq(adj) = 77.0%

Analysis of Variance
Source           DF       SS      MS     F      P
Regression        2  12161.6  6080.8  9.35  0.051
Residual Error    3   1950.4   650.1
Total             5  14112.0

Source  DF   Seq SS
x1       1  10240.4
x2       1   1921.2

b) Residuals: -24.9866, 24.3075, 11.8203, -20.4595, 12.8296, -3.5113

c) SSE = 1950.4, σ̂² = 650.1

d) R-Sq = 86.2%, R-Sq(adj) = 77.0%. R-Sq(adj) is less than R-Sq because the model contains terms that are not contributing significantly to the model. The adjusted R² value penalizes the user for adding terms to the model that are not significant.

e) See part a). Based on the P-value from the ANOVA table, the regression model is significant at the 0.10 level of significance.
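All of the simple linear regression quantities reported above — the coefficient estimates, standard errors, R², and the confidence and prediction intervals of Exercises 6-7 through 6-10 — follow from a handful of least-squares formulas. A minimal sketch in Python, using small made-up data rather than any of the textbook datasets (the t quantile is hard-coded from tables):

```python
import math

# Hypothetical data, not from the textbook.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b1 = Sxy / Sxx                 # slope estimate
b0 = ybar - b1 * xbar          # intercept estimate

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
SSE = sum(e * e for e in residuals)
SST = sum((yi - ybar) ** 2 for yi in y)
SSR = SST - SSE                # the identity SST = SSR + SSE
R2 = SSR / SST

mse = SSE / (n - 2)            # the estimate of sigma^2
se_b1 = math.sqrt(mse / Sxx)
se_b0 = math.sqrt(mse * (1.0 / n + xbar ** 2 / Sxx))

t = 2.776                      # t(0.025, 4) from tables, since n - 2 = 4
x0 = 3.5
yhat0 = b0 + b1 * x0
# Confidence interval half-width on the mean response at x0:
half_ci = t * math.sqrt(mse * (1.0 / n + (x0 - xbar) ** 2 / Sxx))
# Prediction interval half-width on a new observation at x0 (note the extra "1 +"):
half_pi = t * math.sqrt(mse * (1.0 + 1.0 / n + (x0 - xbar) ** 2 / Sxx))
```

Because of that extra variance term, the prediction interval is wider than the confidence interval at every x0 — exactly the pattern noted in part d) of each of Exercises 6-7 through 6-10.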
f) se(β̂0) = 74.75, se(β̂1) = 1.169, se(β̂2) = 0.08953

g) See part a). Based on the P-values for each coefficient, the regressors do not appear to be significant at the 0.05 level of significance.

h) β0: 350.99 ± 3.182(74.75); (113.14, 588.84)
   β1: -1.272 ± 3.182(1.169); (-4.99, 2.45)
   β2: -0.1539 ± 3.182(0.08953); (-0.439, 0.131)

i) Obs     SRES1     COOK1
   1    -1.65529   1.69265
   2     1.35770   0.63183
   3     0.51526   0.02083
   4    -1.05590   0.27192
   5     1.09375   1.48548
   6    -0.18436   0.00898

[Residual plots (response is y): residuals versus x1, residuals versus the fitted values, and a normal probability plot of the residuals.]

j) The VIFs are 2.6. There is no indication of a problem with multicollinearity.

6-12. a) The regression equation is

MPG-y = 38.4 - 0.00165 Weight-x1 - 0.0403 Horsepower-x2

Predictor       Coef     StDev      T      P  VIF
Constant      38.387     3.719  10.32  0.000
Weight-x   -0.001648  0.001325  -1.24  0.245  1.8
Horsepow   -0.040308  0.006299  -6.40  0.000  1.8

S = 2.135   R-Sq = 91.2%   R-Sq(adj) = 89.2%

Analysis of Variance
Source           DF      SS      MS      F      P
Regression        2  423.41  211.70  46.44  0.000
Residual Error    9   41.03    4.56
Total            11  464.44

Source    DF  Seq SS
Weight-x   1  236.70
Horsepow   1  186.71

b) Residuals: 0.16464, -1.41661, 2.33925, -1.31445, 1.58629, -1.08273, 1.12759, -3.77526, 1.79269, 0.63652, -2.14613, 2.08818

c) SSE = 41.03, σ̂² = 4.56

d) R-Sq = 91.2%, R-Sq(adj) = 89.2%. R-Sq(adj) is slightly less than R-Sq because the model contains a term (Weight) that does not contribute significantly.

e) See part a). Based on the P-value from the ANOVA table, the regression model is significant at the 0.10 level of significance.

f) se(β̂0) = 3.719, se(β̂1) = 0.0013, se(β̂2) = 0.0063

g) See part a). Based on the P-values for each coefficient, only x1 does not appear to be significant at the 0.05 level of significance.
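The VIF column reported in these outputs measures how much each coefficient's variance is inflated by correlation among the regressors: VIF_j = 1/(1 - R²_j), where R²_j comes from regressing x_j on the other regressors. A sketch with numpy, on made-up collinear data rather than the textbook's weight/horsepower values:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X: regress column j
    on the remaining columns (plus an intercept) and return 1/(1 - R^2_j)."""
    n, k = X.shape
    out = []
    for j in range(k):
        yj = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, yj, rcond=None)
        resid = yj - others @ beta
        r2 = 1.0 - float(resid @ resid) / float(((yj - yj.mean()) ** 2).sum())
        out.append(1.0 / (1.0 - r2))
    return out

# Hypothetical regressors: x1 and x2 nearly collinear, x3 independent.
rng = np.random.default_rng(1)
x1 = rng.normal(size=50)
x2 = x1 + 0.1 * rng.normal(size=50)
x3 = rng.normal(size=50)
v = vif(np.column_stack([x1, x2, x3]))
```

Here the two collinear columns get large VIFs while the independent column stays near 1, illustrating the rule of thumb (used throughout this section) that VIFs above about 10 signal a multicollinearity problem.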
h) β0: 38.387 ± 2.262(3.719); (29.97, 46.80)
   β1: -0.0016 ± 2.262(0.0013); (-0.0045, 0.0013)
   β2: -0.0403 ± 2.262(0.0063); (-0.0546, -0.0260)

i) Obs     SRES2     COOK2
   1     0.09379   0.00141
   2    -0.78074   0.07819
   3     1.31029   0.24630
   4    -0.64814   0.01519
   5     0.80217   0.03558
   6    -0.56461   0.02547
   7     0.57213   0.01895
   8    -1.84772   0.10479
   9     0.97075   0.10581
  10     0.35819   0.01897
  11    -1.18468   0.18208
  12     1.53073   1.13238

[Residual plots (response is MPG-y): residuals versus Horsepow, residuals versus Weight-x, residuals versus the order of the data, residuals versus the fitted values, and a normal probability plot of the residuals.]

j) The VIFs are 1.8. There is no indication of a problem with multicollinearity.

6-13. a) The regression equation is

y = -103 + 0.605 x1 + 8.92 x2 + 1.44 x3 + 0.014 x4

Predictor     Coef  SE Coef      T      P  VIF
Constant    -102.7    207.9  -0.49  0.636
x1          0.6054   0.3689   1.64  0.145  2.3
x2           8.924    5.301   1.68  0.136  2.2
x3           1.437    2.392   0.60  0.567  1.3
x4          0.0136   0.7338   0.02  0.986  1.0

S = 15.58   R-Sq = 74.5%   R-Sq(adj) = 59.9%

Analysis of Variance
Source           DF      SS      MS     F      P
Regression        4  4957.2  1239.3  5.11  0.030
Residual Error    7  1699.0   242.7
Total            11  6656.3

Source  DF  Seq SS
x1       1  3758.9
x2       1  1109.4
x3       1    88.9
x4       1     0.1

b) Residuals: -18.7580, 1.8862, 23.3109, -8.9565, 9.1852, 6.6436, 4.8136, -0.1568, -17.8502, -12.9376, 6.6216, 6.1980

c) SSE = 1699.0, σ̂² = 242.7

d) R-Sq = 74.5%, R-Sq(adj) = 59.9%. R-Sq(adj) is less than R-Sq since there are terms in the model that are not significant.

e) See part a). Based on the P-value from the ANOVA table, the regression model is significant at the 0.05 level of significance.
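The SRES and COOK columns in the part i) tables are studentized residuals and Cook's distances; both are built from the diagonal of the hat matrix H = X(XᵀX)⁻¹Xᵀ. A numpy sketch on hypothetical data (not the textbook's observations):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 12
x1 = rng.uniform(0, 40, n)
x2 = rng.uniform(100, 250, n)
# Hypothetical true relationship plus noise.
y = 350.0 - 1.3 * x1 - 0.15 * x2 + rng.normal(0.0, 5.0, n)

X = np.column_stack([np.ones(n), x1, x2])
p = X.shape[1]                           # number of model parameters

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta                         # ordinary residuals
H = X @ np.linalg.inv(X.T @ X) @ X.T     # hat matrix
h = np.diag(H)                           # leverage values h_ii
mse = float(e @ e) / (n - p)             # sigma^2 estimate

r = e / np.sqrt(mse * (1.0 - h))         # studentized residuals (SRES)
cook = r ** 2 * h / ((1.0 - h) * p)      # Cook's distances (COOK)
```

Observations with |SRES| above about 2, or with a Cook's distance that stands well apart from the rest, are the ones flagged for further scrutiny in these solutions.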
f) se(β̂0) = 207.9, se(β̂1) = 0.3689, se(β̂2) = 5.301, se(β̂3) = 2.392, se(β̂4) = 0.7338

g) See part a). Based on the P-values for each coefficient, the individual regressors do not appear to be significant.

h) β0: -102.7 ± 2.365(207.9); (-594.38, 388.98)
   β1: 0.6054 ± 2.365(0.3689); (-0.267, 1.478)
   β2: 8.924 ± 2.365(5.301); (-3.613, 21.461)
   β3: 1.437 ± 2.365(2.392); (-4.22, 7.094)
   β4: 0.0136 ± 2.365(0.7338); (-1.722, 1.75)

i) [Residual plots (response is y): residuals versus x1, residuals versus the fitted values, and a normal probability plot of the residuals.]

k) The VIFs are all less than 10; there is no indication of a problem with multicollinearity.

6-14. a) The regression equation is

HFE = 47.2 - 9.74 Emitter + 0.428 Base + 18.2 EtoB

Predictor     Coef   StDev      T      P  VIF
Constant     47.17   49.58   0.95  0.356
Emitter     -9.735   3.692  -2.64  0.018  6.6
Base        0.4283  0.2239   1.91  0.074  2.5
EtoB        18.237   1.312  13.90  0.000  9.3

S = 3.480   R-Sq = 99.4%   R-Sq(adj) = 99.3%

Analysis of Variance
Source           DF     SS     MS       F      P
Regression        3  30532  10177  840.55  0.000
Residual Error   16    194     12
Total            19  30725

Source   DF  Seq SS
Emitter   1   23959
Base      1    4233
EtoB      1    2340

b) Residuals: -0.90039, 1.83266, -0.31872, -6.78384, -2.18117, -1.51602, 1.90876, 2.29305, 2.01911, -5.96711, -2.21540, 3.41999, 3.16536, -0.57066, -1.96916, 2.64163, -0.93420, 6.68822, 1.14227, -1.75438

c) SSE = 194, σ̂² = 12

d) R-Sq = 99.4%, R-Sq(adj) = 99.3%. R-Sq(adj) is almost equal to R-Sq.

e) See part a). Based on the P-value from the ANOVA table, the regression model is significant at the 0.10 level of significance.

f) se(β̂0) = 49.58, se(β̂E) = 3.692, se(β̂B) = 0.2239, se(β̂EtoB) = 1.312

g) See part a).
Based on the P-values for each coefficient, only Emitter and EtoB appear to be significant at the 0.05 level of significance.

h) β0: 47.17 ± 2.120(49.58); (-57.9396, 152.2796)
   βE: -9.735 ± 2.120(3.692); (-17.5620, -1.9080)
   βB: 0.4283 ± 2.120(0.2239); (-0.0464, 0.9030)
   βEtoB: 18.237 ± 2.120(1.312); (15.4556, 21.0184)

i) Obs      SRES      COOK
   1    -0.27777  0.002938
   2     0.59321  0.023627
   3    -0.10665  0.001012
   4    -2.08750  0.159577
   5    -0.67100  0.016418
   6    -0.45280  0.004106
   7     0.63705  0.035375
   8     0.68266  0.008518
   9     0.67761  0.041745
  10    -1.77388  0.055072
  11    -0.68134  0.016855
  12     1.12620  0.099229
  13     0.93545  0.012570
  14    -0.17625  0.001204
  15    -0.59731  0.010172
  16     0.86378  0.054946
  17    -0.40464  0.052052
  18     2.16785  0.319631
  19     0.54024  0.124644
  20    -0.53676  0.009606

[Residual plots (response is HFE): residuals versus EtoB, residuals versus Base, residuals versus Emitter, residuals versus the order of the data, residuals versus the fitted values, and a normal probability plot of the residuals.]

j) All the VIFs are less than 10. There is no indication of a problem with multicollinearity.

6-15. a) 149.9
      b) (85.1, 214.7)
      c) (-12.5, 312.3)
      d) The prediction interval is wider than the confidence interval.

6-16. a) 29.182
      b) (26.631, 31.733)
      c) (23.719, 34.644)
      d) The prediction interval is wider than the confidence interval.

6-17. a) 287.56
      b) (263.77, 311.35)
      c) (243.69, 331.44)
      d) The prediction interval is wider than the confidence interval.

6-18. a) 91.424. Note also that the values of the x's are away from the center of the data, particularly x2 = 220.
      b) (85.953, 96.895)
      c) (83.249, 99.599)
      d) The prediction interval is wider than the confidence interval.

Section 6-4

6-19.
a) The regression equation is

y = 643 + 11.4 x1 - 0.933 x2 - 0.0106 x1x2 - 0.0272 x1^2 + 0.000471 x2^2

Predictor         Coef      StDev  T  P     VIF
Constant       642.685     0.0000  *  *
x1             11.3862     0.0000  *  *  2675.3
x2           -0.933346     0.0000  *  *  1283.4
x1x2        -0.0106334     0.0000  *  *  8342.1
x1^2        -0.0271620     0.0000  *  *   502.4
x2^2        0.00047076     0.0000  *  *  3301.5

S = *

Analysis of Variance
Source           DF        SS       MS  F  P
Regression        5  14112.00  2822.40  *  *
Residual Error    0         *        *
Total             5  14112.00

Source  DF    Seq SS
x1       1  10240.37
x2       1   1921.21
x1x2     1    827.86
x1^2     1   1056.39
x2^2     1     66.17

b) Because the VIFs are much greater than 10, we suspect that multicollinearity is present in the full second-order model.

c) Because SSE for the full model is not available (the model has no residual degrees of freedom), the test statistic cannot be computed.

6-20. a) The regression equation is

MPG = 53.5 - 0.0074 W - 0.101 HP + 0.000001 W*HP + 0.000001 W^2 + 0.000079 HP^2

Predictor         Coef       StDev      T      P    VIF
Constant         53.49       27.17   1.97  0.096
W             -0.00736     0.02063  -0.36  0.733  496.2
HP            -0.10098     0.08450  -1.19  0.277  368.3
W*HP        0.00000146  0.00002559   0.06  0.956  639.0
W^2         0.00000104  0.00000371   0.28  0.788  767.9
HP^2        0.00007854  0.00005398   1.46  0.196   65.7

S = 1.977   R-Sq = 95.0%   R-Sq(adj) = 90.7%

Analysis of Variance
Source           DF       SS      MS      F      P
Regression        5  440.988  88.198  22.57  0.001
Residual Error    6   23.449   3.908
Total            11  464.438

Source  DF   Seq SS
W        1  236.699
HP       1  186.707
W*HP     1    9.307
W^2      1    0.001
HP^2     1    8.274

b) Because the VIFs are much greater than 10, we suspect that multicollinearity is present in the full second-order model.

c) The regression equation of the reduced (first-order) model is

MPG = 38.4 - 0.00165 W - 0.0403 HP

Predictor       Coef     StDev      T      P  VIF
Constant      38.387     3.719  10.32  0.000
W          -0.001648  0.001325  -1.24  0.245  1.8
HP         -0.040308  0.006299  -6.40  0.000  1.8

S = 2.135   R-Sq = 91.2%   R-Sq(adj) = 89.2%

Analysis of Variance
Source           DF      SS      MS      F      P
Regression        2  423.41  211.70  46.44  0.000
Residual Error    9   41.03    4.56
Total            11  464.44

∴ f0 = [(41.03 - 23.449)/(9 - 6)] / (23.449/6) = 5.8603/3.908 = 1.4995

This results in P-value = P(F(3,6) > 1.4995) = 0.3073.
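The comparison just made is a partial F-test on the extra second-order terms: f0 = [(SSE_reduced - SSE_full)/r] / [SSE_full/(n - p)], where r is the number of terms dropped. Reproducing the arithmetic with the SSE values from the two Minitab fits above:

```python
# SSE values quoted from the output above: the full second-order model
# has SSE = 23.449 on 6 residual df; the reduced first-order model has
# SSE = 41.03 on 9 residual df.
sse_full, df_full = 23.449, 6
sse_red, df_red = 41.03, 9

r = df_red - df_full     # 3 second-order terms were dropped
f0 = ((sse_red - sse_full) / r) / (sse_full / df_full)
# f0 is about 1.4995; since P(F(3,6) > 1.4995) = 0.3073 exceeds 0.05,
# the second-order terms do not contribute significantly.
```

The same recipe applies whenever a full model is compared against a nested reduced model.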
Because the P-value is larger than 0.05, we fail to reject H0 and conclude that the second-order terms do not contribute significantly to the model. This is also apparent from the t-tests of the full second-order model in part a).

6-21. a) All possible regressions. Response is y.

Vars  R-Sq  R-Sq(adj)  C-p       s
   1  64.5       60.9  1.7  15.381
   1  56.5       52.1  3.9  17.022
   2  73.1       67.2  1.4  14.095
   2  64.6       56.8  3.7  16.173
   3  74.5       64.9  3.0  14.573
   3  73.2       63.1  3.4  14.944
   4  74.5       59.9  5.0  15.579

[The x1-x4 indicator columns of this table were garbled in extraction. From the stepwise results below, the best one-variable model contains x2, the best two-variable model contains x1 and x2, and the best three-variable model contains x1, x2, and x3.]

b) Forward selection. Response is y. Alpha-to-Enter: 0.25, on 4 predictors, with N = 12.

Step               1        2
Constant    -90.1607   0.5287
x2              15.2     10.3
T-Value         4.26     2.36
P-Value        0.002    0.042
x1                       0.50
T-Value                  1.71
P-Value                 0.122
S               15.4     14.1
R-Sq           64.46    73.14
R-Sq(adj)      60.90    67.17
C-p              1.7      1.4

c) Backward elimination. Response is y. Alpha-to-Remove: 0.1, on 4 predictors, with N = 12.

Step                1          2        3         4
Constant    -102.7132  -101.6100   0.5287  -90.1607
x1               0.61       0.61     0.50
T-Value          1.64       1.76     1.71
P-Value         0.145      0.117    0.122
x2                8.9        8.9     10.3      15.2
T-Value          1.68       1.80     2.36      4.26
P-Value         0.136      0.109    0.042     0.002
x3                1.4        1.4
T-Value          0.60       0.65
P-Value         0.567      0.536
x4               0.01
T-Value          0.02
P-Value         0.986
S                15.6       14.6     14.1      15.4
R-Sq            74.47      74.47    73.14     64.46
R-Sq(adj)       59.89      64.90    67.17     60.90
C-p               5.0        3.0      1.4       1.7

d) The model containing only x1 and x2 seems to be the "best" of these, in the sense that it has a high R-Sq(adj) and a small Cp value.

6-22. a) All possible regressions. Response is y.

Vars  R-Sq  R-Sq(adj)    C-p       s
   1  99.1       99.0    7.1  3.9403
   1  78.0       76.8  542.9  19.389
   2  99.2       99.1    5.7  3.7418
   2  99.1       99.0    9.0  4.0433
   3  99.4       99.3    4.0  3.4796

[The x1-x3 indicator columns of this table were garbled in extraction. From the stepwise results below, the best one-variable model contains x3, the best two-variable model contains x1 and x3, and the best model overall contains all three regressors.]

b) Forward selection. Response is y. Alpha-to-Enter: 0.25, on 3 predictors, with N = 20.

Step             1        2        3
Constant    -23.62    66.13    47.17
x3           21.51    20.12    18.24
T-Value      44.28    21.59    13.90
P-Value      0.000    0.000    0.000
x1                     -5.4     -9.7
T-Value               -1.72    -2.64
P-Value               0.103    0.018
x2                              0.43
T-Value                         1.91
P-Value                        0.074
S             3.94     3.74     3.48
R-Sq         99.09    99.23    99.37
R-Sq(adj)    99.04    99.13    99.25
C-p            7.1      5.7      4.0

c) Backward elimination.
Response is y. Alpha-to-Remove: 0.1, on 3 predictors, with N = 20.

Step            1
Constant    47.17
x1           -9.7
T-Value     -2.64
P-Value     0.018
x2           0.43
T-Value      1.91
P-Value     0.074
x3           18.2
T-Value     13.90
P-Value     0.000
S            3.48
R-Sq        99.37
R-Sq(adj)   99.25
C-p           4.0

No variable is removed at the 0.1 level, so backward elimination retains all three regressors.

d) The model containing all first-order terms seems to be the "best" of these, in the sense that it has the highest R-Sq(adj) and the smallest Cp value.

6-23. a) Note that x2 = 0 if using tool type 302 and x2 = 1 if using tool type 416. The regression equation is

y = 14.3 + 0.141 x1 - 13.3 x2

Predictor       Coef   SE Coef       T      P
Constant      14.276     2.091    6.83  0.000
x1          0.141150  0.008833   15.98  0.000
x2          -13.2802    0.3029  -43.85  0.000

S = 0.6771   R-Sq = 99.2%   R-Sq(adj) = 99.1%

Analysis of Variance
Source           DF       SS      MS        F      P
Regression        2  1012.06  506.03  1103.69  0.000
Residual Error   17     7.79    0.46
Total            19  1019.85

Source  DF  Seq SS
x1       1  130.61
x2       1  881.45

Unusual Observations
Obs   x1       y     Fit  SE Fit  Residual  St Resid
 13  248  37.520  36.001   0.244     1.519     2.40R

R denotes an observation with a large standardized residual.

The regression model is significant at 0.01.

b) The regression model for tool type 302 is

y-Tool1 = 11.5 + 0.153 Tool1

Predictor      Coef   SE Coef      T      P
Constant     11.503     1.474   7.81  0.000
Tool1      0.152926  0.006237  24.52  0.000

S = 0.3749   R-Sq = 98.7%   R-Sq(adj) = 98.5%

Analysis of Variance
Source           DF      SS      MS       F      P
Regression        1  84.483  84.483  601.18  0.000
Residual Error    8   1.124   0.141
Total             9  85.607

The regression model for tool type 302 is significant at 0.01.
The regression model for tool type 416 is

y-Tool2 = 5.41 + 0.122 Tool2

Predictor     Coef  SE Coef     T      P
Constant     5.409    4.051  1.33  0.219
Tool2      0.12236  0.01722  7.11  0.000

S = 0.8193   R-Sq = 86.3%   R-Sq(adj) = 84.6%

Analysis of Variance
Source           DF      SS      MS      F      P
Regression        1  33.889  33.889  50.49  0.000
Residual Error    8   5.370   0.671
Total             9  39.258

Unusual Observations
Obs  Tool2  y-Tool2     Fit  SE Fit  Residual  St Resid
  3    248   37.520  35.753   0.345     1.767     2.38R

R denotes an observation with a large standardized residual.

The regression model for tool type 416 is significant at 0.01.

Supplemental Exercises

6-24. a) [Scatter plot of kWh versus Dollars.] No, a straight-line relationship does not seem plausible.

b) The regression equation is

kWh = 2.43 + 0.012 Dollars

c) Analysis of Variance
Source           DF       SS       MS     F      P
Regression        1  0.00015  0.00015  0.01  0.933
Residual Error   13  0.26369  0.02028
Total            14  0.26384

2) H0: β1 = 0
3) H1: β1 ≠ 0
4) α = 0.05
5) The test statistic is f0 = (SSR/k) / (SSE/(n - p)).
6) Reject H0 if f0 > f0.05,1,13 = 4.67.
7) Using the results from the ANOVA table, f0 = (0.00015/1) / (0.2637/13) = 0.01.
8) Since 0.01 < 4.67, do not reject H0 and conclude that the regression model is not significant at α = 0.05. P-value > 0.10 (from the computer output the P-value is found to be 0.933).

d) Predictor     Coef   StDev     T      P
   Constant   2.4299  0.6094  3.99  0.002
   Dollars    0.0119  0.1389  0.09  0.933

0.0119 - t0.025,13(0.1389) ≤ β1 ≤ 0.0119 + t0.025,13(0.1389)
0.0119 - 2.160(0.1389) ≤ β1 ≤ 0.0119 + 2.160(0.1389)
-0.288 ≤ β1 ≤ 0.312

e) 2) H0: β1 = 0
   3) H1: β1 ≠ 0
   4) α = 0.05
   5) The test statistic is t0 = β̂1 / se(β̂1).
   6) Reject H0 if t0 < -t0.025,13 = -2.160 or t0 > t0.025,13 = 2.160.
   7) Using the results from the table above, t0 = 0.0119/0.1389 = 0.0857.
   8) Since -2.160 < 0.0857 < 2.160, do not reject H0 and conclude the slope is practically zero. Dollars is not a significant predictor of electrical usage at α = 0.05.
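Parts c) and e) above necessarily agree, because in simple linear regression the ANOVA F statistic equals the square of the slope t statistic (f0 = t0²). A one-line check with the numbers from this exercise:

```python
# Slope estimate and its standard error, quoted from part d) above.
t0 = 0.0119 / 0.1389   # the t statistic of part e), about 0.0857
f0 = t0 ** 2           # equals the ANOVA f statistic of part c), about 0.0074
# Minitab's ANOVA table rounds this to F = 0.01; both lead to the same
# conclusion of no significance.
```

The algebra behind this identity is derived in Exercise 6-25.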
f) 2) H0: β0 = 0
   3) H1: β0 ≠ 0
   4) α = 0.05
   5) The test statistic is t0 = β̂0 / se(β̂0).
   6) Reject H0 if t0 < -t0.025,13 = -2.160 or t0 > t0.025,13 = 2.160.
   7) Using the results from the table above, t0 = 2.4299/0.6094 = 3.987.
   8) Since 3.987 > 2.160, reject H0 and conclude the intercept is not zero at α = 0.05.

6-25. Using R² = 1 - SSE/Syy,

F0 = SSR/(SSE/(n - 2)) = (Syy - SSE)/(SSE/(n - 2)) = (n - 2)(1 - SSE/Syy)/(SSE/Syy) = (n - 2)R²/(1 - R²)

Also,

SSE = Σ(yi - β̂0 - β̂1 xi)²
    = Σ[(yi - ȳ) - β̂1(xi - x̄)]²
    = Σ(yi - ȳ)² + β̂1² Σ(xi - x̄)² - 2β̂1 Σ(yi - ȳ)(xi - x̄)
    = Σ(yi - ȳ)² - β̂1² Σ(xi - x̄)²

so that SST - SSE = β̂1² Σ(xi - x̄)². Therefore,

F0 = (SST - SSE)/σ̂² = β̂1² Σ(xi - x̄)²/σ̂² = β̂1²/(σ̂²/Sxx) = t0²

Because the square of a t random variable with n - 2 degrees of freedom is an F random variable with 1 and n - 2 degrees of freedom, the usual t-test that compares |t0| to tα/2,n-2 is equivalent to comparing f0 = t0² to fα,1,n-2 = t²α/2,n-2.

6-26. a) From Exercise 6-25, f0 = (n - 2)R²/(1 - R²) = 23(0.9)/(1 - 0.9) = 207. Reject H0: β1 = 0.

b) Because f0.05,1,23 = 4.28, H0 is rejected if 23R²/(1 - R²) > 4.28. That is, H0 is rejected if

23R² > 4.28(1 - R²)
27.28R² > 4.28
R² > 0.157

6-27. For two random variables X1 and X2,

V(X1 + X2) = V(X1) + V(X2) + 2Cov(X1, X2)

Then,

V(Yi - Ŷi) = V(Yi) + V(Ŷi) - 2Cov(Yi, Ŷi)
           = σ² + V(β̂0 + β̂1 xi) - 2σ²[1/n + (xi - x̄)²/Sxx]
           = σ² + σ²[1/n + (xi - x̄)²/Sxx] - 2σ²[1/n + (xi - x̄)²/Sxx]
           = σ²[1 - (1/n + (xi - x̄)²/Sxx)]

a) Because ei is divided by an estimate of its standard error (when σ² is estimated by σ̂²), ri has approximately unit standard deviation.

b) No; the term in brackets in the denominator is necessary for the standardized residuals to have unit standard deviation.

c) If xi is near x̄ and n is reasonably large, ri is approximately equal to the studentized residual.

d) If xi is far from x̄, the standard error of ei is small.
Consequently, extreme points are better fit by least squares regression than points near the middle of the range of x. Because the studentized residual at any point has variance of approximately one, the studentized residuals can be used to compare the fit of points to the regression line over the range of x.

6-28. a) [Scatter plot of DCoutput versus WindVel.] The scatter diagram shows definite curvature, so a higher-order model or a transformation of the variables may be appropriate.

b) The regression equation is

DCoutput = 0.131 + 0.241 WindVel

Predictor     Coef  SE Coef      T      P
Constant    0.1309   0.1260   1.04  0.310
WindVel    0.24115  0.01905  12.66  0.000

S = 0.2361   R-Sq = 87.4%   R-Sq(adj) = 86.9%

ŷ = 0.131 + 0.241x

c) Analysis of Variance
Source           DF       SS      MS       F      P
Regression        1   8.9296  8.9296  160.26  0.000
Residual Error   23   1.2816  0.0557
Total            24  10.2112

2) H0: β1 = 0
3) H1: β1 ≠ 0
4) α = 0.05
5) The test statistic is f0 = (SSR/k) / (SSE/(n - p)).
6) Reject H0 if f0 > f0.05,1,23 = 4.28.
7) Using the results from the ANOVA table, f0 = (8.92961/1) / (1.28157/23) = 160.257.
8) Since 160.257 > 4.28, reject H0 and conclude that the regression model is significant at α = 0.05.

d) [Residual plots for DC output: residuals versus the predicted values and residuals versus wind velocity.] The plots indicate model inadequacy, since the residuals exhibit nonrandom patterns.

e) Examining the residual plots in part d), a transformation of the x-variable, the y-variable, or both would be appropriate. A simple linear regression of y on the transformed variable 1/x may be satisfactory.
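The effect of the 1/x transformation just suggested can be seen on synthetic data generated from a model of the form y = β0 + β1/x (hypothetical coefficients, not the wind-turbine data): a straight-line fit in x leaves curvature behind, while a fit in x* = 1/x is essentially exact.

```python
def fit_r2(x, y):
    """R^2 of a least-squares straight-line fit of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - sse / sst

xs = [2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 10.0]
ys = [3.0 - 7.0 / xi for xi in xs]        # hypothetical curved relationship

r2_linear = fit_r2(xs, ys)                 # straight line leaves curvature
r2_recip = fit_r2([1.0 / xi for xi in xs], ys)  # essentially perfect fit
```

This is the same improvement the Minitab output in part f) shows: R² jumps from 87.4% to 98.0% once 1/WindVel replaces WindVel as the regressor.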
f) The following analysis employs the transformed variable x* = 1/x. The regression equation is

DCoutput = 2.98 - 6.93 (1/WindVel)

Predictor      Coef  SE Coef       T      P
Constant    2.97886  0.04490   66.34  0.000
1/WindVe    -6.9345   0.2064  -33.59  0.000

S = 0.09417   R-Sq = 98.0%   R-Sq(adj) = 97.9%

ŷ = 2.98 - 6.93x*, where x* = 1/x

Analysis of Variance
Source           DF      SS      MS        F      P
Regression        1  10.007  10.007  1128.43  0.000
Residual Error   23   0.204   0.009
Total            24  10.211

2) H0: β1 = 0
3) H1: β1 ≠ 0
4) α = 0.05
5) The test statistic is f0 = (SSR/k) / (SSE/(n - p)).
6) Reject H0 if f0 > f0.05,1,23 = 4.28.
7) Using the results from the ANOVA table, f0 = (10.0072/1) / (0.203970/23) = 1128.43.
8) Since 1128.43 > 4.28, reject H0 and conclude that the regression model is significant at α = 0.05.

[Residual plots for DC output: residuals versus the predicted values and residuals versus 1/WindVel.]

We conclude from the random appearance of the residuals in the plots, and from the significance of the regression, that the model is adequate. The transformation 1/(wind velocity) appears to be satisfactory as a regressor for output.

6-29. a) p = k + 1 = 2 + 1 = 3. Average leverage = p/n = 3/25 = 0.12.

b) Leverage point criterion:

hii > 2(p/n)
hii > 2(0.12)
hii > 0.24

h17,17 = 0.2593 and h18,18 = 0.2929, so points 17 and 18 are leverage points.

6-30.
a) The regression equation is

y = 3829 - 0.215 x3 + 21.2 x4 + 1.66 x5

Predictor      Coef  SE Coef      T      P
Constant       3829     2262   1.69  0.099
x3          -0.2149   0.1088  -1.97  0.056
x4          21.2134   0.9050  23.44  0.000
x5           1.6566   0.5502   3.01  0.005

S = 43.66   R-Sq = 99.3%   R-Sq(adj) = 99.3%

ŷ = 3829.26 - 0.215x3 + 21.213x4 + 1.657x5

b) Analysis of Variance
Source           DF       SS       MS        F      P
Regression        3  9863398  3287799  1724.42  0.000
Residual Error   36    68638     1907
Total            39  9932036

2) H0: β3 = β4 = β5 = 0
3) H1: βj ≠ 0 for at least one j
4) α = 0.01
5) The test statistic is f0 = (SSR/k) / (SSE/(n - p)).
6) Reject H0 if f0 > f0.01,3,36 = 4.38.
7) Using the results from the ANOVA table, f0 = (9863398/3) / (68638.2/36) = 1724.42.
8) Since 1724.42 > 4.38, reject H0 and conclude that the regression model is significant at α = 0.01. P-value < 0.00001.

c) All tests at α = 0.01, with t0.005,36 = 2.72:

H0: β3 = 0 vs. H1: β3 ≠ 0; t0 = -1.97; |t0| < 2.72, do not reject H0.
H0: β4 = 0 vs. H1: β4 ≠ 0; t0 = 23.44; |t0| > 2.72, reject H0.
H0: β5 = 0 vs. H1: β5 ≠ 0; t0 = 3.01; |t0| > 2.72, reject H0.

The x3 term is not needed in the model at α = 0.01.

d) R² = 0.993, R²adj = 0.9925. The slight decrease in R²adj may reflect the insignificant x3 term.

e) [Normal probability plot of the residuals.] The normality assumption appears reasonable, as the residuals fall along a straight line.

f) [Plot of residuals versus predicted values.] The plot is satisfactory; there does not appear to be a nonrandom pattern in the residuals versus the predicted values.

g) [Plot of residuals versus x3.] There is a slight indication that the variance increases as x3 increases, evident in the "fanning out" appearance of the residuals.

h) Using the equation found in part a):

ŷ = 3829.26 - 0.215(1670) + 21.213(170) + 1.657(1589) = 9709.39

6-31.
a) The regression equation is
y* = 19.7 - 1.27 x3* + 0.00541 x4 + 0.000408 x5

Predictor   Coef        SE Coef     T       P
Constant    19.690      9.587        2.05   0.047
x3*         -1.2673     0.9594      -1.32   0.195
x4           0.0054140  0.0002711   19.97   0.000
x5           0.0004079  0.0001645    2.48   0.018

S = 0.01314   R-Sq = 99.1%   R-Sq(adj) = 99.0%

Analysis of Variance
Source           DF   SS        MS        F         P
Regression        3   0.68611   0.22870   1323.62   0.000
Residual Error   36   0.00622   0.00017
Total            39   0.69233

2) H0: β3* = β4 = β5 = 0
3) H1: βj ≠ 0 for at least one j
4) α = 0.01
5) The test statistic is f0 = (SSR/k) / (SSE/(n − p))
6) Reject H0 if f0 > fα,3,36, where f0.01,3,36 = 4.38
7) Using the results from the ANOVA table, f0 = (0.686112/3) / (0.00622033/36) = 1323.62
8) Since 1323.62 > 4.38, reject H0 and conclude that the regression model is significant at α = 0.01. P-value < 0.00001.

b) All tests at α = 0.01, with t0.005,36 = 2.72.

H0: β3* = 0, H1: β3* ≠ 0; t0 = −1.32; |t0| < tα/2,36, so do not reject H0.
H0: β4 = 0, H1: β4 ≠ 0; t0 = 19.97; |t0| > tα/2,36, so reject H0.
H0: β5 = 0, H1: β5 ≠ 0; t0 = 2.48; |t0| < tα/2,36, so do not reject H0.

β3*: Do not reject H0; conclude that ln(x3) is not a significant regressor in the model at α = 0.01.
β4: Reject H0; conclude that x4 is a significant regressor in the model at α = 0.01.
β5: Do not reject H0; conclude that x5 is not a significant regressor in the model at α = 0.01.

c) [Figures: residuals versus ln(x3) and residuals versus predicted values for the ln(y) model]
Curvature is evident in the residual plots from this model, whereas non-stable variance was evident in the previous model.

6-32.
a) The regression equation is
y = - 1709 + 2.02 x - 0.000593 x^2

Predictor   Coef          SE Coef      T       P
Constant    -1709.4       244.8        -6.98   0.000
x            2.0229       0.2798        7.23   0.000
x^2         -0.00059293   0.00007994   -7.42   0.000

S = 0.2101   R-Sq = 98.8%   R-Sq(adj) = 98.5%

ŷ = −1709.4054 + 2.0229x − 0.0006x²

b) Analysis of Variance
Source           DF   SS       MS       F        P
Regression        2   26.487   13.244   300.11   0.000
Residual Error    7    0.309    0.044
Total             9   26.796

2) H0: β1 = β11 = 0
3) H1: βj ≠ 0 for at least one j
4) α = 0.05
5) The test statistic is f0 = (SSR/k) / (SSE/(n − p))
6) Reject H0 if f0 > fα,2,7, where f0.05,2,7 = 4.74
7) Using the results from the ANOVA table, f0 = (26.4871/2) / (0.308899/7) = 300.11
8) Since 300.11 > 4.74, reject H0 and conclude that the regression model is significant at α = 0.05.

c)
2) H0: β11 = 0
3) H1: β11 ≠ 0
4) α = 0.05
5) The test statistic is t0 = (β̂11 − β11,0) / se(β̂11)
6) Reject H0 if t0 < −tα/2,n−3, where −t0.025,7 = −2.365, or if t0 > t0.025,7 = 2.365
7) Using the results from the table given in part a), t0 = (−0.000593 − 0) / 0.00008 = −7.417
8) Since −7.417 < −2.365, reject H0 and conclude that the quadratic term contributes significantly to the model at α = 0.05.

d) [Figure: residuals versus predicted values]
There is some indication of nonconstant variance. This is evident from the widening of the residuals as the predicted value increases.

e) [Figure: normal probability plot of the residuals]
The normality assumption is reasonable. This is evident from the fact that the residuals fall along a straight line.

6-33.
a) [Figure: scatterplot of y versus x]
The simple linear regression model seems appropriate. This is evident from the fact that the data fall along a straight line as x and y increase.
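The significance-of-regression F-test applied in steps 5) through 8) of these solutions is easy to reproduce. The sketch below fits a straight line by least squares, verifies the decomposition SST = SSR + SSE, and forms f0 = (SSR/k)/(SSE/(n − p)). The data are synthetic stand-ins, since the exercise data sets are not reproduced in this manual.

```python
import numpy as np

# Synthetic data roughly following y = 0.5 + 20x; the exercise's data
# set is not reproduced in this manual, so these values are stand-ins.
x = np.array([0.1, 0.3, 0.5, 0.7, 0.9, 1.1, 1.3, 1.5, 1.7])
y = 0.5 + 20.0 * x + np.array([0.4, -0.3, 0.2, -0.5, 0.1, 0.3, -0.2, -0.4, 0.4])
n = x.size

# Least-squares slope and intercept: b1 = Sxy/Sxx, b0 = ybar - b1*xbar
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# ANOVA decomposition: SST = SSR + SSE
yhat = b0 + b1 * x
sst = np.sum((y - y.mean()) ** 2)
ssr = np.sum((yhat - y.mean()) ** 2)
sse = np.sum((y - yhat) ** 2)

# Significance-of-regression statistic, as in step 5: f0 = (SSR/k)/(SSE/(n - p))
f0 = (ssr / 1.0) / (sse / (n - 2))
```

Comparing f0 against the tabled f(α, 1, n − 2) critical value then gives the accept/reject decision of step 8.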
b) The regression equation is
y = 0.47 + 20.6 x

Predictor   Coef     SE Coef   T      P
Constant     0.470   1.936     0.24   0.811
x           20.567   2.142     9.60   0.000

S = 3.716   R-Sq = 85.2%   R-Sq(adj) = 84.3%

Analysis of Variance
Source           DF   SS       MS       F       P
Regression        1   1273.5   1273.5   92.22   0.000
Residual Error   16    220.9     13.8
Total            17   1494.5

ŷ = 0.4705 + 20.5673x

c) ŷ = 0.470467 + 20.5673(1) = 21.038

d) ŷ = 0.470467 + 20.5673(0.47) = 10.1371
ei = yi − ŷi = 11.8 − 10.1371 = 1.6629

e) The least squares estimate minimizes Σ(yi − βxi)². Setting the derivative with respect to β equal to zero gives

−2Σ(yi − βxi)xi = −2[Σyixi − βΣxi²] = 0.

Solving for β,

β̂ = Σyixi / Σxi².

f) The regression equation is
y = 21.0 x

Predictor    Coef      SE Coef   T       P
Noconstant
x            21.0315   0.9418    22.33   0.000

S = 3.612

Analysis of Variance
Source           DF   SS       MS       F        P
Regression        1   6505.4   6505.4   498.69   0.000
Residual Error   17    221.8     13.0
Total            18   6727.1

ŷ = 21.031461x

[Figure: observed values with the fitted no-intercept line]

Examining the plot, the model seems very appropriate; possibly a better fit than the intercept model.

6-34.
a) β̂ / sqrt(σ̂² / Σxi²) has a t distribution with n − 1 degrees of freedom.
b) From Exercise 6-33 f), β̂ = 21.031461, σ̂ = 3.611768, and Σxi² = 14.7073. Therefore,

t0 = 21.031461 / (3.611768 / √14.7073) = 22.3314,

and H0: β = 0 is rejected at the usual α values.

6-35.
a) [Figure: scatterplot of strength versus age]
A straight-line regression model seems appropriate. This is evident from the fact that the data fall along a straight line, with strength decreasing as age increases.
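The closed-form estimators used above, the usual slope β̂1 = Sxy/Sxx with intercept β̂0 = ȳ − β̂1x̄, and the through-the-origin estimator β̂ = Σxiyi/Σxi² derived in 6-33 e), can be sketched as follows, together with the t-statistic of 6-34. The data are synthetic, not the textbook's.

```python
import numpy as np

# Synthetic data roughly following y = 21x; the textbook's data set is
# not reproduced in this manual, so these values are stand-ins.
x = np.array([0.2, 0.4, 0.6, 0.9, 1.1, 1.4, 1.6, 1.8])
y = 21.0 * x + np.array([0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.3, -0.2])
n = x.size

# Ordinary least squares with an intercept: b1 = Sxy/Sxx, b0 = ybar - b1*xbar
xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)
Sxy = np.sum((x - xbar) * (y - ybar))
b1 = Sxy / Sxx
b0 = ybar - b1 * xbar

# Through-the-origin estimator from 6-33 e): b = sum(x*y) / sum(x^2)
b_origin = np.sum(x * y) / np.sum(x ** 2)

# t-statistic from 6-34: b / sqrt(sigma2 / sum(x^2)), with n - 1 error df
resid = y - b_origin * x
sigma2 = np.sum(resid ** 2) / (n - 1)
t0 = b_origin / np.sqrt(sigma2 / np.sum(x ** 2))
```

Note the no-intercept model spends only one degree of freedom on estimation, which is why its error degrees of freedom are n − 1 rather than n − 2.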
b) The regression equation is
strength = 2625 - 37.0 age

Predictor   Coef      SE Coef   T        P
Constant    2625.39   45.35      57.90   0.000
age          -36.962   2.967    -12.46   0.000

S = 99.05   R-Sq = 89.6%   R-Sq(adj) = 89.0%

Analysis of Variance
Source           DF   SS        MS        F        P
Regression        1   1522819   1522819   155.21   0.000
Residual Error   18    176602      9811
Total            19   1699421

ŷ = 2625.39 − 36.9618x

c) ŷ = 2625.39 − 36.9618(20) = 1886.154

d)
Obs   age    strength   Fit      SE Fit   Residual   St Resid
 1    15.5   2158.7     2052.5   23.1      106.2      1.10
 2    23.8   1678.2     1747.5   38.0      -69.4     -0.76
 3     8.0   2316.0     2329.7   27.2      -13.7     -0.14
 4    17.0   2061.3     1997.0   24.7       64.3      0.67
 5     5.0   2207.5     2440.6   33.2     -233.1     -2.50R
 6    19.0   1708.3     1923.1   27.8     -214.8     -2.26R
 7    24.0   1784.7     1738.3   38.6       46.4      0.51
 8     2.5   2575.0     2533.0   39.0       42.0      0.46
 9     7.5   2357.9     2348.2   28.1        9.7      0.10
10    11.0   2277.7     2218.8   23.2       58.9      0.61
11    13.0   2165.2     2144.9   22.2       20.3      0.21
12     3.8   2399.6     2486.8   36.1      -87.2     -0.95
13    25.0   1779.8     1701.3   41.1       78.5      0.87
14     9.8   2336.8     2265.0   24.6       71.7      0.75
15    22.0   1765.3     1812.2   33.9      -46.9     -0.50
16    18.0   2053.5     1960.1   26.1       93.4      0.98
17     6.0   2414.4     2403.6   31.1       10.8      0.11
18    12.5   2200.5     2163.4   22.3       37.1      0.38
19     2.0   2654.2     2551.5   40.3      102.7      1.14
20    21.5   1753.7     1830.7   32.8      -77.0     -0.82

[Figure: observed strength versus predicted strength]

If there were no error, the values would all lie along the 45° line. Yes, the plot indicates that age was a reasonable regressor variable.

6-36.
a) [Figures: scatterplots of strength versus z = x − x̄ and of strength versus age]
The slopes of the two regression models will be the same, but the intercept will be shifted.

b) The regression equation is
y = 2132 - 37.0 z

Predictor   Coef      SE Coef   T        P
Constant    2132.41   22.15      96.28   0.000
z            -36.962   2.967    -12.46   0.000

S = 99.05   R-Sq = 89.6%   R-Sq(adj) = 89.0%

ŷ = 2132.41 − 36.9618z

β̂0 = 2625.39 vs.
β̂0* = 2132.41, while β̂1 = −36.9618 and β̂1* = −36.9618.

Since the data are shifted by the average age, the intercept in the model Y = β0* + β1*z + ε is now the average strength.

6-37.
The test statistic is t0 = β̂1 / sqrt(σ̂² / Sxx). After the transformation, β̂1* = (b/a)β̂1, Sxx* = a²Sxx, x̄* = ax̄, β̂0* = bβ̂0, and σ̂* = bσ̂. Therefore,

t0* = (b/a)β̂1 / sqrt((bσ̂)² / (a²Sxx)) = (b/a)β̂1 / ((b/a) sqrt(σ̂² / Sxx)) = β̂1 / sqrt(σ̂² / Sxx) = t0.

6-38.
2) H0: β1 = 10
3) H1: β1 ≠ 10
4) α = 0.01
5) The test statistic is t0 = (β̂1 − β1,0) / se(β̂1)
6) Reject H0 if t0 < −tα/2,n−2, where −t0.005,10 = −3.17, or if t0 > t0.005,10 = 3.17
7) Using the results from Exercise 6-2, t0 = (9.21 − 10) / 0.0338 = −23.37
8) Since −23.37 < −3.17, reject H0 and conclude that the coefficient of the regressor is significantly different from 10 at α = 0.01. P-value ≈ 0.

6-39.
a) All possible regressions. Response is y.

Vars   R-Sq   R-Sq(adj)   C-p     S
 1     99.0   99.0        101.8   50.486
 1     99.0   99.0        104.7   51.010
 2     99.6   99.5         26.7   33.941
 2     99.3   99.3         62.9   42.899
 3     99.7   99.7          7.6   27.791
 3     99.7   99.7          7.9   27.911
 4     99.7   99.7          5.6   26.725
 4     99.7   99.7          6.9   27.205
 5     99.8   99.7          5.6   26.362
 5     99.8   99.7          7.1   26.916
 6     99.8   99.7          7.0   26.509

[The indicator columns marking which of x1 through x6 enter each subset are omitted.]

b) Forward selection. Response is y. Alpha-to-Enter: 0.25, on 6 predictors, with N = 40.

Step            1        2         3        4          5
Constant     164.99   218.71   1105.93    37.62   -3982.11

x4            21.43    10.93     -0.06     4.61       3.75
T-Value       62.12     4.09     -0.02     1.70       1.40
P-Value       0.000    0.000     0.986    0.098      0.169

x1                      0.98      1.99     1.24       1.10
T-Value                 3.95      6.75     4.42       3.87
P-Value                0.000     0.000    0.000      0.000

x6                               -8.1    -13.0      -16.3
T-Value                         -4.64    -7.54      -6.60
P-Value                         0.000    0.000      0.000

x5                                         1.26       0.83
T-Value                                    4.75       2.39
P-Value                                   0.000      0.022

x3                                                    0.18
T-Value                                               1.81
P-Value                                              0.079

S             50.5     42.9      34.4     27.2       26.4
R-Sq         99.02    99.31     99.57    99.74      99.76
R-Sq(adj)    99.00    99.28     99.54    99.71      99.73
C-p          101.8     62.9      28.7      6.9        5.6

c) Backward elimination.
Response is y. Alpha-to-Remove: 0.1, on 6 predictors, with N = 40.

Step            1        2        3
Constant    -4738    -3982    -4280

x1           1.12     1.10     1.44
T-Value      3.90     3.87    10.11
P-Value     0.000    0.000    0.000

x2          -0.030
T-Value     -0.79
P-Value     0.435

x3           0.23     0.18     0.21
T-Value      1.95     1.81     2.07
P-Value     0.059    0.079    0.046

x4            3.8      3.7
T-Value      1.43     1.40
P-Value     0.161    0.169

x5           0.82     0.83     0.65
T-Value      2.34     2.39     1.98
P-Value     0.025    0.022    0.055

x6          -16.9    -16.3    -17.5
T-Value     -6.47    -6.60    -7.50
P-Value     0.000    0.000    0.000

S            26.5     26.4     26.7
R-Sq        99.77    99.76    99.75
R-Sq(adj)   99.72    99.73    99.72
C-p           7.0      5.6      5.6

d) The model containing only x1, x3, x5, and x6 seems to be the "best" among all, in the sense that it has a high R-Sq(adj) and a small Cp value for a simple model (only four regressors).

6-40.
a) All possible regressions. Response is y*; candidate predictors are x1, x2, x3*, x4, x5, and x6.

Vars   R-Sq   R-Sq(adj)   C-p    S
 1     98.8   98.7        54.7   0.015088
 1     98.5   98.5        72.0   0.016462
 2     99.1   99.0        32.7   0.013115
 2     99.1   99.0        34.4   0.013277
 3     99.5   99.4         6.8   0.010145
 3     99.4   99.4        10.9   0.010663
 4     99.5   99.5         4.6   0.0097127
 4     99.5   99.4         6.3   0.0099487
 5     99.5   99.5         5.1   0.0096421
 5     99.5   99.5         6.6   0.0098471
 6     99.5   99.5         7.0   0.0097668

[The indicator columns marking which predictors enter each subset are omitted.]

b) Forward selection. Response is y*. Alpha-to-Enter: 0.25, on 6 predictors, with N = 40.

Step            1         2         3          4          5
Constant      7.275     6.837     6.698      6.728    -15.490

x4          0.00565   0.00430   0.00332    0.00333    0.00280
T-Value       54.80     11.31      6.52       7.98       5.63
P-Value       0.000     0.000     0.000      0.000      0.000

x2                    0.00003   0.00006    0.00003    0.00002
T-Value                  3.65      4.66       2.52       1.23
P-Value                 0.001     0.000      0.016      0.227

x6                             -0.00159   -0.00330   -0.00461
T-Value                           -2.66      -5.25      -4.86
P-Value                           0.011      0.000      0.000

x5                                         0.00041    0.00026
T-Value                                       4.33       2.11
P-Value                                      0.000      0.042

x3*                                                       2.2
T-Value                                                  1.81
P-Value                                                 0.080

S            0.0151    0.0131    0.0122    0.00995    0.00964
R-Sq          98.75     99.08     99.23      99.50      99.54
R-Sq(adj)     98.72     99.03     99.17      99.44      99.48
C-p            54.7      32.7      23.7        6.3        5.1

c) Backward elimination.
Response is y*. Alpha-to-Remove: 0.1, on 6 predictors, with N = 40.

Step             1          2          3
Constant     -16.26     -15.49     -23.51

x1         -0.00004
T-Value       -0.37
P-Value       0.713

x2          0.00002    0.00002
T-Value        1.25       1.23
P-Value       0.220      0.227

x3*             2.3        2.2        3.0
T-Value        1.82       1.81       2.90
P-Value       0.078      0.080      0.006

x4          0.00312    0.00280    0.00296
T-Value        3.14       5.63       6.12
P-Value       0.004      0.000      0.000

x5          0.00027    0.00026    0.00026
T-Value        2.11       2.11       2.07
P-Value       0.042      0.042      0.046

x6         -0.00463   -0.00461   -0.00501
T-Value       -4.81      -4.86      -5.56
P-Value       0.000      0.000      0.000

S           0.00977    0.00964    0.00971
R-Sq          99.55      99.54      99.52
R-Sq(adj)     99.46      99.48      99.47
C-p             7.0        5.1        4.6

d) The model containing only x3*, x4, x5, and x6 seems to be the "best" among all, in the sense that it has a high R-Sq(adj) and a small Cp value.
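The C-p column that drives the "best subsets" comparisons in Exercises 6-39 and 6-40 is Mallows' Cp = SSE_p/σ̂²(full) − n + 2p, where p counts the intercept and σ̂² comes from the full model. A minimal sketch with synthetic data (three candidate regressors with assumed coefficients, not the exercise data):

```python
import numpy as np
from itertools import combinations

# Synthetic data: y depends on x1 and x3 only; x2 is pure noise. These
# values are assumed stand-ins, not the exercise's data.
rng = np.random.default_rng(7)
n = 40
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] + 1.5 * X[:, 2] + rng.normal(0.0, 0.5, n)

def sse(cols):
    """SSE of an OLS fit of y on an intercept plus the listed columns."""
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    return resid @ resid

# sigma^2 is estimated from the full model, as Minitab does for C-p
sigma2_full = sse((0, 1, 2)) / (n - 4)

# Mallows' Cp = SSE_p / sigma2_full - n + 2p, where p counts the intercept
cp = {}
for k in (1, 2, 3):
    for cols in combinations(range(3), k):
        cp[cols] = sse(cols) / sigma2_full - n + 2 * (k + 1)

best = min(cp, key=cp.get)   # subset with the smallest Cp
```

For the full model Cp equals p exactly, so candidate subsets are judged by how far they fall from the Cp ≈ p line, which is why the solutions above favor subsets with small Cp and high R-Sq(adj).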

This note was uploaded on 03/15/2009 for the course IE 315 taught by Professor Kapur during the Spring '09 term at University of Washington.
