# Check for influential points


(e) Check for influential points. Check the influential points with a half-normal plot of the Cook's distances.
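A minimal sketch of this check. It assumes that `prostate` is the data set shipped with the `faraway` package and that `lmod1` is the full model `lpsa ~ .`; the original fitting code is not shown, so both are assumptions. `halfnorm()` is the half-normal plotting helper from `faraway`.

```r
# Assumption: the prostate data come from the faraway package.
library(faraway)
data(prostate, package = "faraway")

# Assumption: lmod1 is the full model with all eight predictors.
lmod1 <- lm(lpsa ~ ., data = prostate)

# Cook's distance for every observation.
cook <- cooks.distance(lmod1)

# Half-normal plot; label the three largest distances by row number.
halfnorm(cook, nlab = 3, ylab = "Cook's distance")

# Row number of the observation with the largest Cook's distance.
most_influential <- unname(which.max(cook))
most_influential
```

Points that sit well above the general trend of the half-normal plot are the ones worth refitting the model without.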
The point (32) is the most influential point.

(f) Check the structure of the relationship between the predictors and the response.

5. Using lmod1:

(a) Compute and comment on the correlations between the predictors.

    round(cor(prostate[, -9]), 2)

    0.43 0.05 0.28 0.08 0.46 0.63 0.75 1.00

(b) Compute the variance inflation factors. Should we be worried about multicollinearity? Why/Why not?

    svi
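Parts 5(a) and 5(b) can be sketched as follows. The code assumes the `faraway` version of `prostate` and a full-model `lmod1`; neither is stated in the source. The variable names `predictors`, `cor_mat`, `vifs` and `vifs_check` are illustrative only.

```r
# Assumption: the prostate data come from the faraway package.
library(faraway)
data(prostate, package = "faraway")

# Predictors only: column 9 is the response, lpsa.
predictors <- prostate[, -9]

# (a) Pairwise correlations between the predictors, rounded to 2 decimals.
cor_mat <- round(cor(predictors), 2)
cor_mat

# Assumption: lmod1 is the full model with all eight predictors.
lmod1 <- lm(lpsa ~ ., data = prostate)

# (b) Variance inflation factors via the faraway helper.
vifs <- faraway::vif(lmod1)
round(vifs, 2)

# Cross-check: VIF_j = 1 / (1 - R_j^2), which equals the j-th diagonal
# element of the inverse of the predictor correlation matrix.
vifs_check <- diag(solve(cor(predictors)))
```

A common rule of thumb treats VIFs above roughly 5 to 10 as a sign of problematic multicollinearity; values close to 1 indicate that a predictor is nearly uncorrelated with the others.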
6. Run stepwise regression on lmod1. Use direction = "both" as one of the parameters of the function. Let us call this lmod1_step. What is the AIC value of lmod1_step? Do you see any reduction in the number of predictors being used in lmod1_step, as compared to lmod1? If yes, which predictors are absent now? What is the R² of the model? Is it better than that of lmod1? Are there still any predictors that are not significant at the 95% confidence level? Compare both models and comment on which of them should be preferred and why.
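A minimal sketch of the stepwise fit and the comparisons the question asks for. It assumes the `faraway` version of `prostate` and a full-model `lmod1`. Names other than `lmod1` and `lmod1_step` are illustrative. For linear models, `step()` ranks candidates with `extractAIC()`, which reports n·log(RSS/n) + 2p, where p counts the coefficients including the intercept; the last lines recompute that value by hand.

```r
# Assumption: the prostate data come from the faraway package.
library(faraway)
data(prostate, package = "faraway")

# Assumption: lmod1 is the full model with all eight predictors.
lmod1 <- lm(lpsa ~ ., data = prostate)

# Stepwise selection in both directions. trace = 0 suppresses the
# step-by-step printout; the default (trace = 1) prints it.
lmod1_step <- step(lmod1, direction = "both", trace = 0)

# AIC on the scale used by step(): second element of extractAIC().
aic_full <- extractAIC(lmod1)[2]
aic_step <- extractAIC(lmod1_step)[2]

# Predictors present in lmod1 but absent from lmod1_step.
dropped <- setdiff(names(coef(lmod1)), names(coef(lmod1_step)))

# R-squared and adjusted R-squared for both models.
r2 <- c(full = summary(lmod1)$r.squared,
        step = summary(lmod1_step)$r.squared)
adj_r2 <- c(full = summary(lmod1)$adj.r.squared,
            step = summary(lmod1_step)$adj.r.squared)

# Predictors (intercept excluded) with p-value above 0.05.
pvals <- coef(summary(lmod1_step))[-1, "Pr(>|t|)"]
not_signif <- names(pvals)[pvals > 0.05]

# Hand computation of the step() AIC: n * log(RSS / n) + 2 * p.
n <- nrow(prostate)
rss_step <- deviance(lmod1_step)      # residual sum of squares
p_step <- length(coef(lmod1_step))    # coefficients incl. intercept
aic_by_hand <- n * log(rss_step / n) + 2 * p_step
```

Because the models are nested, the plain R² of the smaller model can never exceed that of the full model, so adjusted R² and AIC are the fairer yardsticks for the comparison.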

    ## Start:  AIC=-58.32
    ## lpsa ~ lcavol + lweight + age + lbph + svi + lcp + gleason +
    ##     pgg45
    ##
    ##           Df Sum of Sq    RSS     AIC
    ## - gleason  1    0.0412 44.204 -60.231
    ## - pgg45    1    0.5258 44.689 -59.174
    ## - lcp      1    0.6740 44.837 -58.853
    ## <none>                 44.163 -58.322
    ## - age      1    1.5503 45.713 -56.975
    ## - lbph     1    1.6835 45.847 -56.693
    ## - lweight  1    3.5861 47.749 -52.749
    ## - svi      1    4.9355 49.099 -50.046
    ## - lcavol   1   22.3721 66.535 -20.567
    ##
    ## Step:  AIC=-60.23
    ## lpsa ~ lcavol + lweight + age + lbph + svi + lcp + pgg45
    ##
    ##           Df Sum of Sq    RSS     AIC
    ## - lcp      1    0.6623 44.867 -60.789
    ## <none>                 44.204 -60.231
    ## - pgg45    1    1.1920 45.396 -59.650
    ## - age      1    1.5166 45.721 -58.959
    ## - lbph     1    1.7053 45.910 -58.560
    ## + gleason  1    0.0412 44.163 -58.322
    ## - lweight  1    3.5462 47.750 -54.746
    ## - svi      1    4.8984 49.103 -52.037
    ## - lcavol   1   23.5039 67.708 -20.872
    ##
    ## Step:  AIC=-60.79
    ## lpsa ~ lcavol + lweight + age + lbph + svi + pgg45
    ##
    ##           Df Sum of Sq    RSS     AIC
    ## - pgg45    1    0.6590 45.526 -61.374
    ## <none>                 44.867 -60.789
    ## + lcp      1    0.6623 44.204 -60.231
    ## - age      1    1.2649 46.131 -60.092
    ## - lbph     1    1.6465 46.513 -59.293
    ## + gleason  1    0.0296 44.837 -58.853
    ## - lweight  1    3.5647 48.431 -55.373
    ## - svi      1    4.2503 49.117 -54.009
    ## - lcavol   1   25.4189 70.285 -19.248
    ##
    ## Step:  AIC=-61.37
    ## lpsa ~ lcavol + lweight + age + lbph + svi
    ##
    ##           Df Sum of Sq    RSS     AIC
    ## <none>                 45.526 -61.374
    ## - age      1    0.9592 46.485 -61.352
    ## + pgg45    1    0.6590 44.867 -60.789
    ## + gleason  1    0.4560 45.070 -60.351
    ## + lcp      1    0.1293 45.396 -59.650
    ## - lbph     1    1.8568 47.382 -59.497
    ## - lweight  1    3.2251 48.751 -56.735
    ## - svi      1    5.9517 51.477 -51.456
    ## - lcavol   1   28.7665 74.292 -15.871

    summary(lmod1_step)

    ##
    ## Call:
    ## lm(formula = lpsa ~ lcavol + lweight + age + lbph + svi, data = prostate)
    ##
    ## Residuals:
    ##      Min       1Q   Median       3Q      Max
    ## -1.83505 -0.39396  0.00414  0.46336  1.57888
    ##
    ## Coefficients:
    ##             Estimate Std. Error t value Pr(>|t|)
    ## (Intercept)  0.95100    0.83175   1.143 0.255882
    ## lcavol       0.56561    0.07459   7.583 2.77e-11 ***
    ## lweight      0.42369    0.16687   2.539 0.012814 *
    ## age         -0.01489    0.01075  -1.385 0.169528
    ## lbph         0.11184    0.05805   1.927 0.057160 .