# Chapter 8: A closer look at assumptions for Simple Linear Regression

MTH567, Fall 2018, Ariadni Papana

## Goals

- Assumptions of SLR and transformations
- Robustness of tools to model violations
- Graphical tools and lack-of-fit test

## Case Studies

- Island Area and Number of Species
- Breakdown Times for Insulating Fluid Under Different Voltages

## Assumptions of SLR

- Form: The means of the subpopulations fall on a straight-line function of the explanatory variable.
- Normality: There is a normal subpopulation of responses for each value of the explanatory variable.
- Equal SD: The subpopulation standard deviations are all equal: σ{Y|X} = σ.
- Independence: The selection of an observation from any of the subpopulations is independent of the selection of any other observation.
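The four assumptions can be collected into a single model statement. The following is a sketch in standard notation; the symbols $\beta_0$, $\beta_1$, and $\varepsilon_i$ do not appear in the notes and are introduced here as assumptions.

```latex
\begin{aligned}
\mu\{Y \mid X\} &= \beta_0 + \beta_1 X
  && \text{(form: subpopulation means fall on a straight line)} \\
\sigma\{Y \mid X\} &= \sigma
  && \text{(equal SD)} \\
Y_i &= \beta_0 + \beta_1 X_i + \varepsilon_i,
  \quad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2),
  \quad i = 1, \dots, n
  && \text{(normality and independence)}
\end{aligned}
```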
## Case 0801. Island Area and Number of Species

Biologists have noticed a consistent relation between the area of islands and the number of animal and plant species living on them. If S is the number of species and A is the area, then S = C A^γ (roughly), where C is a constant and γ is a biologically meaningful parameter that depends on the group of organisms (birds, reptiles, or grasses). The data are the numbers of reptile and amphibian species and the island areas for seven islands in the West Indies.

What is the relationship between the area of islands and the number of animal/plant species living on them?

Units:

Type of study:

Response:

Explanatory:

    install.packages("mosaic")
    require(mosaic)
    install.packages("Sleuth3")
    require(Sleuth3)
    trellis.par.set(theme = col.mosaic())
    options(digits = 4)

    #summary stats
    case0801
    summary(case0801)

          Area          Species
     Min.   :    1   Min.   :  7.0
     1st Qu.:   18   1st Qu.: 13.5
     Median : 3435   Median : 45.0
     Mean   :11615   Mean   : 48.6
     3rd Qu.:16808   3rd Qu.: 76.5
     Max.   :44218   Max.   :108.0

    xyplot(Species ~ Area, data = case0801)
    densityplot(~Area, data = case0801)
    densityplot(~Species, data = case0801)

It appears that the relationship with the observed values ……………………………...

Let’s check the normality assumption …….

Is a transformation necessary?
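A short derivation suggests why a transformation is worth trying here. It assumes the species-area relation has the power-law form $S = C A^{\gamma}$ with exponent $\gamma$; the labels $\beta_0$ and $\beta_1$ are introduced only for illustration. Taking natural logarithms of both sides turns the curved relation into a straight line:

```latex
\begin{aligned}
S &= C A^{\gamma} \\
\log S &= \log C + \gamma \log A \\
\mu\{\log S \mid \log A\} &= \beta_0 + \beta_1 \log A,
  \qquad \beta_0 = \log C, \quad \beta_1 = \gamma
\end{aligned}
```

Under this sketch, the slope of a regression of log(Species) on log(Area) would estimate $\gamma$, and the exponentiated intercept would estimate $C$.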
TRICK: log-transform both variables.

    #log-transform
    case0801$logarea = with(case0801, log(Area))
    case0801$logspecies = with(case0801, log(Species))
    #log-transform X
    xyplot(Species ~ logarea, type = c("p", "r"), data = case0801)
    #log-transform both
    xyplot(logspecies ~ logarea, type = c("p", "r"), data = case0801)

Model:…………………………………………………………

    #SLR----------------------------------------------
    lm1 = lm(logspecies ~ logarea, data = case0801)
    summary(lm1)

    Call:
    lm(formula = logspecies ~ logarea, data = case0801)

    Residuals:
            1         2         3         4         5         6         7
    -0.002136  0.176975 -0.215487  0.000947 -0.029244  0.059543  0.009402

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   1.9365     0.0881    22.0  3.6e-06 ***
    logarea       0.2497     0.0121    20.6  5.0e-06 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.128 on 5 degrees of freedom
    Multiple R-squared:  0.988, Adjusted R-squared:  0.986
    F-statistic:  425 on 1 and 5 DF,  p-value: 4.96e-06

    #95% confidence intervals on the log scale
    confint(lm1)
                 2.5 % 97.5 %
    (Intercept) 1.7100 2.1631
    logarea     0.2186 0.2808

    #back-transform: the logarea row is the multiplicative change
    #in median species when area doubles
    2^confint(lm1)
                2.5 % 97.5 %
    (Intercept) 3.272  4.479
    logarea     1.164  1.215

Estimated model:……………………………………………………..
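One possible way to read the fitted coefficients on the original scale is sketched below in R. The numbers are copied from the printed `summary(lm1)` and `confint(lm1)` output rather than recomputed from the data, and `median_species` is a hypothetical helper that is not part of the notes. Interpreting the back-transformed values as medians assumes the model on the log scale is adequate.

```r
# Coefficient estimates copied from the printed summary(lm1) output
# (natural logs were used for both Species and Area).
b0 <- 1.9365   # intercept on the log scale
b1 <- 0.2497   # slope on the log scale

# Back-transform the intercept: estimated median species when Area = 1.
C_hat <- exp(b0)

# Multiplicative change in median species associated with doubling the area.
doubling_factor <- 2^b1

# 95% interval for the doubling factor, from the printed confint(lm1) slope limits.
ci_doubling <- 2^c(0.2186, 0.2808)

# Hypothetical helper (an assumption, not in the notes): estimated median
# number of species for an island of the given area (same units as Area).
median_species <- function(area) C_hat * area^b1

print(round(c(C_hat = C_hat, doubling_factor = doubling_factor), 3))
print(round(ci_doubling, 3))
print(round(median_species(c(1, 100, 10000)), 1))
```

Because natural logs are used on both sides, the base chosen for the back-transformation of the slope only sets the comparison being made: `2^b1` describes a doubling of area, while `10^b1` would describe a tenfold increase.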