Biostatistics 100B
Homework Solutions 6
February 20th, 2007

Solutions To Homework Assignment 6

Warmup Problems

(1) Regression Assumptions Practice 1:

(a) The first figure shows a residual plot, which is essentially a scatterplot turned on its side so you can see the errors relative to their mean of 0. We can use this to check whether the errors have mean 0, are independent, and have constant variance.

First, the mean 0 assumption is clearly violated. At the start the errors are all negative. Then for X between about 1.5 and 5 they are all positive. Then they are negative again, and they become positive at the end. They are not centered about the 0 line for all X as they are supposed to be.

The independence assumption is also violated by the pattern I just described. The errors go down and up and down and up again. It looks as if a curved model (in fact probably a cubic polynomial) would fit this data better. Remember that patterns in the data that show asymmetry about the 0 line indicate that independence is violated and that the shape of the model may be wrong.

The constant variance assumption, however, looks OK. If we draw a band above and below the residuals following the up and down pattern, it stays a fairly similar width for each value of X. Therefore, I think this assumption is OK.

Finally, there is one point which is out of line with the others. It is at about X=7.5 and has a positive residual when all the other points nearby have negative residuals. This is certainly an outlier. Whether it is influential is hard to tell without actually seeing the regression and how much the line was pulled up. However, there are so many data points that the effect was probably fairly minimal.

It is hard to check normality from this plot; we would need a histogram or normal quantile plot of the errors.

(b) Here we are given a histogram and normal quantile plot of our residuals. These can be used to check normality but not any of the other assumptions.
(While it is true that the histogram suggests the overall mean of the errors could be near 0, we need to check mean 0 for ALL X, which the histogram can’t tell us since it doesn’t incorporate the X values!) Here the histogram is clearly skewed rather than bell-shaped, and the points curve far away from a straight line on the normal quantile plot, so the normality assumption is clearly violated.

(c) On the scatterplot and residual plot shown here, the mean 0 and independence assumptions are fine: the points are centered about the regression line or residual = 0 line for pretty much all the X values. However, the constant variance assumption is clearly violated. The points are widely spread out on the left, and very narrowly spread on the right.

There are several outlier/influential points here. They are at coordinates (1,-65), (1,43), (1,57) and (15,-90) on the scatterplot. The last point is not influential because it follows the straight line. The others are probably pulling the line towards themselves and hence could be considered ...
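
The checks described in parts (a) and (c) can also be done numerically instead of by eye. The following is only a rough sketch in Python on made-up data, not the homework data: the three simulated data sets, the helper names fit_line and binned_residual_summary, and the choice of 4 bins are all assumptions made for illustration. The idea is to fit a straight line, split the X range into bins, and look at the mean and standard deviation of the residuals in each bin. Bin means far from 0 suggest the mean 0 assumption fails for some X; bin standard deviations that change a lot across X suggest non-constant variance.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def binned_residual_summary(xs, ys, n_bins=4):
    """Fit a line, then return [(mean, sd)] of the residuals in equal-width X bins."""
    b0, b1 = fit_line(xs, ys)
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for x, y in zip(xs, ys):
        i = min(int((x - lo) / width), n_bins - 1)  # put x == hi in the last bin
        bins[i].append(y - (b0 + b1 * x))
    return [(statistics.fmean(b), statistics.pstdev(b)) for b in bins]


rng = random.Random(0)  # fixed seed so the run is repeatable
xs = [10 * i / 399 for i in range(400)]  # 400 evenly spaced X values on [0, 10]

# Hypothetical data sets (assumptions for illustration, not the homework data):
# 1. straight line with constant noise: the assumptions hold
ok_ys = [2 + 3 * x + rng.gauss(0, 1) for x in xs]
# 2. cubic truth fitted with a line: mean 0 for all X fails, as in part (a)
curved_ys = [2 + 3 * x + (x - 5) ** 3 + rng.gauss(0, 1) for x in xs]
# 3. noise sd shrinks from 10 to 1 as X grows: constant variance fails, as in part (c)
fan_ys = [2 + 3 * x + rng.gauss(0, 10 - 0.9 * x) for x in xs]

ok_summary = binned_residual_summary(xs, ok_ys)
curved_summary = binned_residual_summary(xs, curved_ys)
fan_summary = binned_residual_summary(xs, fan_ys)

# Print (mean, sd) of the residuals per bin, from the leftmost to the rightmost bin
for name, summary in [("ok", ok_summary), ("curved", curved_summary), ("fan", fan_summary)]:
    print(name, [(round(m, 2), round(s, 2)) for m, s in summary])
```

In the curved data set the bin means should alternate in sign across X, the same down-up-down-up pattern described in part (a), while in the fan data set the standard deviation should be much larger in the leftmost bin than in the rightmost one, as in part (c). A summary like this does not replace looking at the residual plot, and it says nothing about normality, which still needs a histogram or normal quantile plot.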