1)a) Page cost and circulation both appear strongly right-skewed. Percentage of males and median income appear bimodal, with no clear skew in either direction.
b) y = b0 + b1x1 + b2x2 + b3x3 + epsilon
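
The skewness read off the histograms in (a) can be sanity-checked numerically. The sketch below is only an illustration: it uses the moment-based skewness g1 = m3 / m2^1.5, and both lists are made-up values, not the assignment data. The helper name `sample_skewness` is hypothetical. A bimodal variable can still have skewness near zero, so this complements the histogram rather than replacing it.

```python
from statistics import fmean


def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (positive means right-skewed)."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Made-up values for illustration only (NOT the assignment data)
right_skewed = [1, 2, 2, 3, 3, 4, 5, 8, 15, 40]  # long right tail
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]          # mirror image around 5

print("right-skewed sample:", round(sample_skewness(right_skewed), 2))
print("symmetric sample:   ", round(sample_skewness(symmetric), 2))
```

A clearly positive value supports the "right-skewed" reading; a value near zero says only that the two tails balance, not that the distribution has a single peak.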

B) Page cost and circulation show a moderately strong curved relationship that resembles a natural-log curve. There is no clear relationship between page cost and the other two explanatory variables (percentage of males and median income).

2) Residual diagnostics, because epsilon is estimated by the residuals.
a) Yes. The p-value for the F-test is reported as 0.000, so the model is useful in predicting the response.
b) y = 7008 + 3.96x1 - 34.7x2 + 0.7x3, where y = page cost, x1 = circulation, x2 = percentage of males, x3 = median income.
c) Circulation is highly significant (t-test, p-value .000). Percentage of males is not significant (t-test, p-value .815), and median income is also not significant (t-test, p-value .109). Since circulation has a p-value below .05, I would keep it in the model. Percentage of males should be removed first, since it is the least significant variable, with the largest p-value (.815). If median income is still not significant after refitting, I would remove it as well.
d) Scatterplot of residuals versus predicted values: there is a clear pattern in the residual plot, so the linearity assumption is not satisfied; the relationship is probably not linear. Heteroscedasticity is less evident here. Transforming the variables may fix these problems. For example, I would try taking the natural log of page cost and/or circulation.
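
The log transformation suggested in (d) can be illustrated with a minimal sketch. It assumes made-up data in which page cost grows with ln(circulation); these numbers are not the assignment data, and the coefficients 5000 and 9000 are arbitrary. The helper `pearson_r` is a hypothetical name for a plain Pearson correlation. The point is only that a curved relationship looks weaker on the raw scale and becomes linear once circulation is logged.

```python
import math
from statistics import fmean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up data for illustration only (NOT the assignment data):
# page cost is constructed to be exactly linear in ln(circulation).
circulation = [100, 200, 400, 800, 1600, 3200, 6400, 12800]
page_cost = [5000 + 9000 * math.log(c) for c in circulation]

r_raw = pearson_r(circulation, page_cost)
r_log = pearson_r([math.log(c) for c in circulation], page_cost)

print(f"r with raw circulation: {r_raw:.3f}")
print(f"r with ln(circulation): {r_log:.3f}")
```

With real data the improvement would be less clean, so the residual plot should be redrawn after the transformation to confirm that the pattern has gone.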
