lab5

# Last time we did regression on hail data, but the only models we examined were
#   lm(size ~ diverg)           r.squared = 0.2719047
#   lm(size ~ rotate)           r.squared = 0.2900612
#   lm(size ~ diverg+rotate)    r.squared = 0.3628943
# And, based on these, it appears that diverg and rotate, together, give
# a better model for predicting hail size.

# 1) Now that we know about nonlinear and interaction terms, let's try them:

dat = read.table("http://www.stat.washington.edu/marzban/390/hail_dat.txt", header=T)
plot(dat)
size   = dat[,3]
rotate = dat[,2]
diverg = dat[,1]

# In the scatterplots, note the collinearity in the data. As such, the
# regression coefficients are meaningless and unreliable.
# But the regression models themselves are still OK.
# (A small aside at the end of this lab sketches how cor() can quantify this.)

# multiple regression with interaction
lm.d = lm(size ~ diverg + rotate + I(diverg*rotate))
summary(lm.d)$r.squared     # 0.3745302

# multiple quadratic regression
lm.e = lm(size ~ diverg + rotate + I(diverg^2) + I(rotate^2))
lm.e                        # Note: there are now *4* slope coefficients (plus the intercept).
summary(lm.e)$r.squared     # 0.3799713

# multiple quadratic regression with interaction
lm.f = lm(size ~ diverg + rotate + I(diverg^2) + I(rotate^2) + I(diverg*rotate))
summary(lm.f)$r.squared     # 0.3800302

# Here is a discussion of all of the above results. It *seems* like
# - rotate is a better predictor of size than diverg.
# - The two of them together make for an even better model.
# - Quadratic terms for each make the model even better, but not by much
#   (R^2 goes from 0.3628943 to 0.3799713).
# - An interaction term, without quadratic terms, gives a model that is
#   comparable to what we got from a quadratic model with no interaction.
# - Quadratic and interaction terms, together, *seem* to give the best model.
# But do NOT forget that R^2 never decreases as you add more terms to the
# regression model. The main question (which you can address only
# qualitatively at this point) is whether the gain in R^2 is big enough
# to warrant the new term and the risk of overfitting the data.
# In this example, the gain from R^2 = 0.3799 to R^2 = 0.3800 is probably
# NOT worth the risk. So, we should keep the simpler model. That's
# called the principle of "Occam's Razor," which posits that one should
# go with the simpler of two comparable models.
# (An aside at the end of this lab sketches adjusted R^2 and anova() for
# making this kind of comparison.)

########################################################################
# 2) Chapter 4.
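
########################################################################
# Aside (added sketch, not part of the original lab): the collinearity
# noted in the scatterplots of part 1 can be quantified with the sample
# correlation between the two predictors. A value near +1 or -1 signals
# strong collinearity, which is what makes the individual regression
# coefficients unreliable even though the fitted models are still usable.
cor(diverg, rotate)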
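
# Aside (added sketch, not part of the original lab): one way to judge
# whether the gain in R^2 is worth an extra term, beyond the qualitative
# argument in part 1, is adjusted R^2 (which penalizes extra terms) or a
# partial F-test comparing the nested models lm.e and lm.f.
summary(lm.e)$adj.r.squared   # quadratic model
summary(lm.f)$adj.r.squared   # quadratic model plus the interaction term
anova(lm.e, lm.f)             # F-test for whether the interaction term helps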