# Last time we did regression on hail data, but the only models we examined were
#   lm(size ~ diverg)          r.squared = 0.2719047
#   lm(size ~ rotate)          r.squared = 0.2900612
#   lm(size ~ diverg+rotate)   r.squared = 0.3628943
# And, based on these, it appears that diverg and rotate, together, give
# a better model for predicting hail size.
# 1) Now that we know about nonlinear and interaction terms, let's try them:
dat = read.table("http://www.stat.washington.edu/marzban/390/hail_dat.txt",
                 header=TRUE)
plot(dat)
size=dat[,3]
rotate=dat[,2]
diverg=dat[,1]
# In the scatterplots, note the collinearity in the data (diverg and rotate
# are correlated with each other). As such, the individual regression
# coefficients are unreliable and hard to interpret.
# But the regression models themselves are still OK for prediction.
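# A minimal sketch of what collinearity does, on simulated data (NOT the hail
# data). All names starting with "sim." are made up for this illustration, and
# the numbers are arbitrary. Two nearly identical predictors give a shaky
# coefficient, but R^2 is barely affected.
set.seed(1)
sim.n  = 200
sim.x1 = rnorm(sim.n)
sim.x2 = sim.x1 + rnorm(sim.n, sd=0.05)    # almost a copy of sim.x1
sim.y  = 1 + 2*sim.x1 + rnorm(sim.n)
sim.one  = lm(sim.y ~ sim.x1)
sim.both = lm(sim.y ~ sim.x1 + sim.x2)
# Standard error of the sim.x1 coefficient, without and with sim.x2:
sim.se.one  = summary(sim.one)$coefficients["sim.x1", "Std. Error"]
sim.se.both = summary(sim.both)$coefficients["sim.x1", "Std. Error"]
c(sim.se.one, sim.se.both)                 # the second is much larger
# R^2 of the two models is nearly the same:
c(summary(sim.one)$r.squared, summary(sim.both)$r.squared)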
# multiple regression with interaction
lm.d = lm(size ~ diverg + rotate + I(diverg*rotate))
summary(lm.d)$r.squared
# 0.3745302
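# Side note on syntax, sketched on simulated data (names starting with "int."
# are made up here). In an R formula, a:b is the interaction term, and a*b is
# shorthand for a + b + a:b. For numeric predictors, the three fits below are
# the same model, so they give the same R^2.
set.seed(2)
int.a = rnorm(100)
int.b = rnorm(100)
int.y = 1 + int.a - int.b + 0.5*int.a*int.b + rnorm(100)
int.r2 = c(
  summary(lm(int.y ~ int.a + int.b + I(int.a*int.b)))$r.squared,
  summary(lm(int.y ~ int.a + int.b + int.a:int.b))$r.squared,
  summary(lm(int.y ~ int.a*int.b))$r.squared )
int.r2                                     # three identical values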
# multiple quadratic regression
lm.e=lm(size~diverg + rotate + I(diverg^2) + I(rotate^2) )
lm.e
# Note: there are now *4* coefficients (plus the intercept).
summary(lm.e)$r.squared
# 0.3799713
# multiple quadratic regression with interaction.
lm.f=lm(size~diverg+rotate + I(diverg^2) + I(rotate^2) + I(diverg*rotate) )
summary(lm.f)$r.squared
# 0.3800302
# Here is a discussion of all of the above results: It *seems* like
#  rotate is a better predictor of size than diverg.
#  The two of them together make for an even better model.
#  Quadratic terms for each make the model even better, but not by much
#    (R^2 goes from 0.3628943 to 0.3799713).
#  An interaction term, without quadratic terms, gives a model that is
#    comparable to what we got from a quadratic model with no interaction
#    (R^2 = 0.3745302 vs. 0.3799713).
#  Quadratic and interaction terms, together, *seem* to give the best model.
# But do NOT forget that R^2 never decreases anyway as you add more
# terms to the regression model. The main question (which you can address
# only qualitatively at this point) is if the gain in R^2 is big enough
# to warrant the new term, and taking the chance of overfitting the data.
# In this example, the gain from R^2=0.3799 to R^2=0.3800 is probably
# NOT worth the risk. So, we should keep the simpler model. That's
# called the principle of "Occam's Razor," which posits that one should
# go with simpler things!
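# A quick check of the "R^2 never decreases" claim, on simulated data (names
# starting with "occ." are made up here). occ.junk is pure noise, unrelated
# to occ.y, yet adding it to the model cannot lower R^2.
set.seed(3)
occ.x    = rnorm(50)
occ.junk = rnorm(50)
occ.y    = 2*occ.x + rnorm(50)
occ.r2.small = summary(lm(occ.y ~ occ.x))$r.squared
occ.r2.big   = summary(lm(occ.y ~ occ.x + occ.junk))$r.squared
c(occ.r2.small, occ.r2.big)
occ.r2.big >= occ.r2.small                 # TRUE, even though occ.junk is useless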
########################################################################
# 2) Chapter 4.
 Spring '08