# Try running these regression ideas on the two continuous variables
# you collected in an earlier homework. Do this on your own time, not in the lab,
# but if you have questions or trouble, ask me or your TA.
###########################################################
# In dealing with two continuous variables, we've learned that the first thing
# to do is to look at their scatterplot. The correlation coefficient
# summarizes the strength of the relationship between them, but it does not
# allow us to predict y from x. What does that is "regression", i.e., the
# line that fits "through" the scatterplot.
# 1) Regression:
# The function lm() in R does regression, i.e., it fits a line through a
# scatterplot, or a surface through higher-dimensional data (later!).
# Let us just do simple linear regression, i.e. a fit of just one pair of
# variables, x and y.
# Consider a fake/simulated example:
rm(list=ls(all=TRUE))
set.seed(123)
x = runif(100,0,1)
# x is uniform between 0 and 1.
e = rnorm(100,0,1)
# error is normal with mean=0, sigma=1.
y = 10 + 2*x + e
# The real/true line is y = 10 + 2x.
plot(x,y)
# Here is the scatterplot,
cor(x,y)
# and the correlation between x and y.
lm.1 = lm(y ~ x)
# lm stands for linear model.
lm.1
# Note that the estimated coefficients are pretty close
# to the true ones (i.e., intercept=10, slope=2)
abline(lm.1)
# This draws the fit on the scatterplot.
# If you want to know what else is contained in lm.1, do this:
names(lm.1)
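# A minimal sketch of pulling pieces out of a fit and using it to predict y
# at new x values. The names sim.x, sim.y, sim.fit, and new.x are made up for
# this illustration; the data are re-simulated so these lines run on their own.
set.seed(123)
sim.x = runif(100,0,1)
sim.y = 10 + 2*sim.x + rnorm(100,0,1)
sim.fit = lm(sim.y ~ sim.x)
coef(sim.fit)
# coef() returns the estimated intercept and slope as a vector.
new.x = data.frame(sim.x = c(0.25, 0.75))
# The column name in newdata must match the predictor name in the formula.
sim.pred = predict(sim.fit, newdata = new.x)
sim.pred
# Each prediction is just intercept + slope * (new x value).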
#################################################
# 2) Now, the example data from lecture: Compare answers there with those below.
x = c(72,70,65,68,70)
y = c(200,180,120,118,190)
plot(x,y)
cor(x,y)
lm.1 = lm(y~x)
abline(lm.1)
# Draws the fit
lm.1
# Gives you intercept and slope.
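# A sketch of where those two numbers come from, assuming the usual
# least-squares formulas for simple linear regression:
#   slope = r * sd(y)/sd(x),   intercept = mean(y) - slope * mean(x).
# The names lec.x, lec.y, b.hand, and a.hand are made up for this illustration.
lec.x = c(72,70,65,68,70)
lec.y = c(200,180,120,118,190)
b.hand = cor(lec.x,lec.y) * sd(lec.y)/sd(lec.x)
a.hand = mean(lec.y) - b.hand*mean(lec.x)
c(a.hand, b.hand)
# These should agree with the intercept and slope reported by lm():
coef(lm(lec.y ~ lec.x))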
summary(lm.1)
# Gives that, and R-squared, and more (for later).
# The following does anova, i.e. decomposing SST into explained & unexplained.
# Make sure you can identify the two pieces.
anova(lm.1)
# SS_explained=4942.3, SSE = 1308.9, (R^2 = 0.7906) .
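# A sketch of the same decomposition done by hand, to help identify the two
# pieces in the anova table. The names lec.x, lec.y, lec.fit, SST, SSE, and
# SS.explained are made up for this illustration.
lec.x = c(72,70,65,68,70)
lec.y = c(200,180,120,118,190)
lec.fit = lm(lec.y ~ lec.x)
SST = sum((lec.y - mean(lec.y))^2)
# Total variation of y about its mean.
SSE = sum(resid(lec.fit)^2)
# Unexplained piece: variation of y about the fitted line.
SS.explained = sum((fitted(lec.fit) - mean(lec.y))^2)
# Explained piece: variation of the fitted values about the mean of y.
c(SS.explained, SSE, SST)
# The first two should add up to the third.
SS.explained/SST
# This ratio is R^2.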
# By the way, the residuals (i.e. errors) and the fitted (predicted) values
# are also contained in the object returned by lm(). For example, here is the residual plot:
plot(lm.1$fitted.values,lm.1$residuals)
# A random scatter of points around
abline(h=0)
# the horiz line is a GOOD thing.
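# A quick sketch of what the residuals are: observed y minus fitted y.
# The names lec.x, lec.y, and lec.fit are made up for this illustration;
# resid() and fitted() are the standard extractor functions for these pieces.
lec.x = c(72,70,65,68,70)
lec.y = c(200,180,120,118,190)
lec.fit = lm(lec.y ~ lec.x)
all.equal(unname(resid(lec.fit)), lec.y - unname(fitted(lec.fit)))
# For a fit with an intercept, the residuals average to zero (up to rounding):
mean(resid(lec.fit))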
#############################################
# 3) Now, do regression on hail data:
# In reality, divergence is measured by Doppler radar, and so
# if we can predict hail size from divergence, then we can predict hail size
# from Doppler radar. That's useful!
# Do simple linear regression for predicting size from divergence.