# Try running these regression ideas on the two continuous variables
# you collected in an earlier hw. Do this on your own time, not in the lab;
# if you have questions or trouble, ask me or your TA.
###########################################################
# In dealing with two continuous variables, we've learned that the first thing
# to do is to look at their scatterplot. The correlation coefficient
# summarizes the strength (and direction) of the linear relationship between
# them, but it does not allow us to predict y from x. What does that is
# "regression", i.e., the line that fits "through" the scatterplot.
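# Why r alone cannot predict y: rescaling y changes the fitted line but
# leaves r unchanged. A minimal sketch with made-up numbers (xx, yy and
# yy.big are hypothetical illustration data, not from the homework):
xx = c(1,2,3,4)
yy = c(2,6,4,8)
yy.big = 10*yy + 3              # same points, y on a different scale
cor(xx,yy); cor(xx,yy.big)      # identical correlations (0.8)
coef(lm(yy ~ xx))               # slope 1.6
coef(lm(yy.big ~ xx))           # slope 16: a different prediction line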
# 1) Regression:
# The function lm() in R does regression, i.e., it fits a line through a
# scatterplot (by least squares), or a surface through higher-dimensional
# data (later!). Let us just do simple linear regression, i.e., a fit with
# just one pair of variables, x and y.
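# For one predictor, lm() picks the intercept b0 and slope b1 that
# minimize the sum of squared vertical distances sum((y - b0 - b1*x)^2).
# The minimizers have a closed form:
#   b1 = sum((x - mean(x))*(y - mean(y))) / sum((x - mean(x))^2)
#   b0 = mean(y) - b1*mean(x)
# A minimal sketch checking this against lm() on made-up numbers
# (xx and yy are hypothetical illustration data):
xx = c(1,2,3,4,5)
yy = c(2,4,5,4,5)
b1 = sum((xx - mean(xx))*(yy - mean(yy))) / sum((xx - mean(xx))^2)
b0 = mean(yy) - b1*mean(xx)
c(b0, b1)            # 2.2 and 0.6
coef(lm(yy ~ xx))    # the same two numbers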
# Consider a fake/simulated example:
rm(list=ls(all=TRUE))
set.seed(123)
x = runif(100,0,1)
# x is uniform between 0 and 1.
e = rnorm(100,0,1)
# error is normal with mean=0, sigma=1.
y = 10 + 2*x + e
# The real/true line is y = 10 + 2x.
plot(x,y)
# Here is the scatterplot,
cor(x,y)
# and the correlation between x and y.
lm.1 = lm(y ~ x)
# lm stands for linear model.
lm.1
# Note that the estimated coefficients are pretty close
# to the true ones (intercept = 10, slope = 2).
abline(lm.1)
# This draws the fit on the scatterplot.
# If you want to know what else is contained in lm.1, do this:
names(lm.1)
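# Each name listed above can be pulled out with $, e.g. lm.1$coefficients;
# the extractor function coef() returns the same vector. A short sketch,
# re-simulating the same data (same seed) so it runs on its own, that also
# shows how far the estimates are from the true intercept 10 and slope 2:
set.seed(123)
x = runif(100,0,1)
e = rnorm(100,0,1)
y = 10 + 2*x + e
lm.1 = lm(y ~ x)
lm.1$coefficients          # same numbers as coef(lm.1)
coef(lm.1) - c(10, 2)      # estimation error for intercept and slope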
#################################################
# 2) Now the example data from lecture. Compare the answers there with those below.
x = c(72,70,65,68,70)
y = c(200,180,120,118,190)
plot(x,y)
cor(x,y)
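# The correlation is Sxy / sqrt(Sxx*Syy), where Sxx, Syy and Sxy are the
# sums of squared (or cross-multiplied) deviations from the means. A sketch
# that recomputes it by hand (x and y re-entered so this runs on its own):
x = c(72,70,65,68,70)
y = c(200,180,120,118,190)
Sxx = sum((x - mean(x))^2)                 # 28
Syy = sum((y - mean(y))^2)                 # 6251.2
Sxy = sum((x - mean(x))*(y - mean(y)))     # 372
Sxy / sqrt(Sxx*Syy)                        # about 0.889, same as cor(x,y)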
lm.1 = lm(y~x)
abline(lm.1)
# Draws the fit
lm.1
# Gives you intercept and slope.
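# With the intercept and slope in hand, the prediction at a new x is just
# b0 + b1*x.new, and predict() does the same computation. A sketch, where
# x.new = 71 is a hypothetical new x value (not one of the lecture points)
# and the data are re-entered so it runs on its own:
x = c(72,70,65,68,70)
y = c(200,180,120,118,190)
lm.1 = lm(y ~ x)
x.new = 71
b = coef(lm.1)
b[1] + b[2]*x.new                                # by hand: about 188.17
predict(lm.1, newdata = data.frame(x = x.new))   # same value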
summary(lm.1)
# Gives those, plus R-squared and more (for later).
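# In simple linear regression (one x, with an intercept), R-squared is
# just the square of the correlation. A sketch that pulls R-squared out
# of the summary and compares (data re-entered so it runs on its own):
x = c(72,70,65,68,70)
y = c(200,180,120,118,190)
lm.1 = lm(y ~ x)
summary(lm.1)$r.squared      # about 0.7906
cor(x,y)^2                   # same value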
# The following does an ANOVA, i.e., it decomposes SST (the total sum of
# squares) into explained and unexplained pieces.
# Make sure you can identify the two pieces.
anova(lm.1)
# SS_explained = 4942.3, SSE = 1308.9, so SST = 6251.2 and
# R^2 = SS_explained/SST = 4942.3/6251.2 = 0.7906.
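# The same decomposition by hand: SST = SS_explained + SSE, where
#   SST          = sum((y - mean(y))^2)
#   SS_explained = sum((fitted - mean(y))^2)
#   SSE          = sum((y - fitted)^2)
# A sketch (data re-entered so it runs on its own):
x = c(72,70,65,68,70)
y = c(200,180,120,118,190)
lm.1 = lm(y ~ x)
y.hat = fitted(lm.1)
SST = sum((y - mean(y))^2)            # 6251.2
SSR = sum((y.hat - mean(y))^2)        # about 4942.3 (explained)
SSE = sum((y - y.hat)^2)              # about 1308.9 (unexplained)
SSR + SSE                             # equals SST
anova(lm.1)[["Sum Sq"]]               # the same two pieces, from anova()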
# By the way, the residuals (i.e. errors) and the fitted (predicted) values