2.4 Cautions about Correlation and Regression
in regression, the data points scatter about the regression line
the least-squares line makes the sum of squares of the vertical distances from the points to the line as small as possible
these distances represent the “leftover” variation in the response after fitting the regression line
these distances are known as residuals
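The least-squares idea above can be sketched in a few lines of Python. The data values and the function names (`least_squares_line`, `sum_squared_residuals`) are made up for illustration and are not from the notes; the slope and intercept use the standard least-squares formulas.

```python
# Hypothetical (x, y) data, for illustration only; NOT the fat gain / NEA data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_residuals(intercept, slope, xs, ys):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


a, b = least_squares_line(xs, ys)
best = sum_squared_residuals(a, b, xs, ys)

# Any other line should do worse: shift or tilt the fitted line slightly.
shifted = sum_squared_residuals(a + 0.1, b, xs, ys)
tilted = sum_squared_residuals(a, b + 0.1, xs, ys)

print(best < shifted and best < tilted)
```

Shifting or tilting the fitted line always increases the sum of squares, which is what "as small as possible" means here.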
Residuals
A residual is the difference between an observed value of the response variable and the value predicted by the regression line.
residual = observed y – predicted y
residual = y – ŷ
Recall the fat gain versus NEA increase data
least squares regression line:
fat gain = 3.505 – (0.00344 * NEA increase)
one subject:
NEA increase= 135 calories
fat gain= 2.7 kg
predicted gain:
ŷ = 3.505 – (0.00344 * 135) = 3.04 kg
observed gain:
y = 2.7 kg
residual = observed y – predicted y
residual = 2.7 kg – 3.04 kg = –0.34 kg
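The arithmetic for this one subject can be checked with a short sketch. The coefficients, the NEA increase, and the observed gain are taken from the notes; the names `predicted_fat_gain` and `residual` are chosen here for illustration.

```python
# Coefficients of the least-squares line quoted in the notes.
INTERCEPT = 3.505   # kg
SLOPE = -0.00344    # kg per calorie of NEA increase


def predicted_fat_gain(nea_increase):
    """Predicted fat gain (kg) from the regression line."""
    return INTERCEPT + SLOPE * nea_increase


def residual(observed, predicted):
    """residual = observed y - predicted y"""
    return observed - predicted


y_hat = predicted_fat_gain(135)   # subject with NEA increase = 135 calories
res = residual(2.7, y_hat)        # observed fat gain = 2.7 kg

print(round(y_hat, 2), round(res, 2))  # 3.04 -0.34
```

The residual is negative because this subject's observed gain lies below the regression line.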
residuals for 16 data points:
0.37, –0.70, 0.10, –0.34, 0.19, 0.61, –0.26, –0.98,
1.64, –0.18, –0.23, 0.54, –0.54, –1.11, 0.93, –0.03
To assess the fit of a regression line you could:
•look at the vertical deviations of the data points from the regression line
•look at a residual plot (easier to study)
Residual Plots
A residual plot
is a scatterplot of the regression residuals against the explanatory variable.
●the mean of the residuals of a least-squares regression is always zero
the line (residual = 0) in the residual plot corresponds to the fitted regression line (Figure 2.20)
the residual plot magnifies the deviations from the line, making patterns easier to see
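A minimal sketch of these two facts, assuming made-up data (not the data behind Figure 2.20): fit a least-squares line, compute the residuals, confirm their mean is zero up to rounding error, and collect the (x, residual) pairs that a residual plot would display. All names in the block are illustrative.

```python
# Hypothetical explanatory (xs) and response (ys) values, for illustration only.
xs = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
ys = [3.1, 2.6, 2.9, 2.0, 2.2, 1.5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Standard least-squares slope and intercept.
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

# residual = observed y - predicted y
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# The mean of least-squares residuals is zero (up to floating-point error).
mean_residual = sum(residuals) / n
print(abs(mean_residual) < 1e-9)

# A residual plot is a scatterplot of these pairs, with a reference line
# at residual = 0 standing in for the fitted regression line.
plot_points = list(zip(xs, residuals))
above = sum(1 for r in residuals if r > 0)
below = sum(1 for r in residuals if r < 0)
print(above, below)
```

The pairs in `plot_points` can be handed to any plotting tool. Because the mean is zero, some residuals fall above the zero line and some below it.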
•if regression line catches the overall pattern of the data, there should be no pattern in the
residuals (irregular scatter randomly distributed above and below zero)
residuals in Figure 2.20 have this irregular scatter
•if the residual plot does not show an irregular horizontal pattern, this demonstrates regression
 Spring '08
 ABDUS,S.

