# handout_2_4 - 2.4 Cautions about Correlation and Regression...

After we fit a regression line to data, we can see the scatter of the data points about the line. Least-squares regression makes the sum of the squares of the vertical distances between the data points and the regression line as small as possible. These vertical distances represent the "left-over" variation in the response after fitting the regression line, and they are known as residuals.

## Residuals

A residual is the difference between an observed value of the response variable and the value predicted by the regression line (when looking at sample data):

$$\text{residual} = \text{observed } y - \text{predicted } y = y - \hat{y}$$

Recall the fat gain versus NEA increase data, with least-squares regression line

fat gain = 3.505 – (0.00344 × NEA increase)

For one subject:

- NEA increase ($x$) = 135 calories
- fat gain ($y$) = 2.7 kg
- predicted gain: $\hat{y} = 3.505 - (0.00344 \times 135) = 3.04$ kg
- observed gain: $y = 2.7$ kg
- residual = observed $y$ – predicted $y$ = 2.7 kg – 3.04 kg = -0.34 kg

Residuals for all 16 data points:

0.37, -0.70, 0.10, -0.34, 0.19, 0.61, -0.26, -0.98, 1.64, -0.18, -0.23, 0.54, -0.54, -1.11, 0.93, -0.03
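The residual arithmetic for this subject can be checked with a short Python sketch. The intercept and slope come from the fitted line above; the helper names `predict_fat_gain` and `residual` are hypothetical. The last line relies on a general property of least-squares lines that include an intercept: their residuals sum to zero, so the rounded values listed above should sum to approximately zero.

```python
# Least-squares line from the notes: fat gain (kg) = 3.505 - 0.00344 * NEA increase (calories)
INTERCEPT = 3.505
SLOPE = -0.00344

# Rounded residuals for the 16 subjects, copied from the notes.
RESIDUALS = [0.37, -0.70, 0.10, -0.34, 0.19, 0.61, -0.26, -0.98,
             1.64, -0.18, -0.23, 0.54, -0.54, -1.11, 0.93, -0.03]


def predict_fat_gain(nea_increase: float) -> float:
    """Hypothetical helper: predicted fat gain y-hat (kg) for an NEA increase (calories)."""
    return INTERCEPT + SLOPE * nea_increase


def residual(observed: float, predicted: float) -> float:
    """Hypothetical helper: residual = observed y - predicted y."""
    return observed - predicted


y_hat = predict_fat_gain(135)   # 3.505 - 0.4644 = 3.0406
e = residual(2.7, y_hat)        # 2.7 - 3.0406 = -0.3406

print(f"predicted gain: {y_hat:.2f} kg")   # predicted gain: 3.04 kg
print(f"residual: {e:.2f} kg")             # residual: -0.34 kg

# Least-squares residuals sum to zero; the listed values are rounded,
# so their sum is only approximately zero.
print(f"sum of listed residuals: {sum(RESIDUALS):.2f}")
```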

To assess the fit of a regression line, you can:

- look at the vertical deviations of the data points from the regression line, or
- look at a residual plot (easier to study).
## Residual Plots

A residual plot is a scatterplot of the regression residuals against the explanatory variable.
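The points of a residual plot can be produced with a minimal Python sketch: fit the least-squares line, then pair each explanatory value $x$ with its residual $y - \hat{y}$. The function names and the five-point data set below are hypothetical, chosen only for illustration; they are not the NEA data, whose $x$ values are not listed here.

```python
from statistics import fmean


def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares regression line of y on x."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residual_plot_points(xs, ys):
    """Pair each explanatory value x with its residual y - y_hat.

    These (x, residual) pairs are the points plotted in a residual plot.
    """
    a, b = least_squares_line(xs, ys)
    return [(x, y - (a + b * x)) for x, y in zip(xs, ys)]


# Hypothetical toy data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

points = residual_plot_points(xs, ys)
for x, e in points:
    print(f"x = {x}: residual = {e:+.2f}")
```

For this toy data the fitted line is $\hat{y} = 2.2 + 0.6x$, and the residuals are -0.80, +0.60, +1.00, -0.60 and -0.20, which sum to zero. The pairs can be handed to any scatterplot routine (for example matplotlib's `plt.scatter`), usually with a horizontal reference line at residual = 0.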

