Lec10

# Lec10 - The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL...

© 2008 Haipeng Shen, 2/19/08

STOR 155 Introductory Statistics, Lecture 10: Cautions about Regression and Correlation; Causation

The University of North Carolina at Chapel Hill

## Review

- Least-squares regression lines
- Equation and interpretation of the line
- Prediction using the line
- Correlation and regression
- Coefficient of determination
## Regression Diagnostics

Look at the residuals (errors). A residual is the difference between an observed value of the response variable and the value predicted by the regression line, i.e.,

$$\text{residual} = y - \hat{y}.$$

The sum of the least-squares residuals is always zero. Why?
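One way to answer the question, sketched under the assumption that the least-squares line is written as $\hat{y} = a + bx$ with intercept $a = \bar{y} - b\bar{x}$ (so the line passes through $(\bar{x}, \bar{y})$):

$$
\begin{aligned}
\sum_{i=1}^{n} (y_i - \hat{y}_i)
  &= \sum_{i=1}^{n} \bigl(y_i - a - b x_i\bigr) \\
  &= n\bar{y} - na - b\,n\bar{x} \\
  &= n\bar{y} - n(\bar{y} - b\bar{x}) - b\,n\bar{x} \\
  &= 0.
\end{aligned}
$$

In words: because the line goes through the point of means, the positive and negative residuals cancel exactly.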

## Residual Plots

A residual plot is a scatterplot of the regression residuals against the explanatory variable. Residual plots help us assess how well a regression line fits the data.
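A minimal sketch of where the points of a residual plot come from: fit the least-squares line, then pair each $x$ with its residual $y - \hat{y}$. The data and the helper names `least_squares` and `residuals` are hypothetical, chosen only for illustration.

```python
# Minimal sketch: fit a least-squares line and compute the residuals
# that a residual plot would display. The data below are hypothetical.

def least_squares(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def residuals(x, y):
    """Return the residuals y - y-hat, one per observation."""
    a, b = least_squares(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Hypothetical data (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

res = residuals(x, y)

# A residual plot is the scatterplot of these (x, residual) pairs.
for xi, ri in zip(x, res):
    print(f"x = {xi:.1f}  residual = {ri:+.3f}")

# Least-squares residuals sum to zero, up to floating-point rounding.
print(f"sum of residuals = {sum(res):.1e}")
```

Passing the `(x, residual)` pairs to any plotting tool (for example matplotlib's `scatter`) draws the residual plot itself.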
## Age vs. Height

[Figure: Age vs. Height]

## Residual Plot

If the regression line captures the overall pattern of the data, the residuals should show no pattern.

[Figure: residual plot with a totally random scatter]
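[Figure: residual plots labeled "nonlinear" and "nonconstant variation"]

A curved pattern in the residuals indicates that the relationship is not linear; nonconstant variation means that the spread of the residuals changes with the explanatory variable.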
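To see why a curved residual pattern signals nonlinearity, here is a small hypothetical example: a straight line fitted to points that lie exactly on $y = x^2$. The `least_squares` helper is an illustrative name, repeated here so the block runs on its own.

```python
# Hypothetical illustration: a straight line fitted to curved data
# leaves residuals with a systematic sign pattern (+, -, -, -, +).

def least_squares(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b


x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]  # points exactly on y = x^2

a, b = least_squares(x, y)
res = [yi - (a + b * xi) for xi, yi in zip(x, y)]

print(a, b)  # 2.0 0.0
print(res)   # [2.0, -1.0, -2.0, -1.0, 2.0]
```

The fitted line is flat, yet the residuals are positive at both ends and negative in the middle: exactly the kind of non-random pattern a residual plot exposes.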

## Diabetes Patients: FPG vs. HbA

- FPG: fasting plasma glucose.
- HbA: percent of red blood cells that have a glucose molecule attached.

Both measure blood glucose, so we expect a positive association. The data include 18 subjects, with r = 0.4819. See the scatterplot on the next slide.
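A quick worked step linking this example to the coefficient of determination from the review (plain arithmetic on the reported $r$):

$$
r^2 = (0.4819)^2 \approx 0.232
$$

So a least-squares line relating the two measurements would account for only about 23% of the variation in the response, despite both variables measuring blood glucose.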
## Diabetes Patients: FPG vs. HbA

[Figure: scatterplot of FPG vs. HbA for the 18 subjects]

## Outliers and Influential Observations

An outlier is a point that lies outside the overall pattern of the other points.
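A hypothetical illustration of why outliers deserve attention: with made-up data lying exactly on a line, adding a single point far from that pattern changes the correlation sharply. The `correlation` helper is an illustrative name that computes Pearson's $r$ from deviations about the means.

```python
# Hypothetical illustration (made-up data): one point far from the
# overall pattern can change the correlation dramatically.
import math


def correlation(x, y):
    """Pearson correlation r computed from deviations about the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]  # exactly on the line y = 2x

r_without = correlation(x, y)
r_with = correlation(x + [10], y + [0])  # add one outlying point (10, 0)

print(f"r without the outlier: {r_without:.3f}")
print(f"r with the outlier:    {r_with:.3f}")
```

With these made-up numbers, r goes from 1 to about -0.25: a single observation reverses the sign of the association.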
