STAT E-50 - Introduction to Statistics
Inferences for Regression

The fitted regression line has the equation $\hat{y} = b_0 + b_1 x$. Now we can find confidence intervals and perform hypothesis tests for the slope.

The idealized regression line is $\mu_y = \beta_0 + \beta_1 x$. Each data point (x, y) then satisfies $y = \beta_0 + \beta_1 x + \varepsilon$, where $\varepsilon = y - \mu_y$ is the error for that point.

Assumptions for the model and the errors:

1. Linearity Assumption
   Straight Enough Condition: does the scatterplot appear linear?
   Check the residuals to see if they appear to be randomly scattered.

2. Independence Assumption: the errors must be mutually independent.
   Randomization Condition: the individuals are a random sample.
   Check the residuals for patterns, trends, clumping.

3. Equal Variance Assumption: the variability of y should be about the same for all values of x.
   Does The Plot Thicken? Condition: is the spread about the line nearly constant in the scatterplot?
   Check the residuals for any patterns.

4. Normal Population Assumption: the errors follow a Normal model at each value of x.
   Nearly Normal Condition: look at a histogram or NPP (Normal probability plot) of the residuals.

If all assumptions are true, there is a distribution of y-values at each x-value; these distributions are Normal, have equal variance, and have their means on the idealized regression line.
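As a rough illustration of inference for the slope, the sketch below fits the least-squares line and computes a standard error, t statistic, and confidence interval for b1. The function name `slope_inference`, the sample data, and the `t_star` argument are assumptions made for this example. The notes above do not state the formulas; the sketch uses the usual ones for simple regression, SE(b1) = s_e / sqrt(Σ(x − x̄)²) with n − 2 degrees of freedom.

```python
import math


def slope_inference(x, y, t_star):
    """Fit y-hat = b0 + b1*x and return slope inference quantities.

    t_star is the critical value from a t-table with n - 2 degrees of
    freedom; the caller supplies it (an assumption of this sketch).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Sums of squares and cross-products about the means.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = sxy / sxx              # fitted slope
    b0 = y_bar - b1 * x_bar     # fitted intercept

    # Residuals e = y - y-hat and their standard deviation s_e.
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s_e = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

    se_b1 = s_e / math.sqrt(sxx)    # standard error of the slope
    t_stat = b1 / se_b1             # test statistic for H0: beta1 = 0
    ci = (b1 - t_star * se_b1, b1 + t_star * se_b1)

    return {"b0": b0, "b1": b1, "se_b1": se_b1, "t": t_stat,
            "df": n - 2, "ci": ci, "residuals": residuals}


# Made-up data for illustration; 3.182 is the 95% t critical value for 3 df.
result = slope_inference([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1], 3.182)
print(result["b1"], result["se_b1"], result["t"], result["ci"])
```

If the interval excludes 0, or the t statistic is large relative to the critical value, the data give evidence of a linear association. That conclusion is only trustworthy when the assumptions listed above are reasonable.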

Where do you start?

1. Create a scatterplot to see if the data is “straight enough”.
2. Fit a regression model and find the residuals (e) and predicted values ($\hat{y}$).
3. Draw a scatterplot of the residuals vs. x or $\hat{y}$; this should have no pattern, bend, thickening, thinning, or outliers.
4. If the data is measured over time, plot the residuals vs. time to check for patterns that would suggest they are not independent.
5. If the scatterplot is “straight enough”, create a histogram and NPP of the residuals to check the “nearly normal” condition.
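The numbers behind these diagnostic plots can be sketched as follows. This is a minimal example using only the Python standard library: it computes the predicted values, the residuals, and the coordinates for a Normal probability plot, leaving the actual drawing to whatever plotting tool is at hand. The function name `residual_diagnostics`, the sample data, and the plotting positions (i − 0.5)/n are assumptions of this sketch; other plotting-position conventions exist.

```python
from statistics import NormalDist, mean


def residual_diagnostics(x, y):
    """Return predicted values, residuals, and NPP coordinates."""
    x_bar, y_bar = mean(x), mean(y)

    # Least-squares slope and intercept.
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar

    predicted = [b0 + b1 * xi for xi in x]                 # y-hat
    residuals = [yi - pi for yi, pi in zip(y, predicted)]  # e = y - y-hat

    # Normal scores at plotting positions (i - 0.5)/n, an assumed convention.
    n = len(residuals)
    std_normal = NormalDist()
    normal_scores = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

    # Pair each normal score with the residual of the same rank; a roughly
    # straight pattern is consistent with the nearly normal condition.
    npp_points = list(zip(normal_scores, sorted(residuals)))
    return predicted, residuals, npp_points


# Made-up data for illustration.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
predicted, residuals, npp_points = residual_diagnostics(x, y)

# Residuals vs. x (step 3) and NPP coordinates (step 5), ready to plot.
for xi, e in zip(x, residuals):
    print(xi, round(e, 3))
for score, e in npp_points:
    print(round(score, 3), round(e, 3))
```

Plotting `residuals` against `x` or `predicted` gives the residual plot from step 3, and plotting `npp_points` gives the NPP from step 5. For data measured over time, plotting the same residuals against the time order covers step 4.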
