STAT E50  Introduction to Statistics
Inferences for Regression
The fitted regression line has the equation $\hat{y} = b_0 + b_1 x$.
Now we can find
confidence intervals and perform hypothesis tests for the slope.
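As a rough illustration of inference for the slope, the sketch below fits the line by least squares and then computes a standard error, t statistic, and confidence interval for $b_1$. It uses the standard textbook formulas $SE(b_1) = s_e / \sqrt{\sum (x - \bar{x})^2}$ and $b_1 \pm t^* \cdot SE(b_1)$ with $n - 2$ degrees of freedom, which this excerpt does not spell out. The data, the function names `fit_line` and `slope_inference`, and the critical value `T_STAR` (read from a t-table) are all assumptions made for the example.

```python
import math

def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def slope_inference(x, y, t_star):
    """Return b1, SE(b1), the t statistic for H0: beta1 = 0, and a CI for beta1."""
    n = len(x)
    b0, b1 = fit_line(x, y)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Residual standard deviation uses n - 2 degrees of freedom.
    s_e = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    se_b1 = s_e / math.sqrt(sxx)
    t_stat = b1 / se_b1
    ci = (b1 - t_star * se_b1, b1 + t_star * se_b1)
    return b1, se_b1, t_stat, ci

# Hypothetical data, not taken from the handout.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# 95% critical value for a t distribution with n - 2 = 3 df (from a t-table).
T_STAR = 3.182

b1, se_b1, t_stat, ci = slope_inference(x, y, T_STAR)
print(f"slope = {b1:.3f}, SE = {se_b1:.4f}, t = {t_stat:.1f}")
print(f"95% CI for the slope: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The critical value is passed in rather than computed because the Python standard library has no t distribution; with a statistics package available, it could be looked up instead.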
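The idealized regression line is $\mu_y = \beta_0 + \beta_1 x$, and each observed value is $y = \beta_0 + \beta_1 x + \varepsilon$, where $\varepsilon = y - \mu_y$ is the error for each data point $(x, y)$.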
Assumptions for the model and the errors:
1. Linearity Assumption
Straight Enough Condition: does the scatterplot appear linear?
Check the residuals to see if they appear to be randomly scattered.
2. Independence Assumption: the errors must be mutually independent.
Randomization Condition: the individuals are a random sample.
Check the residuals for patterns, trends, clumping.
3. Equal Variance Assumption: the variability of y should be about the same for all values of x.
Does the Plot Thicken? Condition: is the spread about the line nearly constant in the scatterplot?
Check the residuals for any patterns.
4. Normal Population Assumption: the errors follow a Normal model at each value of x.
Nearly Normal Condition: look at a histogram or normal probability plot (NPP) of the residuals.
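The conditions above are meant to be checked by eye, but a small numeric sketch can show what the plots would be built from. The code below computes residuals, compares their spread for the smaller and larger x-values (a crude stand-in for the Does the Plot Thicken? check, not a formal test), and produces the coordinates of a normal probability plot. The data, the helper names, and the plotting positions $(i - 0.5)/n$ are assumptions chosen for illustration.

```python
from statistics import NormalDist, stdev

def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def residuals(x, y):
    """Observed y minus predicted y for each data point."""
    b0, b1 = fit_line(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

def spread_by_half(x, e):
    """Standard deviation of the residuals for the lower and upper half of x."""
    pairs = sorted(zip(x, e))
    mid = len(pairs) // 2
    low = [ei for _, ei in pairs[:mid]]
    high = [ei for _, ei in pairs[mid:]]
    return stdev(low), stdev(high)

def npp_points(e):
    """(theoretical Normal quantile, sorted residual) pairs for an NPP."""
    n = len(e)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n are one common convention (an assumption).
    return [(std_normal.inv_cdf((i - 0.5) / n), ei)
            for i, ei in enumerate(sorted(e), start=1)]

# Hypothetical data, not taken from the handout.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [3.1, 4.8, 7.2, 9.1, 10.7, 13.2, 14.9, 17.1, 18.8, 21.1]

e = residuals(x, y)
low_sd, high_sd = spread_by_half(x, e)
points = npp_points(e)

print(f"residual SD, lower half of x: {low_sd:.3f}")
print(f"residual SD, upper half of x: {high_sd:.3f}")
for q, r in points:
    print(f"{q:7.3f}  {r:7.3f}")
```

If the Nearly Normal Condition holds, the printed pairs fall roughly along a straight line when plotted; very different spreads in the two halves would suggest the plot thickens.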
If all assumptions are true, the idealized regression line will have a distribution of
y-values for each x-value; these distributions will be Normal, with
equal variance and with means that lie along the regression line.
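To make this picture concrete, the simulation sketched below draws many y-values at a few fixed x-values from the idealized model and summarizes each group. The parameter values `BETA0`, `BETA1`, and `SIGMA`, the seed, and the chosen x-values are hypothetical; they only serve to show that the simulated means track the line while the spread stays about the same.

```python
import random
from statistics import mean, stdev

# Hypothetical model parameters, chosen only for this illustration.
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.0

rng = random.Random(50)  # fixed seed so the run is repeatable

def simulate_y(x, n):
    """Draw n y-values at a fixed x: y = beta0 + beta1 * x + error,
    with Normal errors that have mean 0 and standard deviation SIGMA."""
    return [BETA0 + BETA1 * x + rng.gauss(0.0, SIGMA) for _ in range(n)]

summary = {}
for x in (1, 5, 10):
    ys = simulate_y(x, 20000)
    summary[x] = (mean(ys), stdev(ys))
    line_value = BETA0 + BETA1 * x
    print(f"x = {x:2d}: mean of y = {summary[x][0]:.3f} "
          f"(line gives {line_value:.3f}), SD of y = {summary[x][1]:.3f}")
```

Each simulated mean should sit close to the value on the line at that x, and the standard deviations should all be close to `SIGMA`.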
Where do you start?
1. Create a scatterplot to see if the data is “straight enough”.
2. Fit a regression model and find the residuals (e) and predicted values ($\hat{y}$).
3. Draw a scatterplot of the residuals vs. x or $\hat{y}$; this should have no pattern, bend, thickening, thinning, or outliers.
4. If the data is measured over time, plot the residuals vs. time to check for patterns that would suggest the errors are not independent.
5. If the scatterplot is “straight enough”, create a histogram and NPP of the
residuals to check the “nearly normal” condition.
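The fitting and residual steps above can be sketched in code. The example below fits the model, computes the predicted values and residuals, lists the (predicted, residual) pairs that a residual plot would show, and prints a crude text histogram of the residuals. The data and the helper names `fit_line` and `text_histogram` are hypothetical; in practice a plotting package would draw the scatterplots and histogram.

```python
def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def text_histogram(values, bins=5):
    """Return one line per bin: the bin's lower edge and a bar of '*' marks."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width if all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # largest value goes in last bin
        counts[idx] += 1
    return [f"{lo + i * width:8.3f} | {'*' * c}" for i, c in enumerate(counts)]

# Hypothetical data, not taken from the handout.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.8, 4.1, 5.9, 8.3, 9.8, 12.1, 14.2, 15.9]

# Step 2: fit the model, then find predicted values and residuals.
b0, b1 = fit_line(x, y)
predicted = [b0 + b1 * xi for xi in x]
resid = [yi - pi for yi, pi in zip(y, predicted)]

# Step 3: the points of a residuals vs. predicted-values plot.
for p, e in zip(predicted, resid):
    print(f"predicted = {p:6.2f}   residual = {e:6.2f}")

# Step 5: a rough histogram of the residuals.
hist = text_histogram(resid)
print("\n".join(hist))
```

With only eight points the histogram is too coarse to judge Normality; the sketch only shows which quantities each step produces.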
Spring '08
WEINSTEIN