Outliers
[NOTE: These notes draw heavily from several sources, including Fox’s Regression Diagnostics;
Pindyck and Rubinfeld; Statistics for Social Data Analysis, by George Bohrnstedt and David
Knoke, 1982; Norusis’s SPSS 11 chapter 22 on “Analyzing residuals”; and Hamilton’s chapter on
“Robust regression.” I’m hitting the highlights here, but the readings include lots of other good
suggestions and details.]
Description of the problem.
One problem with least squares arises when there are one or more large deviations, i.e., cases
whose values differ substantially from the other observations. The slope and intercept of the
least squares line are very sensitive to data points that lie far from the true regression line.
These points are called outliers, i.e., extreme values of observed variables that can distort
estimates of regression coefficients.
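To make the sensitivity concrete, here is a minimal sketch in Python (standard library only). The data are made up for illustration and are not from any of the readings; `ols_fit` is a hypothetical helper that applies the usual bivariate least squares formulas. It fits the line to ten cases that lie exactly on y = 2 + 3x, then refits after adding a single case far from that line.

```python
# Minimal sketch: how one outlier shifts a least squares line.
# All data below are hypothetical, chosen only for illustration.

def ols_fit(x, y):
    """Return (intercept, slope) of the bivariate least squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Ten cases that lie exactly on y = 2 + 3x.
x_clean = list(range(1, 11))
y_clean = [2 + 3 * xi for xi in x_clean]

# Add one case far from that line (the line would predict 47 at x = 15).
x_out = x_clean + [15]
y_out = y_clean + [5]

a_clean, b_clean = ols_fit(x_clean, y_clean)
a_out, b_out = ols_fit(x_out, y_out)

print(f"without outlier: intercept = {a_clean:.2f}, slope = {b_clean:.2f}")
print(f"with outlier:    intercept = {a_out:.2f}, slope = {b_out:.2f}")
```

With these made-up numbers, one added case out of eleven pulls the slope from 3 down to roughly 0.8 and pushes the intercept up from 2 to about 12.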
Detecting the problem
Scatterplots and frequency distributions can reveal atypical cases.
You can also look for cases with very large residuals.
Suspicious correlations sometimes indicate the presence of outliers.
SPSS has some good routines for detecting outliers.
There is always the FREQUENCIES routine, of course.
The GRAPH command can do scatterplots of 2 variables.
The EXAMINE procedure includes an option for printing out the cases with the 5
lowest and 5 highest values.
The REGRESSION command can print out scatterplots (particularly good is
*ZRESID by *ZPRED, which is a plot of the standardized residuals by the
standardized predicted values). In addition, the regression procedure will produce
output on CASEWISE DIAGNOSTICS, which indicate which cases are extreme
outliers and/or which cases have the most impact on the regression estimates. This is
particularly useful in that you see which cases stand out even after all IVs have been
controlled for.
Stata counterparts to the above include
The tab1 and table commands
The scatter command (also graph7 will work, and seems to be quicker albeit old-fashioned;
Stata redid its graphics in Stata 8 but graph7 will let you use the old graphics)
The extremes command. This is an add-on module written by Nick Cox
There are several plotting routines, including rvfplot (residuals versus fitted)
The predict command has several options that can help you identify outliers
Stata also has lots of other routines, many of them graphics-oriented, for detecting
outliers. I won’t go through many of them, but I’ll include links on the course web page
that give examples.
Probably the most critical difference between SPSS and Stata is that Stata includes
additional routines (e.g., rreg, qreg) for addressing the problem of outliers, which we
will discuss below.
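As a rough illustration of the "look for cases with very large residuals" idea from the detection list, here is a minimal Python sketch (standard library only). It is not the SPSS or Stata implementation: `standardized_residuals` is a hypothetical helper that divides each residual by the root mean squared error, the data are made up, and the cutoff of 2 is an assumed rule of thumb rather than a value taken from these notes.

```python
import math

# Minimal sketch: flag cases whose standardized residual is large.
# Data, helper name, and cutoff are all assumptions made for illustration.

def standardized_residuals(x, y):
    """Fit y = a + b*x by least squares and return each residual / root MSE."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Root mean squared error with n - 2 degrees of freedom (two estimated coefficients).
    root_mse = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    return [e / root_mse for e in residuals]

# Twelve hypothetical cases near y = 2 + 3x; case 6 sits 15 units above that line.
x = list(range(1, 13))
y = [5.5, 7.6, 11.3, 13.4, 17.2, 35.0, 22.7, 26.4, 28.5, 32.6, 34.8, 38.1]

CUTOFF = 2.0  # assumed rule-of-thumb threshold, not a value from the notes

z = standardized_residuals(x, y)
# Case numbers are 1-based, as they would be in a casewise listing.
flagged = [case for case, zi in enumerate(z, start=1) if abs(zi) > CUTOFF]

for case, zi in enumerate(z, start=1):
    print(f"case {case:2d}: standardized residual = {zi:6.2f}")
print("flagged cases:", flagged)
```

Note that a single extreme case also inflates the root mean squared error, which shrinks every standardized residual, including its own. That is one reason to check the leverage-adjusted and influence measures the packages offer rather than relying on this simple version alone.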
Spring '11
Richard Williams