
Outliers

[NOTE: These notes draw heavily from several sources, including Fox’s Regression Diagnostics; Pindyck and Rubinfeld; Statistics for Social Data Analysis, by George Bohrnstedt and David Knoke, 1982; Norusis’s SPSS 11 chapter 22 on “Analyzing residuals”; and Hamilton’s chapter on “Robust regression.” I’m hitting highlights here, but the readings include lots of other good suggestions and details.]

Description of the problem. One problem with least squares occurs when there are one or more large deviations, i.e. cases whose values differ substantially from the other observations. The slope and intercept of the least squares line are very sensitive to data points that lie far from the true regression line. These points are called outliers, i.e. extreme values of observed variables that can distort estimates of regression coefficients.

Detecting the problem
- Scatterplots and frequencies can reveal atypical cases.
- You can also look for cases with very large residuals.
- Suspicious correlations sometimes indicate the presence of outliers.

SPSS has some good routines for detecting outliers:
- There is always the FREQUENCIES routine, of course.
- The GRAPH command can do scatterplots of 2 variables.
- The EXAMINE procedure includes an option for printing out the cases with the 5 lowest and 5 highest values.
- The REGRESSION command can print out scatterplots (particularly good is *ZRESID by *ZPRED, which is a plot of the standardized residuals by the standardized predicted values).
- In addition, the regression procedure will produce output on CASEWISE DIAGNOSTICS, which indicates which cases are extreme outliers and/or which cases have the most impact on the regression estimates. This is particularly useful in that you see which cases stand out even after all IVs have been controlled for.
Stata counterparts to the SPSS routines above include:
- The tab1 and table commands.
- The scatter command (graph7 will also work and seems to be quicker, albeit old-fashioned; Stata redid its graphics in Stata 8, but graph7 will let you use the old graphics).
- The extremes command. This is an add-on module written by Nick Cox.
- Several plotting routines, including rvfplot (residuals versus fitted).
- The predict command, which has several options that can help you identify outliers.
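To make the sensitivity of the least squares line and the "look for very large residuals" check concrete, here is a minimal sketch. Python is used purely for illustration (the notes themselves rely on SPSS and Stata). The data are made up, the helper names ols_fit and standardized_residuals are hypothetical, and the cutoff of 2 is a common rule of thumb rather than anything these notes prescribe. The residuals are simply divided by the standard error of the estimate, so leverage is ignored.

```python
import math


def ols_fit(x, y):
    """Return (intercept, slope) of the least squares line for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def standardized_residuals(x, y):
    """Residuals divided by the standard error of the estimate (ignores leverage)."""
    intercept, slope = ols_fit(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # n - 2 degrees of freedom: one intercept and one slope were estimated.
    s = math.sqrt(sum(e ** 2 for e in residuals) / (len(x) - 2))
    return [e / s for e in residuals]


# Made-up data: ten cases that lie exactly on y = 2x + 1 ...
x = list(range(1, 11))
y_clean = [2 * xi + 1 for xi in x]
# ... and the same data with the last case replaced by an extreme value.
y_outlier = y_clean[:-1] + [60]

a_clean, b_clean = ols_fit(x, y_clean)
a_out, b_out = ols_fit(x, y_outlier)

# Flag cases whose standardized residual exceeds 2 in absolute value
# (assumed rule of thumb, not a threshold taken from the notes).
z = standardized_residuals(x, y_outlier)
flagged = [i for i, zi in enumerate(z) if abs(zi) > 2.0]

print("clean fit:        intercept=%.2f slope=%.2f" % (a_clean, b_clean))
print("with one outlier: intercept=%.2f slope=%.2f" % (a_out, b_out))
print("flagged case indices (0-based):", flagged)
```

In this constructed example a single extreme case roughly doubles the slope (from 2 to about 4.13) and pulls the intercept from 1 to about -6.8. Only the altered case is flagged, although the outlier also inflates the residual standard error, which is one reason plots and casewise diagnostics are worth checking alongside any fixed cutoff.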
Stata also has lots of other routines, many of them graphics-oriented, for detecting outliers. I won’t go through many of them, but I’ll include links on the course web page that give examples.

Probably the most critical difference between SPSS and Stata is that Stata includes additional routines (e.g., rreg, qreg) for addressing the problem of outliers, which we will discuss below.
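As a rough illustration of why a routine such as qreg can help, the sketch below compares ordinary least squares with median regression, which is what qreg fits by default and which minimizes the sum of absolute residuals. This is not how Stata computes it: the brute-force search relies on the fact that, with one predictor, at least one best-fitting least-absolute-deviations line passes through two of the data points. Python, the made-up data, and the helper names ols_fit and lad_fit are all assumptions for this example.

```python
from itertools import combinations


def ols_fit(x, y):
    """Return (intercept, slope) of the least squares line for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def lad_fit(x, y):
    """Return (intercept, slope) minimizing the sum of absolute residuals.

    Brute force: try the line through every pair of points with distinct x
    values and keep the one with the smallest total absolute residual.
    Only sensible for small data sets with a single predictor.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line, not a valid regression line
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(abs(yk - (intercept + slope * xk)) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


# Made-up data: nine cases on y = 2x + 1 plus one extreme value at x = 10.
x = list(range(1, 11))
y = [2 * xi + 1 for xi in x[:-1]] + [60]

ols_intercept, ols_slope = ols_fit(x, y)
lad_intercept, lad_slope = lad_fit(x, y)

print("least squares:     intercept=%.2f slope=%.2f" % (ols_intercept, ols_slope))
print("median regression: intercept=%.2f slope=%.2f" % (lad_intercept, lad_slope))
```

In this constructed case the median regression line stays on the nine well-behaved points (intercept 1, slope 2), while the least squares line is pulled toward the extreme case.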

This note was uploaded on 02/29/2012 for the course SOC 63993 taught by Professor Richard Williams during the Spring '11 term at Notre Dame.
