CHAPTER 4: DIAGNOSTICS FOR INFLUENTIAL OBSERVATIONS

Influential observations are observations whose presence in the data can have a distorting effect on the parameter estimates, and possibly on the entire analysis, e.g. by leading to identification of the wrong model.

They are distinct from outliers, though it is possible for one observation to be both influential and an outlier:

1. Outliers are data points with unusual dependent (y) values.
2. Outlying independent (x) values do not indicate lack of fit of the model, but such observations may influence the fit more than others.

Detection: in simple linear regression this is usually easy from plots of the data, but in multiple regression more formal measures are required.

[Figure omitted.] Figure 4.1. Three least squares lines fitted to sample data, where the observation at x = 8 is allowed to move between the three points A, B and C. The corresponding least squares fits are the solid, dashed and dotted lines respectively.

The hat matrix

Recall that Ŷ = HY with H = X(X'X)^{-1}X', so the covariance matrix of Ŷ is

  Var{Ŷ} = σ² H.

The variance of ŷ_i is σ² h_ii, and the variance of the i-th residual e_i is σ² (1 − h_ii). Properties of the {h_ii} values include

  0 ≤ h_ii ≤ 1 for all i,   (1)
  Σ_i h_ii = p.             (2)

Property (1) follows simply from the fact that σ² h_ii and σ² (1 − h_ii) are both variances of random quantities, and therefore are nonnegative. For property (2), note that tr(H) = tr(X(X'X)^{-1}X') = tr((X'X)^{-1}X'X) = tr(I_p) = p.

Leverage

A data point with large h_ii is called a point of high leverage. How high is high? By (2), the average value of h_ii is p/n. A standard criterion is to call any data point for which h_ii > 2p/n a point of high leverage. Note that since h_ii is a function of X alone, it has no sampling distribution, and thus there is no formal test.

Example: Consider the artificial data of Fig. 4.1. The twelve x values here are 0, 0.2, 0.4, ..., 1.8, 2, 8. The corresponding h_ii values are 0.1342, 0.1221, ..., 0.0869, 0.9182.
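The example above can be reproduced numerically. The following sketch builds the design matrix for a simple linear regression on the twelve x values, forms the hat matrix H = X(X'X)^{-1}X' directly, and checks property (2) and the 2p/n rule; the variable names are illustrative, not from the original notes.

```python
import numpy as np

# Artificial data of Fig. 4.1: eleven evenly spaced x values 0, 0.2, ..., 2,
# plus the isolated point x = 8.
x = np.append(np.arange(0, 2.01, 0.2), 8.0)
n = len(x)

# Design matrix for simple linear regression (intercept + slope), so p = 2.
X = np.column_stack([np.ones(n), x])
p = X.shape[1]

# Hat matrix H = X (X'X)^{-1} X'; its diagonal gives the leverages h_ii.
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)

print(np.round(h, 4))     # 0.1342, 0.1221, ..., 0.0869, 0.9182 as in the notes
print(h.sum())            # property (2): the leverages sum to p = 2
print(h > 2 * p / n)      # 2p/n rule: only the point at x = 8 is flagged
```

Note that nothing about y appears anywhere in this computation, which is the point made below: h_ii depends on X alone.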
The last observation, corresponding to x = 8, is clearly highly influential. Intuitively, this is because if this point is moved up or down, the least squares straight line will tend to follow it: the overall least squares fit on the other 11 observations is not much affected by modest changes in the slope of the fitted line, but the fit at x = 8 has a big influence. Note that this has nothing to do with y_12 possibly being an outlier, since for any i, the actual value of y_i does not even enter into the calculation of h_ii.

Real data examples from Chapter 3

Tree data: the highest h_ii value is h_20 = 0.2428 (diameter = 13.8, height = 64), not extreme for either independent variable, but it does correspond to a fairly large diameter combined with the second smallest height...
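The tree data themselves are not reproduced in this excerpt, but for any regression dataset the leverages can be obtained without forming the full n × n matrix H. One standard route, sketched here on hypothetical data (not the Chapter 3 tree data), uses the thin QR decomposition: if X = QR with Q having orthonormal columns, then H = QQ', so h_ii is the squared norm of row i of Q.

```python
import numpy as np

def leverages(X):
    """Diagonal of the hat matrix via a thin QR decomposition.

    If X = QR with Q n-by-p and orthonormal columns, then
    H = X(X'X)^{-1}X' = QQ', so h_ii is the squared norm of row i of Q.
    This avoids forming the full n-by-n matrix H.
    """
    Q, _ = np.linalg.qr(X)          # reduced QR: Q is n x p
    return np.sum(Q**2, axis=1)

# Hypothetical illustration with n = 31 (the size of the tree data set)
# and two predictors plus an intercept, so p = 3.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(31), rng.normal(size=(31, 2))])
h = leverages(X)
n, p = X.shape
print(np.flatnonzero(h > 2 * p / n))  # indices of any high-leverage points
```

The QR route is also numerically safer than inverting X'X directly when the predictors are nearly collinear.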
This note was uploaded on 11/17/2011 for the course STOR 664 taught by Professor Staff during the Fall '11 term at UNC.
