# Chapter 4: Diagnostics for Influential Observations
Influential observations are observations whose presence in the data can have a distorting effect on the parameter estimates, and possibly on the entire analysis, e.g. by leading to the wrong model being identified. They should be distinguished from *outliers*, though it is possible for one observation to be both influential and an outlier.

Outliers:

1. Data points that contain unusual dependent ($y$) values.
2. Outlying independent ($x$) values. These do not indicate lack of fit of the model, but some observations still influence the fit more than others.

Detection: in simple linear regression, influential observations are usually easy to spot from plots of the data, but in multiple regression more formal measures are required.
[Figure 4.1 (scatter plot omitted). Three least squares lines fitted to sample data, where the observation at x = 8 is allowed to move between the three points A, B and C. The corresponding least squares fits are the solid, dashed and dotted lines respectively.]

## The hat matrix

Recall $\hat{Y} = HY$, where $H = X(X^T X)^{-1} X^T$, so the covariance matrix of $\hat{Y}$ is

$$\operatorname{Var}\{\hat{Y}\} = \sigma^2 H.$$

The variance of $\hat{y}_i$ is $h_{ii}\sigma^2$, and the variance of the $i$th residual $e_i$ is $(1 - h_{ii})\sigma^2$. Properties of the $\{h_{ii}\}$ values include

$$0 \le h_{ii} \le 1 \quad \text{for all } i, \qquad (1)$$

$$\sum_i h_{ii} = p. \qquad (2)$$

Property (1) follows simply from the fact that both $h_{ii}\sigma^2$ and $(1 - h_{ii})\sigma^2$ are the variances of random quantities, and therefore are nonnegative. For property (2), note that $\operatorname{tr}(H) = \operatorname{tr}(X(X^T X)^{-1} X^T) = \operatorname{tr}((X^T X)^{-1} X^T X) = \operatorname{tr}(I_p) = p$, using the cyclic property of the trace.
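The definitions above translate directly into a few lines of NumPy. A minimal sketch (the design matrix here is made up purely for illustration: an intercept column plus one predictor, so $p = 2$):

```python
import numpy as np

# Hypothetical design matrix: intercept plus one predictor (n = 5, p = 2)
X = np.column_stack([np.ones(5), np.array([1.0, 2.0, 3.0, 4.0, 10.0])])

# Hat matrix H = X (X^T X)^{-1} X^T
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)

# Property (1): each leverage lies in [0, 1]
# Property (2): the leverages sum to p, the number of columns of X
print(h)
print(h.sum())  # ~2.0
```

For numerical work on larger problems one would avoid forming the full $n \times n$ matrix $H$ and compute only its diagonal, but the direct formula keeps the connection to the text transparent.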
## Leverage

A data point with a large $h_{ii}$ is called a point of *high leverage*. How high is high? By (2), the average value of the $h_{ii}$ is $p/n$. A standard criterion is to call any data point for which $h_{ii} > 2p/n$ a point of high leverage. Note that since $h_{ii}$ is a function of $X$ alone, it has no sampling distribution, and hence there is no formal test.

## Example: artificial data

Consider the artificial data of Fig. 4.1. The twelve $x$ values are $0, 0.2, 0.4, \ldots, 1.8, 2, 8$. The corresponding $h_{ii}$ values are $0.1342, 0.1221, \ldots, 0.0869, 0.9182$. The last observation, corresponding to $x = 8$, is clearly highly influential. Intuitively, this is because if this point is moved up or down, the least squares straight line will tend to follow it: the overall least squares fit on the other 11 observations is not much affected by modest changes in the slope of the fitted line, but the point at $x = 8$ has a big influence on the fit. Note that this has nothing to do with $y_{12}$ possibly being an outlier, since for any $i$, the actual value of $y_i$ does not even enter into the calculation of $h_{ii}$.
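The leverage values quoted for this example can be reproduced directly; a sketch, assuming the simple linear regression design (intercept plus $x$) used throughout the figure:

```python
import numpy as np

# The twelve x values from Fig. 4.1: 0, 0.2, ..., 1.8, 2, then 8
x = np.append(np.linspace(0.0, 2.0, 11), 8.0)
n = len(x)

# Simple linear regression design matrix, p = 2
X = np.column_stack([np.ones(n), x])
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)

# First leverage ~0.1342 and last ~0.9182, matching the text;
# the y values never enter this computation.
print(np.round(h, 4))
```

Note that the last leverage far exceeds the $2p/n = 4/12 \approx 0.33$ rule of thumb, while none of the other eleven points come close.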
## Real data examples from Chapter 3

Tree data: the highest $h_{ii}$ value is $h_{20} = 0.2428$ (diameter = 13.8, height = 64), not extreme for either independent variable, but it does correspond to a fairly large diameter combined with the second smallest height. The next three largest values of $h_{ii}$ are $h_3 = 0.1975$, $h_{31} = 0.1803$ and $h_2 = 0.1672$. In this case $p = 3$ and $n = 31$, so according to the $2p/n = 0.1935$ criterion, observations 3 and 20 are influential. This draws attention to two observations which would not necessarily be identified as influential from an initial inspection of the data.
