# Leverage and Influence



In linear regression (simple or multiple), any fitted value y*_i can be written as a linear combination of the observed responses y_j:

y*_i = Σ_j h_ij y_j

The coefficient of observation y_i in this sum, h_ii, is called the **leverage** of that observation. The larger h_ii, the larger the contribution of y_i to its own fitted value y*_i. It can be shown that:

- Every leverage lies between 0 and 1: 0 ≤ h_ii ≤ 1.
- The leverages sum to p, the number of parameters of the regression model (in general, the number of predictors + 1).
- Leverages depend only on the predictor values {x_j}; they are fixed, not random, quantities.

It can also be shown that an observation with a large leverage ("large" meaning substantially larger than the average value p/n, where n is the number of observations) lies at the periphery of the predictors' domain. Such an observation is called a "leverage observation". By itself, the leverage is of limited usefulness because it does not take the observation's residual into account. But the classical measures of an observation's influence on the model's predictions (e.g. DFFITS, Cook's distance) combine both leverages and residuals.
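As a concrete illustration, the leverages are the diagonal entries of the hat matrix H = X(XᵀX)⁻¹Xᵀ. The sketch below uses made-up data (all variable names and the data-generating setup are hypothetical, not from the notes) to check the three properties above numerically:

```python
import numpy as np

# Minimal sketch with synthetic data: leverages from the hat matrix
# H = X (X'X)^{-1} X', whose diagonal entries h_ii are the leverages.
rng = np.random.default_rng(0)
n, k = 20, 2                       # 20 observations, 2 predictors
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # intercept + predictors
p = X.shape[1]                     # number of parameters = predictors + 1

H = X @ np.linalg.inv(X.T @ X) @ X.T
leverages = np.diag(H)

# Property checks from the text:
assert np.all(leverages >= 0) and np.all(leverages <= 1)  # 0 <= h_ii <= 1
assert np.isclose(leverages.sum(), p)                     # leverages sum to p

# A common rule of thumb flags observations whose leverage is well above
# the average p/n, e.g. h_ii > 2p/n:
high_leverage = np.where(leverages > 2 * p / n)[0]
```

Note that the leverages were computed from X alone, never touching y, which is why they are fixed rather than random quantities.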


## Regression Diagnostics

*by Gerard E. Dallal, Ph.D.*

In the 1970s and '80s, many statisticians developed techniques for assessing multiple regression models. One of the most influential books on the topic was *Regression Diagnostics: Identifying Influential Data and Sources of Collinearity* by Belsley, Kuh, and Welsch. Roy Welsch tells of getting interested in regression diagnostics when he was once asked to fit models to some banking data. When he presented his results to his clients, they remarked that the model could not be right because the sign of one of the predictors was different from what they expected. When Welsch looked closely at the data, he discovered the sign reversal was due to an outlier in the data. This example motivated him to develop methods to ensure it didn't happen again!

Perhaps the best reason for studying regression diagnostics was given by Frank Anscombe in a discussion of outliers:

> We are usually happier about asserting a regression relation if the relation is appropriate after a few observations (any ones) have been deleted--that is, we are happier if the regression relation seems to permeate all the observations and does not derive largely from one or two.

Regression diagnostics were developed to measure the various ways in which a regression relation might derive largely from one or two observations. Observations whose inclusion or exclusion results in substantial changes in the fitted model (coefficients, fitted values) are said to be **influential**. Many of these diagnostics are available in standard statistical program packages.
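The "inclusion or exclusion" idea above has a closed form: Cook's distance measures how much all fitted values shift when one observation is deleted, and it can be computed from the residual and leverage alone, without actually refitting. The sketch below (synthetic data; all names hypothetical) verifies the closed-form formula against brute-force leave-one-out refits:

```python
import numpy as np

# Synthetic data for illustration.
rng = np.random.default_rng(1)
n = 30
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
X = np.column_stack([np.ones(n), x])
p = X.shape[1]

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)   # leverages
s2 = resid @ resid / (n - p)                    # residual variance estimate

# Closed-form Cook's distance: combines the residual and the leverage,
# exactly as the notes describe for influence measures.
cooks_d = (resid**2 / (p * s2)) * (h / (1 - h) ** 2)

# Brute-force check: refit without observation i and measure the shift in
# all fitted values, scaled by p*s^2 (Cook's original definition).
brute = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    beta_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    diff = X @ beta - X @ beta_i
    brute[i] = diff @ diff / (p * s2)

assert np.allclose(cooks_d, brute)   # the two computations agree
```

The agreement shows why these diagnostics are cheap: one fit of the full model yields the influence of every observation, with no need for n separate refits.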

## This note was uploaded on 02/15/2012 for the course GEO 4167 taught by Professor Staff during the Spring '12 term at University of Florida.



