
# Lecture 22 slides(Violation Solution-WLS & Transformation)


## Three measures of influence

All three measures of influence (DFFITS, DFBETAS, Cook's distance) measure different aspects of the influence that observations have on the model:

- DFFITS measures the influence on the fitted value for the $i$th observation.
- DFBETAS measures the influence on each coefficient separately.
- Cook's distance measures how far the observation moves the vector of regression coefficients.

## Guidelines for handling problem points

- If points are influential, outlying, or badly fit, we are left with tough decisions about whether to include them.
- Decisions should, optimally, be made for scientific reasons (mis-measurement, breakdown of the model's assumptions, etc.).
- Using the methods of Chapters 5 and 6, one can identify potential problem points.
- Best solution: fit multiple models to see if they coarsely agree on the shape of the model.
- If they differ drastically, be concerned that the violations of assumptions are severe.
- Regression should be robust to your own stupidity.

## Chapter 7

- Chapters 5 and 6 focused on diagnosing problems with the model.
- Chapter 7 proposes solutions to some of the problems seen in those chapters (and then points out a few others, with solutions).
- It is important, again, not to push these methods past their logical conclusion.
- View the solutions of Chapter 7 as tools to be used when specific problems arise, rather than as a way to try to fix all problems.

## Chapter 7 Overview

- Chapters 5 and 6: diagnostics.
- Chapter 7: solutions to problems seen in diagnostics.
- Solutions must be balanced against time and interpretability.
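As a companion to the three influence measures (DFFITS, DFBETAS, Cook's distance) described in these slides, here is a minimal sketch of how they can be computed for an ordinary least squares fit. It uses the standard textbook closed forms based on leverages and leave-one-out variances, which are not taken from the slides themselves. The function name `influence_measures` and the small data set with one planted high-leverage point are hypothetical and only for illustration.

```python
import numpy as np

def influence_measures(X, y):
    """Return (dffits, dfbetas, cooks_d) for an OLS fit of y on X.

    X is an (n, p) design matrix that already contains an intercept column.
    Hypothetical helper; formulas are the usual closed-form deletion results.
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Leverages h_i = x_i^T (X^T X)^{-1} x_i (diagonal of the hat matrix)
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    mse = resid @ resid / (n - p)
    # Error variance estimated with observation i deleted
    s2_loo = ((n - p) * mse - resid**2 / (1 - h)) / (n - p - 1)
    # Externally studentized residuals
    t = resid / np.sqrt(s2_loo * (1 - h))
    # DFFITS: scaled change in the i-th fitted value
    dffits = t * np.sqrt(h / (1 - h))
    # Cook's distance: scaled movement of the whole coefficient vector
    cooks_d = resid**2 * h / (p * mse * (1 - h) ** 2)
    # Row i holds beta_hat - beta_hat_(i) = (X^T X)^{-1} x_i e_i / (1 - h_i)
    delta_beta = (X @ XtX_inv) * (resid / (1 - h))[:, None]
    # DFBETAS: scaled change in each coefficient separately
    dfbetas = delta_beta / np.sqrt(s2_loo[:, None] * np.diag(XtX_inv)[None, :])
    return dffits, dfbetas, cooks_d

# Made-up data: nine points close to a line plus one far-out point (index 9)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 20.0])
noise = np.array([0.1, -0.2, 0.15, -0.05, 0.2, -0.1, 0.05, -0.15, 0.1, 8.0])
y = 2.0 + 0.5 * x + noise
X = np.column_stack([np.ones_like(x), x])

dffits, dfbetas, cooks_d = influence_measures(X, y)
print("most influential index by Cook's distance:", int(np.argmax(cooks_d)))
```

In line with the guideline of fitting multiple models, a flagged point is a prompt to refit without it and compare the fits, not an automatic reason to delete it.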

## Heterogeneous Variance

- A common problem, often diagnosed through residual vs. fitted or residual vs. covariate plots.
- Can also arise by design if there are multiple observations at a particular covariate and response level.

## Weighted Least Squares

Assume the responses are fit by the general linear model

$$y = X\beta + \epsilon.$$

The usual estimator for $\beta$ is

$$\hat{\beta} = (X^t X)^{-1} X^t y.$$

Typically we assume that $\mathrm{Var}(\epsilon) = \sigma^2 I_{n \times n}$.

We can relax this assumption to allow any variance-covariance matrix for the model errors, i.e. $\mathrm{Var}(\epsilon) = V$, where $V$ is a positive definite matrix.

- A simple generalization is to assume that $V$ is a diagonal matrix with diagonal elements $v_{ii} = \sigma_i^2$.
- This still assumes uncorrelated errors, but allows different (and for now known) variances for each response.
- How does this change the problem?

It is easiest to work from the maximum likelihood perspective. The likelihood for this particular problem can be written as

$$p(y \mid \beta, V) = \frac{1}{(2\pi)^{n/2} |V|^{1/2}} \exp\left( -\frac{1}{2} (y - X\beta)^t V^{-1} (y - X\beta) \right).$$

Because $V$ is known, maximizing the likelihood only requires minimizing $(y - X\beta)^t V^{-1} (y - X\beta)$ over $\beta$.
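The minimization above has a well-known closed-form solution, $\hat{\beta} = (X^t V^{-1} X)^{-1} X^t V^{-1} y$, which the preview stops short of stating. The sketch below, assuming a diagonal $V$ with known variances, computes it with NumPy and checks it against the equivalent approach of dividing each row by $\sigma_i$ and running ordinary least squares. The function name `wls`, the simulated data, and the variance pattern `sigma2` are all hypothetical choices for illustration.

```python
import numpy as np

def wls(X, y, variances):
    """Weighted least squares with known, uncorrelated error variances.

    Minimizes (y - X b)^T V^{-1} (y - X b) with V = diag(variances).
    Hypothetical helper for illustration.
    """
    w = 1.0 / np.asarray(variances, dtype=float)  # diagonal of V^{-1}
    XtW = X.T * w                                 # X^T V^{-1} without forming V
    # Solve the weighted normal equations (X^T V^{-1} X) b = X^T V^{-1} y
    return np.linalg.solve(XtW @ X, XtW @ y)

# Simulated data whose error variance grows with the covariate (an assumption)
rng = np.random.default_rng(0)
n = 200
x = np.linspace(1.0, 10.0, n)
X = np.column_stack([np.ones(n), x])
sigma2 = 0.25 * x**2
y = 1.0 + 2.0 * x + rng.normal(scale=np.sqrt(sigma2))

beta_wls = wls(X, y, sigma2)

# Equivalent view: rescale each row by 1 / sigma_i, then use ordinary least squares
s = np.sqrt(sigma2)
beta_rescaled, *_ = np.linalg.lstsq(X / s[:, None], y / s, rcond=None)

print("WLS estimate:      ", beta_wls)
print("rescaled OLS match:", np.allclose(beta_wls, beta_rescaled))
```

With all variances equal, the weights cancel and the estimator reduces to the usual $(X^t X)^{-1} X^t y$.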

## This note was uploaded on 01/15/2010 for the course MATH 423 taught by Professor Steele during the Spring '06 term at McGill.

