# Lectures 8-9: Model Assumptions: Assessment and Repair


## Slide 1: Model Assumptions: Assessment and Repair

This material is mostly in Chapters 5 & 6 of Dielman. Some of these chapters involve "multiple regression" ideas, but most of it only uses simple regression. Class notes may be the most useful source material for now.

As we progress through Dielman, pay special attention to:

- Section 6.7.2+: Outliers
- Sections 6.4 and 5.2.2-5.2.4: Linearity and curvilinear relations
  - Section 5.2.1 (polynomial regression) is also relevant but will be treated later
- Section 6.5: Constant variance assumption (homoscedasticity)
- Section 6.6: Normality
- Section 6.7.1: Influential points

## Slide 2: Ordinary Linear Model Assumptions (review)

Properties of the errors under the ideal model:

- µ(Y|x) = β₀ + β₁x for all x.
- Yᵢ = β₀ + β₁xᵢ + eᵢ for all xᵢ.
- All eᵢ have the same variance, σₑ² ["homoscedasticity"].
- The distribution of each eᵢ is normal. [Also E(eᵢ) = 0.]
- e₁, ..., eₙ are independent.

Equivalent description: for each xᵢ, the corresponding Yᵢ has a normal distribution with mean β₀ + β₁xᵢ, a linear function of x, and constant variance. Also Y₁, ..., Yₙ are independent.
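The assumptions above can be made concrete with a short simulation: a minimal sketch (parameter values are illustrative, not from the lecture) that generates data from the ideal model and checks that least squares recovers the coefficients.

```python
import numpy as np

# Simulate from the ideal simple-regression model:
# Y_i = beta0 + beta1 * x_i + e_i, with e_i ~ N(0, sigma_e^2) i.i.d.
rng = np.random.default_rng(0)
beta0, beta1, sigma_e = 2.0, 0.8, 1.0

x = rng.uniform(0, 11, size=200)        # x values need not be equally spaced
e = rng.normal(0, sigma_e, size=200)    # homoscedastic, normal, independent
y = beta0 + beta1 * x + e

# Least-squares fit recovers beta0 and beta1 up to sampling error
b1, b0 = np.polyfit(x, y, 1)            # returns highest degree first
print(b0, b1)
```

With the errors satisfying all four conditions, the fitted line is unbiased for the true line; the diagnostics in the later slides ask what the residuals look like when one of these conditions fails.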
## Slide 3: Ideal Regression

This data conforms to the ideal pattern described on Slide 2.

[Scatterplot of y vs. x from Ideal_Regression.jmp]

Notes:

- The data follows a generally linear pattern; the vertical deviations are well behaved.
- This regression has a positive slope, but ideal regressions can also have negative slope.
- The x values need not be equally spread (these aren't). But there shouldn't be outliers in the x-direction. (Such outliers would be "high leverage" points; see later.)

## Slide 4: Things to look for in the data (and what to do about them)

These things suggest deviations from the assumptions. Order of inspection:

1. Outliers (covered in Lecture 8).
2. Heteroscedasticity: changes in the vertical spread of the data for varying x. Possible "cure": transformation of Y.
3. Non-linearity (curvature of the general pattern). Possible "cure": transformation of X.
4. Non-normality of the residuals. Possible "cure": caution, plus transformation of Y.
5. Lack of independence. This is common in time-series data, which will be studied later.
6. Influential points: technically not a "deviation from assumptions", but they do require caution in interpreting the analysis. Sometimes transformation of X is also useful.
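The first few checks in this inspection order can be sketched numerically. Below is a minimal example (simulated data and rough thresholds are my assumptions, not the lecture's) that computes residuals from a simple-regression fit and applies crude versions of checks 1, 2, and 4.

```python
import numpy as np

# Illustrative data that satisfies the ideal model
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 10, 100))
y = 3.0 + 0.5 * x + rng.normal(0, 1.0, 100)

b1, b0 = np.polyfit(x, y, 1)
resid = y - (b0 + b1 * x)

# 1. Outliers: flag standardized residuals beyond +/- 3
std_resid = resid / resid.std(ddof=2)
outliers = np.where(np.abs(std_resid) > 3)[0]

# 2. Heteroscedasticity: compare residual spread for small vs. large x
# (x is sorted, so the halves split at the median of x)
spread_ratio = resid[50:].std() / resid[:50].std()  # far from 1 is suspect

# 4. Non-normality: crude check via the skewness of the residuals
skew = ((resid - resid.mean()) ** 3).mean() / resid.std() ** 3
print(len(outliers), spread_ratio, skew)
```

In practice these checks are done graphically (residual plots, normal quantile plots), as in the class notes; the numeric versions here only illustrate what each diagnostic is responding to.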
## Slide 5: Example: House Prices in Zip 30062 (cont.)

New perspective: this time we'll look at house prices (for sales in 2003) as a function of the age of the house.

[Scatterplot of Price ($1,000) vs. bldgAGE]

## Slide 6: 1. Regression Outliers

These are points far from the overall pattern for the average of Y given x. Potential regression outliers should be examined to see whether they belong in the data.

- Look for mis-recorded data, and for data that otherwise doesn't belong.
- If they don't belong, they should be excluded.
- If they do belong (or if their status is unclear), they can be retained in the analysis, BUT one should later check that any conclusions drawn from the data are not strongly dependent on just a few potential outliers.

This is what we'll do with our two questionable data points.
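The "retain, then check dependence" advice can be sketched as a sensitivity check: refit without the flagged points and see whether the slope changes materially. This is a minimal illustration with planted data (not the house-price data; the 3-standard-deviation cutoff is an assumed rule of thumb).

```python
import numpy as np

# Ideal data with one gross, mis-recorded point planted at index 0
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 60)
y = 1.0 + 2.0 * x + rng.normal(0, 1.0, 60)
y[0] += 25.0                                   # the questionable point

# Fit with all points, then flag large standardized residuals
b1_all, b0_all = np.polyfit(x, y, 1)
resid = y - (b0_all + b1_all * x)
keep = np.abs(resid / resid.std(ddof=2)) <= 3  # True for retained points

# Refit without the flagged points and compare slopes
b1_kept, b0_kept = np.polyfit(x[keep], y[keep], 1)
print(b1_all, b1_kept)
```

If the two slopes (and the conclusions drawn from them) roughly agree, the outlier's status matters little; if they differ sharply, the analysis hinges on a point whose validity is in doubt, and that should be reported.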

*This note was uploaded on 04/04/2012 for the course STAT 102 taught by Professor Shaman during the Spring '08 term at UPenn.*
