STAT5044: Regression and Anova
Inyoung Kim

Outline

1. How to check assumptions

Assumption

- Linearity: scatter plot, residual plot.
- Randomness: runs test; Durbin-Watson test when the data can be arranged in time order.
- Constant variance: scatter plot, residual plot (absolute residual plot); Brown-Forsythe test, Breusch-Pagan test.
- Normality of error: boxplot, histogram, normal probability plot; Shapiro-Wilk test, Kolmogorov-Smirnov test, Anderson-Darling test.

Remark: the normal probability plot provides no information if the assumptions of linearity and/or constant variance are violated.

Influential point

An influential point combines a large absolute residual with high leverage ($h_{ii}$). The leverage is the diagonal value of the hat matrix $H$:

\[
H = \begin{pmatrix}
h_{11} & h_{12} & \cdots & h_{1n} \\
h_{21} & h_{22} & \cdots & h_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
h_{n1} & h_{n2} & \cdots & h_{nn}
\end{pmatrix}
\]

High leverage means large $h_{ii}$.

Residual

Three types:

- Ordinary: $r_i = y_i - \hat{y}_i$, where $E(r_i) = 0$ and $\mathrm{var}(r_i) = (1 - h_{ii})\sigma^2$.
- Standardized: $\dfrac{r_i}{\hat{\sigma}\sqrt{1 - h_{ii}}}$.
- Studentized (or jackknife): $\dfrac{r_i}{\hat{\sigma}_{(i)}\sqrt{1 - h_{ii}}} \sim t_{n-p-1}$.

Here $\hat{\sigma}^2_{(i)} = \sum_{j \ne i} r_{j(i)}^2 / (n - p - 1)$, $p$ is the number of regression parameters ($p = 2$ for the simple linear model used here), and $h_{ii}$ is the leverage, the diagonal value of the hat matrix. The deleted residuals are

\[
r_{j(i)} = y_j - \hat{y}_{j(i)} = y_j - \left(\hat{\beta}_{0(i)} + \hat{\beta}_{1(i)} x_j\right).
\]

Properties of residuals

- They sum to zero: $\sum_i r_i = 0$.
- They are not independent.

Residual

Jackknife residual:

\[
r_{i(i)} = y_i - \hat{y}_{i(i)} \sim N\!\left(0, \frac{\sigma^2}{1 - h_{ii}}\right),
\]

where the subindex $(i)$ indicates an estimate computed without point $i$. That is, the residual for $y_i$ is computed using the regression fitted without $y_i$, and is then scaled.

Studentized residual:

\[
\frac{r_{i(i)}}{\sqrt{\widehat{\mathrm{var}}(r_{i(i)})}}, \qquad
r_{i(i)} = y_i - \hat{y}_{i(i)} = y_i - \left[\hat{\beta}_{0(i)} + \hat{\beta}_{1(i)} x_i\right].
\]

Studentized residual

\[
\frac{r_{i(i)}}{\sqrt{\widehat{\mathrm{var}}(r_{i(i)})}} = \frac{r_i}{\hat{\sigma}_{(i)}\sqrt{1 - h_{ii}}}
\]

by Fact 1 and Fact 2.

Fact 1:

\[
r_{i(i)} = \frac{r_i}{1 - h_{ii}}
\]

Fact 2:

\[
\sum_{j \ne i} r_{j(i)}^2 = (n - p)\hat{\sigma}^2 - \frac{r_i^2}{1 - h_{ii}}, \qquad
\hat{\sigma}^2_{(i)} = \frac{(n - p)\hat{\sigma}^2 - \dfrac{r_i^2}{1 - h_{ii}}}{n - p - 1}
\]

Residual

Using Fact 1, $r_{i(i)} = r_i / (1 - h_{ii})$, we have

\[
\mathrm{Var}(r_{i(i)}) = \frac{\mathrm{Var}(r_i)}{(1 - h_{ii})^2} = \frac{\sigma^2}{1 - h_{ii}}.
\]

But $\sigma^2$ is unknown, so we use $\hat{\sigma}^2_{(i)}$, with

\[
r_{i(i)} = Y_i - \hat{Y}_{i(i)}, \qquad r_{i(i)} = \frac{r_i}{1 - h_{ii}}.
\]

Residual

Studentized residual:

\[
\frac{r_i / (1 - h_{ii})}{\sqrt{\hat{\sigma}^2_{(i)} / (1 - h_{ii})}} = \frac{r_i}{\sqrt{\hat{\sigma}^2_{(i)}\,(1 - h_{ii})}}, \qquad
\hat{\sigma}^2_{(i)} = \frac{\sum_{j \ne i} r_{j(i)}^2}{n - p - 1},
\]

where $\sum_{j \ne i} r_{j(i)}^2 = (n - p)\hat{\sigma}^2 - \dfrac{r_i^2}{1 - h_{ii}}$.

NOTE: a residual is considered large if its studentized value exceeds 3 in absolute value.

An expression for the distribution of the standardized residuals was obtained (Weisberg, 1985).

Studentized residual

\[
\frac{r_{i(i)}}{\sqrt{\widehat{\mathrm{var}}(r_{i(i)})}} = \frac{r_i}{\hat{\sigma}_{(i)}\sqrt{1 - h_{ii}}} \sim t_{n-p-1}
\]

You don't need to know how to prove this in our class (it is beyond the scope of the class).

Comparison with standardized residual

Standardized residual:

\[
\frac{r_i}{\sqrt{\mathrm{var}(r_i)}} = \frac{r_i}{\sqrt{\sigma^2 (1 - h_{ii})}}, \quad \text{estimated by} \quad \frac{r_i}{\sqrt{(1 - h_{ii})\,\hat{\sigma}^2}}.
\]

If one has outliers with large absolute residuals, then $\hat{\sigma}^2$ may not be a good measurement ...
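The relationships among leverage, standardized residuals, and studentized residuals can be checked numerically. The following is a minimal sketch, assuming NumPy is available and a simple linear regression with an intercept (so $p = 2$). The function names `residual_diagnostics` and `deleted_residuals_by_refit` and the small data set are made up for illustration and are not part of the course material. The first function uses Fact 1 and Fact 2 to avoid refitting; the second refits the model $n$ times as a brute-force check.

```python
import numpy as np


def residual_diagnostics(x, y):
    """Leverages and the three residual types for y = b0 + b1*x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    X = np.column_stack([np.ones(n), x])      # design matrix with intercept
    p = X.shape[1]                            # number of regression parameters (2)
    H = X @ np.linalg.solve(X.T @ X, X.T)     # hat matrix H = X (X'X)^{-1} X'
    h = np.diag(H)                            # leverages h_ii
    r = y - H @ y                             # ordinary residuals r_i
    sigma2 = r @ r / (n - p)                  # variance estimate from all points
    # Fact 2: variance estimate without point i, obtained without refitting
    sigma2_del = ((n - p) * sigma2 - r**2 / (1 - h)) / (n - p - 1)
    standardized = r / np.sqrt(sigma2 * (1 - h))
    studentized = r / np.sqrt(sigma2_del * (1 - h))
    return h, r, standardized, studentized


def deleted_residuals_by_refit(x, y):
    """Brute force: refit without point i to get r_{i(i)} and sigma^2_(i)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    r_del = np.empty(n)
    s2_del = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        Xk = np.column_stack([np.ones(n - 1), x[keep]])
        beta, *_ = np.linalg.lstsq(Xk, y[keep], rcond=None)
        r_del[i] = y[i] - (beta[0] + beta[1] * x[i])   # r_{i(i)}
        resid = y[keep] - Xk @ beta
        s2_del[i] = resid @ resid / (n - 1 - 2)        # n - p - 1 with p = 2
    return r_del, s2_del


# Made-up illustration data: the last x value lies far from the others.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 15.0])
y = np.array([1.1, 2.3, 2.9, 4.2, 5.1, 5.8, 7.2, 9.0])

h, r, standardized, studentized = residual_diagnostics(x, y)
r_del, s2_del = deleted_residuals_by_refit(x, y)

print("leverages:", np.round(h, 3))
print("Fact 1 holds:", np.allclose(r_del, r / (1 - h)))
print("studentized:", np.round(studentized, 3))
print("flagged (|studentized| > 3):", np.where(np.abs(studentized) > 3)[0])
```

Because the shortcut formulas need only one fit, they are the usual way to compute studentized residuals in practice; the refitting loop is included here only to confirm that both routes agree on this example.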
This note was uploaded on 01/02/2012 for the course STAT 5044 taught by Professor Staff during the Fall '11 term at Virginia Tech.