Chapter 3 - STAT 410/510 F-2011 Chap. 3 Diagnostics and...

STAT 410/510 F-2011

Chap. 3 Diagnostics and Remedial Measures

3.1 Diagnostics for predictor variable data

    data Toluca1; set Toluca;
      run+1;                             * run = sequence (run) number of each lot;
    proc chart data=Toluca1;
      vbar lotsize / levels=11;          * histogram or dot plot;
    proc gplot data=Toluca1;
      symbol1 value=dot interpol=join;
      plot lotsize*run;                  * sequence (run) plot;
    proc univariate data=Toluca1 plot normal;  * stem-and-leaf and box plots; normal adds normality tests;
      var lotsize;
    run;

[Figure 1. Dot plot of lotsize and sequence plot of lotsize against run.]

Stem plot and Box plot:

    Stem Leaf     #   Boxplot
      12 0        1      |
      11 00       2      |
      10 00       2      |
       9 0000     4   +-----+
       8 000      3   |     |
       7 000      3   *--+--*
       6 0        1   |     |
       5 000      3   +-----+
       4 00       2      |
       3 000      3      |
       2 0        1      |
         ----+----+----+----+
    Multiply Stem.Leaf by 10**+1
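The dot plot, stem-and-leaf display and box plot all summarize the same 25 lot sizes. As a hedged illustration in plain Python (standard library only; the notes themselves use SAS), the sketch below rebuilds the stem-and-leaf rows from values read off the display above and computes the numbers a box plot draws. It assumes `statistics.quantiles` with `method="inclusive"`; for these 25 values that gives the same median (70) and box edges (50 and 90) as the SAS box plot, but quartile definitions differ between packages, so other data may not agree with SAS. The run order cannot be recovered from the display, so no sequence plot is attempted.

```python
from statistics import mean, quantiles

# Lot sizes read back from the stem-and-leaf display above
# (Stem.Leaf x 10, so stem 12 / leaf 0 means 120). The run order is
# not recoverable from that display, so the values are listed sorted.
lotsize = ([20] + [30] * 3 + [40] * 2 + [50] * 3 + [60] + [70] * 3
           + [80] * 3 + [90] * 4 + [100] * 2 + [110] * 2 + [120])


def stem_leaf(values):
    """Map each stem (value // 10) to its sorted leaves (value % 10)."""
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows.setdefault(stem, []).append(str(leaf))
    # Largest stem first, as in the SAS display
    return {s: "".join(rows[s]) for s in sorted(rows, reverse=True)}


def box_summary(values):
    """Min, Q1, median, Q3, max and mean: the numbers a box plot draws."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": med,
            "q3": q3, "max": max(values), "mean": mean(values)}


for stem, leaves in stem_leaf(lotsize).items():
    print(f"{stem:>4} {leaves:<6} {len(leaves)}")   # stem, leaves, count (#)
print(box_summary(lotsize))
```

The `#` column of the SAS display corresponds to `len(leaves)` in each printed row.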
3.2 Residuals

The residual for the ith case is the difference between the observed value and the fitted value (the observed error):

$$e_i = Y_i - \hat{Y}_i$$

If the model is appropriate for the given data, the observed residuals should reflect the properties assumed for the true errors $\varepsilon_i = Y_i - E\{Y_i\}$.

Properties of residuals:

The mean of the n residuals $e_i$ for the simple linear regression model (2.1) is 0:

$$\bar{e} = \frac{\sum e_i}{n} = 0$$

The variance of the n residuals $e_i$ is estimated by

$$s^2 = \frac{\sum (e_i - \bar{e})^2}{n-2} = \frac{\sum e_i^2}{n-2} = \frac{SSE}{n-2} = MSE$$

The residuals $e_i = Y_i - \hat{Y}_i$ are not independent random variables, because they involve the fitted values $\hat{Y}_i$, which are all based on the same fitted regression function. As a result, the residuals for the regression model (2.1) are subject to two constraints:
1) the $e_i$ must sum to 0;
2) the products $X_i e_i$ must sum to 0.

Semistudentized residuals $e_i^*$ for residual analysis: the standard deviation of the error terms $\varepsilon_i$ is $\sigma$, which is estimated by $\sqrt{MSE}$. Since $\bar{e} = 0$,

$$e_i^* = \frac{e_i - \bar{e}}{\sqrt{MSE}} = \frac{e_i}{\sqrt{MSE}}$$

where $\sqrt{MSE}$ serves as an approximation of the standard deviation of $e_i$. Semistudentized residuals are useful for identifying outliers.
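Both constraints on the residuals come directly from the least squares normal equations, assuming model (2.1) is fitted by ordinary least squares with an intercept. At the estimates $b_0, b_1$ the partial derivatives of the error sum of squares $Q$ vanish, and each derivative is a multiple of one constraint, with $e_i = Y_i - b_0 - b_1 X_i$:

$$
\begin{aligned}
Q &= \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2 \\
\frac{\partial Q}{\partial b_0} &= -2\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i) = -2\sum_{i=1}^{n} e_i = 0 \\
\frac{\partial Q}{\partial b_1} &= -2\sum_{i=1}^{n} X_i (Y_i - b_0 - b_1 X_i) = -2\sum_{i=1}^{n} X_i e_i = 0
\end{aligned}
$$

One way to read this: because the n residuals satisfy two linear constraints, only n - 2 of them are free, which matches the n - 2 divisor in $MSE$.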
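The residual properties above can be checked numerically. A minimal sketch in plain Python (standard library only; the notes use SAS), on a small hypothetical data set rather than the Toluca data: it fits the least squares line, forms $e_i$, confirms both constraints up to rounding, and computes $MSE = SSE/(n-2)$ and the semistudentized residuals $e_i^*$. The cutoff of 4 used to flag outliers (`OUTLIER_CUTOFF`) is an assumed rule of thumb, not something stated in these notes.

```python
from math import sqrt

# Hypothetical data for illustration only (not the Toluca lots).
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

OUTLIER_CUTOFF = 4.0  # assumed rule-of-thumb cutoff for |e_i*|


def fit_slr(x, y):
    """Ordinary least squares estimates (b0, b1) for Y = b0 + b1*X."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


b0, b1 = fit_slr(X, Y)
fitted = [b0 + b1 * xi for xi in X]              # Yhat_i
e = [yi - yh for yi, yh in zip(Y, fitted)]       # e_i = Y_i - Yhat_i

n = len(e)
sse = sum(ei ** 2 for ei in e)
mse = sse / (n - 2)                              # s^2 = SSE / (n - 2) = MSE
semi = [ei / sqrt(mse) for ei in e]              # e_i* = e_i / sqrt(MSE)
outliers = [i for i, s in enumerate(semi, start=1) if abs(s) >= OUTLIER_CUTOFF]

# Both constraints hold up to floating-point rounding
print("sum e_i      :", round(sum(e), 10))
print("sum X_i * e_i:", round(sum(xi * ei for xi, ei in zip(X, e)), 10))
print("MSE          :", mse)
print("e*           :", [round(s, 3) for s in semi])
print("flagged cases:", outliers)
```

With only six cases nothing can be flagged: since $\sum e_i^2 = (n-2)\,MSE$, every $|e_i^*| \le \sqrt{n-2} = 2$ here.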
Six important types of departures from model (2.1) with normal errors:
1) The regression function is not linear.
2) The error terms do not have constant variance.
3) The error terms are not independent.
4) The model fits all but one or a few outliers.
5) The error terms are not normally distributed.
6) One or several important predictor variables have been omitted from the model.

3.3 Diagnostics for residuals

Plots of residuals (or semistudentized residuals):
1) Plot of residuals against the predictor variable
2) Plot of absolute or squared residuals against the predictor variable
3) Plot of residuals against fitted values
4) Plot of residuals against time or another sequence
5) Plots of residuals against omitted predictor variables
6) Normal probability plot of residuals

Example:

    proc reg data=Toluca1;
      model workhrs = lotsize / r;       * r requests the residual analysis;
      output out=step2 r=residuals;      * save the residuals in data set step2;
      plot r.*lotsize;                   * residuals against the predictor;
    run;

    The REG Procedure
    Model: MODEL1
    Dependent Variable: workhrs

          Dependent  Predicted     Std Error              Std Error   Student    Cook's
    Obs    Variable      Value  Mean Predict   Residual    Residual  Residual
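Plot 6) compares each ordered residual with the value it would be expected to have if the errors were normal. A minimal Python sketch, assuming the commonly used approximation $\sqrt{MSE}\,z[(k-0.375)/(n+0.25)]$ for the expected value of the $k$th smallest residual (this formula is not part of the preview of these notes) and using hypothetical residuals. `NormalDist().inv_cdf` from Python's standard library supplies the standard normal percentile $z(p)$. Points close to a straight line, i.e. a correlation near 1 between expected and ordered residuals, are consistent with normal errors.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical residuals from a fit with n = 8 cases (illustration only).
residuals = [-1.2, 0.4, 0.9, -0.3, 0.1, 0.6, -0.8, 0.3]


def normal_plot_points(res):
    """Pair each expected value under normality with the k-th smallest residual.

    The expected value of the k-th smallest residual is approximated by
    sqrt(MSE) * z((k - 0.375) / (n + 0.25)); MSE = SSE / (n - 2) assumes
    the residuals come from a simple linear regression fit.
    """
    n = len(res)
    mse = sum(r * r for r in res) / (n - 2)
    z = NormalDist()  # standard normal; inv_cdf(p) is the percentile z(p)
    expected = [sqrt(mse) * z.inv_cdf((k - 0.375) / (n + 0.25))
                for k in range(1, n + 1)]
    return list(zip(expected, sorted(res)))


def correlation(pairs):
    """Pearson correlation between expected values and ordered residuals."""
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


points = normal_plot_points(residuals)
for exp_val, obs in points:
    print(f"expected {exp_val:7.3f}   residual {obs:5.2f}")
print("correlation:", round(correlation(points), 3))
```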
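Plot 1) is a natural way to look for departure 1), a nonlinear regression function. As a hedged illustration in plain Python, with hypothetical data generated from the curved function $Y = X^2$ (chosen only so the pattern is obvious), fitting a straight line leaves residuals that are positive at both ends and negative in the middle. A residual plot against X then shows a U shape rather than a random horizontal band around 0; the text "plot" below is a crude stand-in for the SAS graphic.

```python
# Hypothetical data whose true regression function is curved (Y = X**2);
# a straight line is fitted anyway to show what plot 1) reveals.
X = [1, 2, 3, 4, 5, 6, 7]
Y = [x ** 2 for x in X]


def fit_slr(x, y):
    """Ordinary least squares estimates (b0, b1) for Y = b0 + b1*X."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def sign(v, tol=1e-9):
    """'+', '-' or '0' for a residual, treating tiny values as zero."""
    return "0" if abs(v) < tol else ("+" if v > 0 else "-")


b0, b1 = fit_slr(X, Y)
e = [yi - (b0 + b1 * xi) for xi, yi in zip(X, Y)]   # residuals e_i

# Crude text version of a residual-versus-X plot: one row per case,
# with the '*' placed further right for larger residuals.
lo = min(e)
for xi, ei in zip(X, e):
    print(f"X={xi}  e={ei:6.2f} |" + " " * round(ei - lo) + "*")
print("signs:", "".join(sign(ei) for ei in e))
```

The run of signs `+0---0+` is the systematic pattern that suggests a curved regression function; residuals from an adequate linear fit would scatter without such a run.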


This document was uploaded on 12/14/2011.
