# Chp3 - Regression Diagnostics and Variable Transformation

For the normal error regression model:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, \cdots, n, \tag{1}$$

where:

1. $\beta_0$ and $\beta_1$ are parameters
2. $X_i$ are known constants
3. $\varepsilon_i$ are independent $N(0, \sigma^2)$.

We will consider the adequacy of the linear model $\beta_0 + \beta_1 X$, the error distribution assumption for $\varepsilon$, and outliers. Both subjective graphical approaches and formal statistical tests will be discussed. Variable transformations will be used to make adjustments.

## Six Types of Departure from Regression Model (1)

- The regression function is not linear.
- Other important predictor variables have been omitted from the model.
- The $\varepsilon_i$ have non-constant variance.
- The $\varepsilon_i$ are not independent.
- The $\varepsilon_i$ are not normally distributed.
- The model fits all but a few outlier observations.
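To see what data from the normal error regression model look like when every assumption holds, the model can be simulated. The following is a minimal Python sketch; the function name `simulate_model` and all parameter values are hypothetical and chosen only for illustration.

```python
import random


def simulate_model(x, beta0, beta1, sigma, seed=0):
    """Draw Y_i = beta0 + beta1 * X_i + eps_i, with eps_i independent N(0, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]


# Hypothetical predictor values and parameters (not taken from the source).
x = [20, 30, 40, 50, 60, 70, 80, 90, 100, 110]
y = simulate_model(x, beta0=60.0, beta1=3.5, sigma=50.0)
```

Data generated this way satisfy the model by construction, so they can serve as a baseline when judging whether a diagnostic plot of real data shows a genuine departure.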

## Univariate Graphical Summary

- Histogram and stem-and-leaf plot: discrete distribution density estimation
- Boxplot: graphical summary of important numbers (median, quartiles and outliers)
- Sequence plot: exploration of time pattern
- Quantile-quantile plot

[Figure: Toluca Company Data, WorkHours. Four panels: Histogram of WorkHours (WorkHours vs. Frequency); Box Plot; Sequence Plot (Index vs. WorkHours); Normal Q-Q Plot (Theoretical Quantiles vs. Sample Quantiles).]
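The numbers a boxplot displays (median, quartiles and outliers) can be computed directly. The sketch below is an illustration in Python using only the standard library. It assumes the common 1.5 × IQR rule for flagging outliers, which the source does not state, and it uses a small hypothetical data set rather than the Toluca data.

```python
from statistics import median, quantiles


def boxplot_summary(values):
    """Return the numbers a boxplot displays, flagging points beyond 1.5 * IQR."""
    data = sorted(values)
    q1, _, q3 = quantiles(data, n=4)  # quartiles (default 'exclusive' method)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # assumed outlier fences
    return {
        "min": data[0],
        "q1": q1,
        "median": median(data),
        "q3": q3,
        "max": data[-1],
        "outliers": [v for v in data if v < lower or v > upper],
    }


# Hypothetical observations, for illustration only.
hours = [1, 2, 3, 4, 5, 6, 7, 100]
summary = boxplot_summary(hours)
```

Different software uses slightly different quartile definitions, so the quartiles returned here may not match another package's boxplot exactly.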
## Quantile-Quantile (Probability) Plot

Let $\{y_{(1)}, \cdots, y_{(n)}\}$ denote the order statistics of $n$ samples from a distribution of the form $F[(y - \mu)/\sigma]$. To construct the probability plot, the sample order statistic $y_{(i)}$ is plotted against $x_i = F^{-1}(p_i)$, where $p_i$ estimates $F(W_{(i)})$ and $W_{(i)} = (y_{(i)} - \mu)/\sigma$. The $p_i$ are also called the plotting positions. For the normal distribution, $F = \Phi$, and $y_{(i)}$ and $\Phi^{-1}(p_i)$ are both quantile estimates. The probability plot is therefore also called the quantile-quantile plot.

Some commonly used forms of plotting positions:

1. $p_i = \dfrac{i}{n + 1}$, which is actually equal to $E[F(W_{(i)})]$.
2. $p_i = \dfrac{i - 0.375}{n + 0.25}$, which approximates $F(E[W_{(i)}])$ for the normal distribution. Simulation studies suggest that this is a good approximation.
3. $p_i = \dfrac{i - 0.3175}{n + 0.365}$ for $i < n$; $p_n = 0.5^{1/n}$, which approximates $F(M[W_{(i)}])$, where $M(\cdot)$ is the median function.

Routine in R: `qqnorm(WorkHours); qqline(WorkHours)`
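The construction of a normal Q-Q plot can be sketched in a few lines: sort the sample, compute plotting positions $p_i$, and pair each order statistic with $\Phi^{-1}(p_i)$. The Python sketch below covers only the two plotting positions $i/(n+1)$ and $(i - 0.375)/(n + 0.25)$. The function names and the labels `"mean"` and `"blom"` are hypothetical, and this is not a reimplementation of R's `qqnorm`.

```python
from statistics import NormalDist


def plotting_positions(n, kind="blom"):
    """Return p_1, ..., p_n for one of two common plotting-position rules."""
    if kind == "mean":   # p_i = i / (n + 1)
        return [i / (n + 1) for i in range(1, n + 1)]
    if kind == "blom":   # p_i = (i - 0.375) / (n + 0.25)
        return [(i - 0.375) / (n + 0.25) for i in range(1, n + 1)]
    raise ValueError(f"unknown plotting position rule: {kind}")


def normal_qq_points(sample, kind="blom"):
    """Pair theoretical normal quantiles with the sample order statistics."""
    ys = sorted(sample)                          # order statistics y_(1) <= ... <= y_(n)
    ps = plotting_positions(len(ys), kind)
    xs = [NormalDist().inv_cdf(p) for p in ps]   # Phi^{-1}(p_i)
    return list(zip(xs, ys))


# Hypothetical sample, for illustration only.
points = normal_qq_points([3.0, 1.0, 2.0])
```

Plotting the second coordinate of each pair against the first gives the Q-Q plot; points lying close to a straight line are consistent with normality.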

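## Residual Properties

Most of our diagnostics concern the distribution of the errors $\varepsilon_i$. In application, we usually estimate $\varepsilon_i$ by the residuals $e_i = Y_i - \hat{Y}_i$. As we discussed earlier,

$$\sum_{i=1}^{n} e_i = 0; \quad \sum_{i=1}^{n} X_i e_i = 0.$$

The residuals are linear functions of the $Y_i$, so their joint distribution is multivariate normal. We know $E(e_i) = E(Y_i) - E(b_0 + b_1 X_i) = 0$, and the covariance matrix is given by

$$\mathrm{Var}(e_i) = \sigma^2 \left[ 1 - \frac{1}{n} - \frac{(X_i - \bar{X})^2}{\sum_{j} (X_j - \bar{X})^2} \right],$$

$$\mathrm{Cov}(e_i, e_j) = -\sigma^2 \left[ \frac{1}{n} + \frac{(X_i - \bar{X})(X_j - \bar{X})}{\sum_{k} (X_k - \bar{X})^2} \right], \quad i \neq j.$$

In our model assumptions, the errors are independent. But their estimates are dependent. The reason is that all observations are pooled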
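
The two residual identities, $\sum e_i = 0$ and $\sum X_i e_i = 0$, and the standard simple-regression result $\mathrm{Var}(e_i) = \sigma^2 [1 - 1/n - (X_i - \bar{X})^2 / \sum_j (X_j - \bar{X})^2]$ can be checked numerically. The Python sketch below fits the least squares line by hand; the function names and the data values are hypothetical and serve only as an illustration.

```python
def fit_line(x, y):
    """Least squares estimates (b0, b1) for simple linear regression."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1


def residuals(x, y):
    """Residuals e_i = Y_i - (b0 + b1 * X_i)."""
    b0, b1 = fit_line(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def residual_variance_factors(x):
    """Factors c_i such that Var(e_i) = sigma^2 * c_i."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1 - 1 / n - (xi - xbar) ** 2 / sxx for xi in x]


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
e = residuals(x, y)
factors = residual_variance_factors(x)
```

Up to rounding error, `sum(e)` and `sum(xi * ei for xi, ei in zip(x, e))` are both zero. The variance factors are all below 1 and sum to $n - 2$, which shows that the residuals vary less than the errors they estimate and do not all share the same variance.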