Statistical Techniques II
Regression Diagnostic Criteria Appendix 1 Supplement
Page 143
Criteria for the interpretation of selected regression statistics from the SAS output
Reference: primarily Neter, J., Kutner, M. H., Nachtsheim, C. J., and Wasserman, W., Applied Linear Statistical Models, 4th Edition, Richard D. Irwin, Inc., Burr Ridge, Illinois.
General regression diagnostics
Adjusted R2: R2adj = 1 - [(n - 1) / (n - p)] (SSError / SSTotal) = 1 - [(n - 1) / (n - p)] (1 - R2)
This is intended to be an adjustment to R2 for additional variables in the model.
Unlike the usual R2, this value can decrease as more variables are entered in the model if the variables do not account for sufficient additional variation (at least equal to the MSE).
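As a minimal sketch, the adjusted R2 formula above can be computed as follows (the function name and example numbers are illustrative, not from the notes):

```python
# Sketch of the adjusted R-squared adjustment described above.
def adjusted_r2(r2, n, p):
    """R2adj = 1 - ((n - 1) / (n - p)) * (1 - R2)."""
    return 1.0 - ((n - 1) / (n - p)) * (1.0 - r2)

# Example: R2 = 0.80 with n = 20 observations and p = 3 parameters.
r2_adj = adjusted_r2(0.80, n=20, p=3)  # always <= the raw R2 when p > 1
```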
Standardized regression coefficient bj': bj' = bj (Sxj / Sy)
Unlike the usual regression coefficient, the magnitude of the standardized coefficient provides a meaningful comparison among the regression coefficients. Larger standardized regression coefficients have more impact on the calculation of the predicted value and are more “important”.
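A sketch of the bj' = bj (Sxj / Sy) conversion, using hypothetical data (variable names are illustrative). Standardizing this way gives the same slopes as regressing z-scored y on z-scored X:

```python
import numpy as np

# Hypothetical two-regressor fit; the true slopes are 2.0 and 0.5.
rng = np.random.default_rng(0)
n = 50
X = rng.normal(size=(n, 2))
y = 3.0 + 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

Xd = np.column_stack([np.ones(n), X])       # design matrix with intercept
b = np.linalg.lstsq(Xd, y, rcond=None)[0]   # raw coefficients b0, b1, b2

s_x = X.std(ddof=1, axis=0)                 # sample std dev of each regressor
s_y = y.std(ddof=1)                         # sample std dev of the response
b_std = b[1:] * s_x / s_y                   # standardized slopes (intercept drops out)
```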
Partial correlations
   Squared semi-partial correlation TYPE I = SCORR1 = SeqSSXj / SSTotal
   Squared partial correlation TYPE I = PCORR1 = SeqSSXj / (SeqSSXj + SSError)
   Squared semi-partial correlation TYPE II = SCORR2 = PartialSSXj / SSTotal
   Squared partial correlation TYPE II = PCORR2 = PartialSSXj / (PartialSSXj + SSError)
Note that for regression, TYPE II SS and TYPE III SS are the same.

Residual Diagnostics
The hat matrix main diagonal elements, hii (Hat Diag H values in SAS), called “leverage values”, are used to detect unusual observations in the X space. They can also identify substantial extrapolation for new values. As a general rule, hii values greater than 0.5 are “large” while those between 0.2 and 0.5 are moderately large; also look for a leverage value that is noticeably larger than the next largest. The hii values sum to p, so the mean hii = p/n (note that this is < 1). A value may be an “outlier” in X if it is more than twice the mean hii (i.e. hii > 2p/n).
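A sketch of the leverage values and the 2p/n rule of thumb, on a hypothetical design matrix (the hii sum to p by construction):

```python
import numpy as np

# Hypothetical design matrix: intercept plus two regressors, p = 3.
rng = np.random.default_rng(1)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
h = np.diag(H)                         # leverage values h_ii

mean_h = p / n                         # average leverage = p/n
flagged = np.where(h > 2 * p / n)[0]   # rule of thumb: h_ii > 2p/n
```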
Studentized residuals (“Student Residual” in SAS). Also called internally studentized residuals. There are two versions:
   Simpler calculation = ei / root(MSE)
   More common application = ei / root(MSE * (1 - hii)) [SAS produces these]
We already assume the errors are normally distributed, so these values approximately follow a t distribution, where for large samples:
   about 65% are between -1 and +1
   about 95% are between -2 and +2
   about 99% are between -2.6 and +2.6
Deleted studentized residuals (“RStudent” in SAS). Also called externally studentized residuals. There are also two versions, as with the studentized residuals above:
   Simpler calculation = ei / root(MSE(i))
   More common application = ei / root(MSE(i) * (1 - hii)) [SAS produces these values]
As with the studentized residuals above, these values approximately follow a t distribution.

James P. Geaghan - Copyright 2011
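A sketch of both kinds of studentized residuals on a hypothetical fit. MSE(i), the mean square error with observation i deleted, is obtained here from a standard updating formula rather than n separate refits:

```python
import numpy as np

# Hypothetical simple regression: intercept plus one slope, p = 2.
rng = np.random.default_rng(2)
n, p = 25, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b                                   # ordinary residuals
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)   # leverage values
mse = e @ e / (n - p)

# Internally studentized ("Student Residual" in SAS):
r_int = e / np.sqrt(mse * (1 - h))

# MSE with observation i deleted, via the updating formula
# MSE(i) = ((n - p) * MSE - e_i^2 / (1 - h_ii)) / (n - p - 1):
mse_i = ((n - p) * mse - e**2 / (1 - h)) / (n - p - 1)

# Externally studentized / deleted ("RStudent" in SAS):
r_ext = e / np.sqrt(mse_i * (1 - h))
```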
Influence Diagnostics
DFFITS: an influence statistic; it measures the difference in fits as judged by the change in the predicted value when the point is omitted. This is a standardized value and can be interpreted as a number of standard deviation units. For small to medium size databases, DFFITS should not exceed 1, while for large databases it should not exceed 2*sqrt(p/n).
DFBETAS: an influence statistic; it measures the difference in fits as judged by the change in the values of the regression coefficients when the point is omitted. Note that this is also a standardized value. For small to medium size databases, DFBETAS should not exceed 1, while for large databases it should not exceed 2/sqrt(n).
Cook's D: an influence statistic (D is for distance), based on the boundary of a simultaneous confidence region for all regression coefficients. It does not follow an F distribution, but it is useful to compare it to the percentiles of the F distribution [F(1-α; p, n-p)], where a value below the 10th or 20th percentile shows little effect, while a value at or above the 50th percentile is considered large.
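A sketch of DFFITS and Cook's D on a hypothetical fit, using the standard closed-form expressions (no refitting needed); the cutoff follows the 2*sqrt(p/n) rule of thumb above:

```python
import numpy as np

# Hypothetical fit: intercept plus two regressors, p = 3.
rng = np.random.default_rng(3)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
mse = e @ e / (n - p)
mse_i = ((n - p) * mse - e**2 / (1 - h)) / (n - p - 1)   # delete-one MSE

rstudent = e / np.sqrt(mse_i * (1 - h))                  # externally studentized
dffits = rstudent * np.sqrt(h / (1 - h))                 # change in fit, in SD units
cooks_d = e**2 * h / (p * mse * (1 - h) ** 2)            # Cook's distance

large_dffits = np.abs(dffits) > 2 * np.sqrt(p / n)       # large-sample cutoff
```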
VIF is related to the severity of multicollinearity: it is the factor by which the variance of a standardized regression coefficient estimate is inflated, so each VIF would be expected to have a value of 1 if the regressors are uncorrelated. If the mean VIF is much greater than 2, serious problems are indicated; no single VIF should exceed 10. Tolerance is the inverse of VIF, where Tolerancek = 1 - Rk2 (so VIFk = 1 / (1 - Rk2), with Rk2 from regressing Xk on the other regressors).
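A sketch of VIFk = 1 / (1 - Rk2) computed directly from its definition, on hypothetical data where two regressors are deliberately correlated:

```python
import numpy as np

# Hypothetical regressors: x2 is built to be correlated with x1, x3 is independent.
rng = np.random.default_rng(4)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.5 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

def vif(X, k):
    """VIF for regressor k: regress X[:, k] on the remaining columns."""
    n = X.shape[0]
    others = np.column_stack([np.ones(n), np.delete(X, k, axis=1)])
    xk = X[:, k]
    bhat = np.linalg.lstsq(others, xk, rcond=None)[0]
    resid = xk - others @ bhat
    r2_k = 1 - resid @ resid / ((xk - xk.mean()) ** 2).sum()
    return 1.0 / (1.0 - r2_k)

vifs = [vif(X, k) for k in range(X.shape[1])]
tolerances = [1.0 / v for v in vifs]     # Tolerance_k = 1 - R_k^2
```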
The condition number (a multivariate evaluation). Eigenvalues are extracted from the regressors; these are variances of linear combinations of the regressors and go from larger to smaller. If one or more are zero (at the end), then the matrix is not of full rank. They sum to p, and if the Xk are independent, each would equal 1. The condition number is the square root of the ratio of the largest eigenvalue (always the first) to each of the others. If this value exceeds 30, then multicollinearity may be a problem.
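A sketch of the eigenvalue extraction and condition indices, taking the eigenvalues from the correlation matrix of the regressors (a common scaling choice; the data are hypothetical and nearly collinear on purpose):

```python
import numpy as np

# Hypothetical regressors: x2 is almost a copy of x1, so one eigenvalue is near 0.
rng = np.random.default_rng(5)
n = 60
x1 = rng.normal(size=n)
x2 = x1 + 0.02 * rng.normal(size=n)
X = np.column_stack([x1, x2])

R = np.corrcoef(X, rowvar=False)             # correlation matrix of regressors
eig = np.sort(np.linalg.eigvalsh(R))[::-1]   # eigenvalues, largest first; sum to p

# Condition indices: sqrt(largest eigenvalue / each eigenvalue).
cond_index = np.sqrt(eig[0] / eig)
condition_number = cond_index[-1]            # uses the smallest eigenvalue
```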
Model Evaluation and Validation
R2p, AdjR2p and MSEp can be used to graphically compare and evaluate models. The subscript p refers to the number of parameters in the model.
Mallows' Cp criterion. Use of this statistic presumes no bias in the full model MSE, so the full model should be carefully chosen to have little or no multicollinearity.
   Cp criterion = (SSEp / TrueMSE) - (n - 2p)
The Cp statistic will be approximately equal to p if there is no bias in the regression model.
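A sketch of the Cp criterion on hypothetical data, using the full-model MSE in place of "TrueMSE" (a common practical substitution); for the full model itself, Cp equals p exactly:

```python
import numpy as np

# Hypothetical data: y depends on the first two regressors; the third is noise.
rng = np.random.default_rng(6)
n = 80
X = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=n)

def sse(Xd, y):
    b = np.linalg.lstsq(Xd, y, rcond=None)[0]
    e = y - Xd @ b
    return e @ e

ones = np.ones((n, 1))
X_full = np.hstack([ones, X])            # p = 4 parameters
mse_full = sse(X_full, y) / (n - 4)      # stands in for "TrueMSE"

def mallows_cp(Xd, y, mse_full):
    n, p = Xd.shape
    return sse(Xd, y) / mse_full - (n - 2 * p)

cp_full = mallows_cp(X_full, y, mse_full)                       # = p = 4 by construction
cp_good = mallows_cp(np.hstack([ones, X[:, :2]]), y, mse_full)  # unbiased subset
cp_bad = mallows_cp(np.hstack([ones, X[:, :1]]), y, mse_full)   # drops a real effect
```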
PRESSp criterion (PRESS = Prediction SS). This criterion is based on deleted residuals; there are n deleted residuals in each regression, and PRESSp is the SS of the deleted residuals. This value should approximately equal the SSE if predictions are good, and it will get larger as predictions become poorer. The values may be plotted, and the models with smaller PRESS statistics represent better predictive models. This statistic can also be used for model validation.
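A sketch of PRESS on a hypothetical fit, using the identity that the deleted residual is e(i) = ei / (1 - hii), so no leave-one-out refits are required:

```python
import numpy as np

# Hypothetical simple regression: intercept plus one slope, p = 2.
rng = np.random.default_rng(7)
n, p = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)

deleted = e / (1 - h)        # deleted (PRESS) residuals
press = deleted @ deleted    # PRESS_p = SS of the deleted residuals
```

Note that PRESS is always at least as large as the ordinary SSE, since each residual is divided by (1 - hii) < 1.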
This note was uploaded on 12/29/2011 for the course EXST 7015 taught by Professor Wang,j during the Fall '08 term at LSU.