Chapter 10 - Validation of Regression Models
STAT 563, Spring 2007

Adequacy vs. Validation
Model adequacy checking requires:
- Residual analysis
- Testing for lack of fit
- Searching for influential observations
- Other internal analysis
Validation is directed toward determining whether the model will function successfully in its intended operating environment.

Note
Proper validation should include:
- Study of the coefficients to see whether their signs and magnitudes are reasonable
- Stability of the coefficients (how similar the coefficients would be with a new sample)
- Predictive performance of the model (both interpolation and extrapolation modes should be investigated)

Validation techniques
- Check coefficients and predicted values against prior experience, physical theory, and other analytical or simulated models
- Collection of new (or fresh) data
- Data splitting: set aside a part of the original data to investigate the model’s predictive performance
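As a rough illustration of data splitting, the sketch below fits a straight line on an estimation subset and scores it on the held-out prediction subset with a prediction R^2. Everything in it is an assumption made for illustration: the data are synthetic, the 30/10 split is arbitrary, and the names `fit_simple_ols` and `prediction_r2` are hypothetical helpers, not part of the course material.

```python
import random


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for the model y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def prediction_r2(b0, b1, x_new, y_new):
    """1 - SSE/SST, with both sums taken over the held-out data only."""
    y_bar = sum(y_new) / len(y_new)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x_new, y_new))
    sst = sum((yi - y_bar) ** 2 for yi in y_new)
    return 1.0 - sse / sst


# Synthetic data (assumption): true line 2 + 3x plus unit-variance noise.
rng = random.Random(0)
x = [0.25 * i for i in range(40)]
y = [2.0 + 3.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

# Randomly set aside 10 of the 40 observations as the prediction set.
indices = list(range(len(x)))
rng.shuffle(indices)
pred_idx = sorted(indices[:10])
est_idx = sorted(indices[10:])

# Fit on the estimation set only, then score on the prediction set.
b0, b1 = fit_simple_ols([x[i] for i in est_idx], [y[i] for i in est_idx])
r2_pred = prediction_r2(b0, b1, [x[i] for i in pred_idx], [y[i] for i in pred_idx])

print("intercept:", round(b0, 3), "slope:", round(b1, 3))
print("prediction R^2:", round(r2_pred, 3))
```

A prediction R^2 well below the R^2 from the estimation fit would suggest that the model does not carry over to new data. A random split is only one option; splitting by time or by region of the regressor space may test extrapolation more directly.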
Case Study: Hald Cement Data (Example 9.1)
Response:
- y: heat evolved, in calories per gram of cement
Ingredients (regressors):
- x1: tricalcium aluminate
- x2: tricalcium silicate
- x3: tetracalcium alumino ferrite
- x4: dicalcium silicate

All Possible Regressions
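The all-possible-regressions idea is to fit every non-empty subset of the candidate regressors and compare a summary statistic such as R^2. The following is a minimal sketch under stated assumptions: it uses synthetic data rather than the Hald measurements, solves the normal equations with a small hand-written Gaussian elimination to stay within the standard library, and the function names are hypothetical.

```python
import itertools
import random


def solve_linear_system(a, b):
    """Solve a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    beta = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (aug[i][n] - tail) / aug[i][i]
    return beta


def r_squared(columns, y):
    """R^2 of the least-squares fit of y on an intercept plus the given columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    y_bar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst


def all_possible_regressions(regressors, y):
    """Map each non-empty subset of regressor names to the R^2 of its fit."""
    names = sorted(regressors)
    results = {}
    for size in range(1, len(names) + 1):
        for subset in itertools.combinations(names, size):
            results[subset] = r_squared([regressors[name] for name in subset], y)
    return results


# Synthetic data (assumption): only x1 and x2 actually drive the response.
rng = random.Random(1)
n = 20
regressors = {f"x{j}": [rng.uniform(0.0, 10.0) for _ in range(n)] for j in range(1, 5)}
y = [5.0 + 2.0 * regressors["x1"][i] - 1.5 * regressors["x2"][i] + rng.gauss(0.0, 1.0)
     for i in range(n)]

results = all_possible_regressions(regressors, y)
for subset, r2 in sorted(results.items(), key=lambda item: -item[1])[:5]:
    print(subset, round(r2, 4))
```

With k candidate regressors there are 2^k - 1 non-empty subsets (15 when k = 4), so exhaustive enumeration is practical only for small k. Because R^2 never decreases when a regressor is added, the full model always has the largest R^2, which is why a separate rule is needed to decide when a smaller subset is good enough.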
Aitkin's proposal
Define

$$R_0^2 = 1 - \left(1 - R_{k+1}^2\right)\left(1 + d_{\alpha,n,k}\right)$$

where

$$d_{\alpha,n,k} = \frac{k\,F_{\alpha,k,n-k-1}}{n-k-1}$$

and $R_{k+1}^2$ is the $R^2$ for the full model.
Any subset producing an $R^2$ greater than $R_0^2$ is called an $R^2$-adequate($\alpha$) subset.

Hald data
Compute $R_0^2$. Clearly, several models satisfy Aitkin’s criterion, so the final model choice is still not clear.
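The threshold in Aitkin's proposal is simple arithmetic once the full-model R^2 and the F critical value are known. The sketch below computes it under stated assumptions: k is read as the number of candidate regressors and n as the number of observations; the inputs (n = 13, k = 4, full-model R^2 of 0.98238, and a tabled F critical value of 3.84 for alpha = 0.05 with 4 and 8 degrees of freedom) are values commonly quoted for the Hald data and are not recoverable from the slide itself; the function names are hypothetical.

```python
def aitkin_threshold(r2_full, n, k, f_crit):
    """R_0^2 = 1 - (1 - R^2_full) * (1 + d), with d = k * F / (n - k - 1)."""
    d = k * f_crit / (n - k - 1)
    return 1.0 - (1.0 - r2_full) * (1.0 + d)


def is_r2_adequate(r2_subset, r2_0):
    """A subset is R^2-adequate(alpha) when its R^2 exceeds the threshold."""
    return r2_subset > r2_0


# Assumed inputs for the Hald data (see the paragraph above).
r2_0 = aitkin_threshold(r2_full=0.98238, n=13, k=4, f_crit=3.84)
print("R_0^2 =", round(r2_0, 5))

# Hypothetical subset R^2 values, used only to show the comparison.
for r2_subset in (0.97, 0.93):
    print(r2_subset, "adequate" if is_r2_adequate(r2_subset, r2_0) else "not adequate")
```

The F critical value is passed in rather than computed so that the sketch needs no external packages. If SciPy is available, `scipy.stats.f.ppf(1 - alpha, k, n - k - 1)` should give the same quantity.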
