Regression Modeling Selection Process

When you have more than one regression equation based on data, to select the "best model" you should compare:

1. R-squared: That is, the percentage of variance [in fact, the sum of squares] in Y accounted for by variance in X captured by the model.
2. Adjusted R-squared: When you want to compare models of different sizes (different numbers of independent variables (p) and/or different sample sizes n), you must use the adjusted R-squared, because the usual R-squared never decreases, and tends to grow, as independent variables are added: $R_a^2 = 1 - \frac{(n - 1)(1 - R^2)}{n - p - 1}$
3. Standard deviation of the error terms, where each error is the observed y-value minus the predicted y-value for a given x.
4. Trends in the errors as a function of the control variable x. Systematic trends are not uncommon, and they suggest the model is missing some structure in the data.
5. The t-statistic of each individual parameter.
6. The values of the parameters and their consistency with the subject-matter (content) underpinnings.
7. The F(df1, df2) value for overall assessment, where df1 (numerator degrees of freedom) is the number of linearly independent predictors in the assumed model minus the number of linearly independent predictors in the restricted model, i.e., the number of linearly independent restrictions imposed on the assumed model, and df2 (denominator degrees of freedom) is the number of observations minus the number of linearly independent predictors in the assumed model. The observed F-statistic should exceed not merely the selected critical value from the F-table, but at least four times that critical value.

Finally, in statistics for business there exists an opinion that with more than 4 parameters one can fit an elephant, so if one attempts to fit a regression function that depends on many parameters, the result should not be regarded as very reliable....
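Several of the quantities in the checklist above can be computed side by side for two candidate models. The following is a minimal sketch in plain Python, not a definitive implementation. The function names `solve` and `fit_summary` and the small data set are invented for illustration. It assumes ordinary least squares with an intercept, so the intercept counts among the linearly independent predictors and df2 = n - p - 1. The F-statistic shown is the overall one, with the intercept-only model as the restricted model, so df1 = p.

```python
import math

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting.
    Assumes the matrix a is square and non-singular."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_summary(rows, y):
    """Fit y on the predictors in rows (one list of p values per observation)
    by least squares with an intercept, and return the checklist quantities."""
    n, p = len(y), len(rows[0])
    k = p + 1                                   # parameters incl. intercept
    X = [[1.0] + list(r) for r in rows]
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)                      # normal equations
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in X]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    ybar = sum(y) / n
    sse = sum(e * e for e in resid)             # unexplained sum of squares
    sst = sum((yi - ybar) ** 2 for yi in y)     # total sum of squares
    r2 = 1 - sse / sst
    adj_r2 = 1 - (n - 1) * (1 - r2) / (n - p - 1)
    s = math.sqrt(sse / (n - p - 1))            # std. deviation of the errors
    t_stats = []
    for j in range(k):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        inv_jj = solve(xtx, unit)[j]            # j-th diagonal of (X'X)^-1
        t_stats.append(beta[j] / (s * math.sqrt(inv_jj)))
    f_stat = (r2 / p) / ((1 - r2) / (n - p - 1))
    return {"beta": beta, "r2": r2, "adj_r2": adj_r2, "resid_sd": s,
            "t_stats": t_stats, "f_stat": f_stat, "df": (p, n - p - 1),
            "residuals": resid}

# Illustrative data: one real predictor x1 and one extra indicator x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.0, 0.0, 1.0, 0.0, 0.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

small = fit_summary([[a] for a in x1], y)
big = fit_summary([[a, b] for a, b in zip(x1, x2)], y)

for name, m in (("x1 only", small), ("x1 and x2", big)):
    print(name, "R2 =", round(m["r2"], 4), "adj R2 =", round(m["adj_r2"], 4),
          "F", m["df"], "=", round(m["f_stat"], 2))
```

The returned `residuals` can be plotted against x to look for the systematic trends mentioned in item 4. Comparing the observed F-statistic with a critical value still requires an F-table or a statistics library, which this sketch deliberately leaves out.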
