STAT6220
Chapter 3. Model Selection

1. Introduction
‐ In some situations, there is a large number of explanatory variables which may or may not be relevant for making predictions about the response variable.
‐ It is useful to be able to reduce the model so that it contains only the variables which provide important information about the response variable.
‐ But deciding which explanatory variables to include in the simpler model is not always trivial.
‐ There are many different methods for selecting the best regression model, but for each method, two key issues must always be taken into consideration:
a. What do we mean by the “best” model? (Selection criterion)
b. How can we locate the “best” model? (Selection procedure)
‐ First we need to define the maximum model: the model containing all explanatory variables which could possibly be present in the final model. (Note that this includes interaction terms and higher‐order terms.)
o $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \varepsilon$
o $p$: the maximum number of feasible explanatory variables

2. Initial selection
a. It is a difficult task. (The more variables the better? But a larger model is harder to interpret, and gathering more data is costly.)
b. A good initial selection pays off later in the form of better prediction.
c. If we leave out one or more “important” variables (an under‐specified model), the additional variability in the y‐values that these variables would have accounted for becomes part of the estimated error variance.
d. If a model contains one or more “extraneous” predictor variables (an over‐specified model), we run the risk of multicollinearity.
e. Use your knowledge and consult with experts.
f. Try to find independent variables that correlate reasonably well with the dependent variable but do not have obvious correlation with each other.

3. Best Subset Method
‐ The most careful selection procedure is to consider all possible models with the dependent variable and one or more independent variables from the list of candidate variables.
‐ This method should always be preferred unless the number of possible explanatory variables is large.
‐ If there are $p$ independent variables, $2^p$ models are possible.
‐ In this procedure, all possible models are fitted to the data, and a selection criterion is applied to all of them in order to find the model which is preferable to all others.
‐ Note that one has to choose the selection criterion carefully, as different selection criteria can result in different “best” models.
‐ Various criteria to select variables ($k$: the number of explanatory variables in the model being considered):
o Smallest $s^2 = \mathrm{MSE} = \dfrac{SSE}{n - k - 1}$
o Highest $R_a^2$, the adjusted $R^2$. ($R^2$ itself: the closer the model fits the data, the larger $R^2$ will be. However, due to the way it is defined, the largest model will always have the largest $R^2$, whether or not the extra variables provide any important information about $y$.)
o Smallest Mallows $C_k$ statistic: $C_k = \dfrac{SSE_k}{s^2} - (n - 2(k + 1))$
  $SSE_k$: the SSE from a model with $k$ predictors.
  $s^2$: the MSE from the model with all $p$ predictors.
  If the considered model is as good as the maximum model in terms of error variance, $C_k$ will be close to $k + 1$. In practice, select the model with the smallest $C_k$. (Note that $C_p$ is always $p + 1$ for the maximum model.)
‐ Example with $p = 5$: [Minitab output]

4. Backward Elimination
‐ Begin with the model that contains all candidate variables, then eliminate the least significant variable, one at a time, until all the variables in the model are significant at level $\alpha$. (Minitab default: $\alpha = 0.10$)
‐ We look for the x variable that has the largest p‐value for the individual t‐test, and if it is larger than $\alpha$ we remove it.
‐ Then we fit the model with one fewer predictor and repeat the process until all p‐values are less than $\alpha$. Note that once removed, a variable cannot re‐enter the model at later steps.
‐ [Minitab output]

5. Forward Selection
‐ A reversed version of backward elimination.
‐ We start with an “empty” model with no explanatory variables, and add variables one by one until we cannot improve the model significantly by adding another variable.
‐ Once a variable is added to the model, it stays in the model.
‐ The two procedures, backward elimination and forward selection, have the computational advantage that one only has to fit a small subset of the possible models.
‐ The maximum number of models to be fitted is $p$ in backward elimination and $p(p - 1)/2$ in the forward selection procedure. In both cases, this is a substantial reduction compared with the number of models to be fitted in the best subset procedure.
‐ However, the main drawback of these two procedures is exactly the same as the advantage: we only consider a small subset of the possible models. The risk of missing the best model increases rapidly as the number of explanatory variables increases.

6. Stepwise regression
‐ The performance of the forward selection (F.S.) and backward elimination (B.E.) procedures can be greatly improved by a modification: the stepwise regression (S.R.) procedure.
‐ Here, we shall consider S.R. based on F.S. (S.R. based on B.E. can be defined in a similar way.)
‐ Recall: in F.S., once a variable is added, it stays in the model, irrespective of which other variables are added later on. However, it can easily happen that a variable entered early in the procedure becomes superfluous because of its inter‐relationship with other variables added to the model later on in the procedure.
‐ S.R. modifies F.S. in the following way: each time a new variable is added to the model, the significance of each of the variables already in the model is re‐examined.
‐ That is, at each step in F.S., we test for significance of each of the variables currently in the model, and remove the one with the highest p‐value (if that p‐value > $\alpha$, say 0.10).
‐ The model is then re‐fitted without this variable before going to the next step in the F.S.
‐ S.R. continues until no more variables can be added or removed.
‐ An example with $p = 7$: [output]
‐ S.R. is better than F.S. and B.E. because it considers more (relevant) models.
‐ At the same time, S.R. is much quicker than the best‐subset procedure when $p$ is large.
‐ However, if $p$ is small or if fitting vast numbers of models is not a problem, it is recommended to use the best‐subset procedure rather than S.R.
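
For small $p$, the best‐subset search with the Mallows $C_k$ criterion can be sketched in a few lines of Python. This is an illustrative sketch only, not Minitab's implementation: the helper names (`solve`, `sse`, `best_subsets`) and the toy data are made up for this example, and the least‐squares fit solves the normal equations directly, which is adequate for a small, well‐conditioned example but is not a robust numerical method.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def sse(xs, y, subset):
    """SSE of the least-squares fit of y on an intercept plus the columns in `subset`."""
    rows = [[1.0] + [row[j] for j in subset] for row in xs]
    k1 = len(subset) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k1)] for i in range(k1)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k1)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def best_subsets(xs, y):
    """Return (C_k, subset) for every non-empty subset, sorted by Mallows' C_k."""
    n, p = len(y), len(xs[0])
    s2 = sse(xs, y, tuple(range(p))) / (n - p - 1)  # MSE of the maximum model
    results = []
    for k in range(1, p + 1):
        for subset in combinations(range(p), k):
            ck = sse(xs, y, subset) / s2 - (n - 2 * (k + 1))
            results.append((ck, subset))
    return sorted(results)


# Hypothetical toy data (n = 10 observations, p = 3 candidate variables),
# invented purely to exercise the functions above.
xs = [
    [2.1, 7.0, 1.3], [3.4, 1.2, 4.8], [5.0, 6.3, 2.2], [1.7, 3.9, 6.1],
    [6.2, 2.4, 3.5], [4.3, 8.1, 5.7], [7.5, 4.6, 0.9], [0.8, 5.5, 7.4],
    [8.9, 0.7, 4.1], [5.8, 9.2, 6.6],
]
y = [6.5, 10.5, 11.5, 9.2, 14.6, 12.7, 14.1, 9.5, 18.1, 15.9]

results = best_subsets(xs, y)
for ck, subset in results:
    print(f"variables {subset}: C_k = {ck:.2f}")
```

The list is sorted so that the subset with the smallest $C_k$ comes first. As a check on the formula, the entry for the maximum model (all three variables) has $C_k = p + 1 = 4$ up to rounding error, whatever the data.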