Stat 5102 Lecture Slides, Deck 7
Charles J. Geyer
School of Statistics
University of Minnesota

Model Selection

When we have two nested models, we know how to compare them: the likelihood ratio test. When we have a short sequence of nested models, we can also use the likelihood ratio test to compare each consecutive pair of models. This violates the "do only one test" dogma, but is mostly harmless when there are only a few models being compared. But what if the models are not nested, or if there are thousands or millions of models being compared?

Model Selection (cont.)

This subject has received much theoretical attention in recent years. It is still an area of active research. But some things seem unlikely to change. Rudimentary efforts at model selection, so-called forward and backward selection procedures, although undeniably "things to do" (TTD), have no theoretical justification. They are not guaranteed to do anything sensible.

Procedures that are justified theoretically evaluate a criterion function for all models in the class of models under consideration. They select the model with the smallest value of the criterion.

Model Selection (cont.)

We will look at two such procedures involving the Akaike information criterion (AIC) and the Bayes information criterion (BIC). Suppose the log likelihood for model $m$ is denoted $l_m$, the MLE for model $m$ is denoted $\hat{\theta}_m$, the dimension of $\theta_m$ is $p_m$, and the sample size is $n$. Then

$$\mathrm{AIC}(m) = -2 \, l_m(\hat{\theta}_m) + 2 \, p_m$$

$$\mathrm{BIC}(m) = -2 \, l_m(\hat{\theta}_m) + \log(n) \, p_m$$

It is important to understand that both $m$ and $\theta$ are parameters, so $l_m(\theta)$ retains all terms in $\log f_{m, \theta}(y)$ that contain $m$ or $\theta$.

Model Selection (cont.)

Suppose we want to select the best model (in some sense) from a class $\mathcal{M}$ which contains a model $m_{\text{sup}}$ that contains all models in the class.
For example, suppose we have a linear model with $q$ predictors and the class $\mathcal{M}$ consists of all linear models in which the mean vector $\mu$ is a linear function of some subset of these $q$ predictors,

$$\mu = \alpha + \sum_{s \in S} \beta_s x_s,$$

where $S$ is a subset, possibly empty, of these predictors. Since there are $2^q$ subsets, there are $2^q$ models in the class $\mathcal{M}$. The model $m_{\text{sup}}$ is the one containing all $q$ of the predictors.

Model Selection (cont.)

Each model contains an intercept $\alpha$, so $m_{\text{sup}}$ has $q + 1$ parameters. A model with $k$ predictors has $k + 1$ parameters, including the intercept. The $p_m$ in AIC or BIC is the number of parameters (including the intercept)....
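The all-subsets search described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the course's own code: the errors are assumed independent normal with unknown variance $\sigma^2$, the data are made up, and the helper names (`solve`, `rss`, `gaussian_max_loglik`, `all_subsets`) are hypothetical. Under that normality assumption, replacing $\sigma^2$ by its MLE $\mathrm{RSS}/n$ gives the maximized log likelihood $-\frac{n}{2}\left[\log(2\pi \, \mathrm{RSS}/n) + 1\right]$. Following the count in the text, $p_m = k + 1$; counting $\sigma^2$ as one more parameter would add the same constant to the criterion of every model, so it would not change which model is selected.

```python
import itertools
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix (a copy)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rss(columns, y):
    """Residual sum of squares from regressing y on an intercept plus `columns`."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    # Normal equations X^T X beta = X^T y (adequate for small, well-conditioned examples).
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(design, y))


def gaussian_max_loglik(rss_value, n):
    """Normal log likelihood maximized over beta and sigma^2 (sigma^2 hat = RSS / n)."""
    return -0.5 * n * (math.log(2 * math.pi * rss_value / n) + 1)


def all_subsets(predictors, y):
    """Map each of the 2**q predictor subsets (a tuple of indices) to (AIC, BIC)."""
    n, q = len(y), len(predictors)
    results = {}
    for k in range(q + 1):
        for subset in itertools.combinations(range(q), k):
            loglik = gaussian_max_loglik(rss([predictors[s] for s in subset], y), n)
            p = k + 1  # k slopes plus the intercept, as in the text
            results[subset] = (-2 * loglik + 2 * p, -2 * loglik + math.log(n) * p)
    return results


# Made-up data (hypothetical): y is generated from x1 only; x2 and x3 do not enter it.
n = 20
x1 = [float(i) for i in range(n)]
x2 = [float((7 * i) % 5) for i in range(n)]
x3 = [1.0 if i < 10 else 0.0 for i in range(n)]
y = [1 + 2 * x1[i] + (0.5 if i % 2 == 0 else -0.5) for i in range(n)]

results = all_subsets([x1, x2, x3], y)
best_aic = min(results, key=lambda s: results[s][0])  # subset with smallest AIC
best_bic = min(results, key=lambda s: results[s][1])  # subset with smallest BIC
print("AIC selects", best_aic, "; BIC selects", best_bic)
```

Because the loop visits every one of the $2^q$ subsets, its cost grows exponentially in $q$. The normal-equations solve is used only to keep the sketch free of dependencies; for real data a QR-based least-squares routine would be more numerically stable.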
This note was uploaded on 10/28/2010 for the course STAT 5102 taught by Professor Staff during the Spring '03 term at Minnesota.