Stat 5102 Lecture Slides: Deck 7
Charles J. Geyer
School of Statistics, University of Minnesota

Model Selection

When we have two nested models, we know how to compare them: the likelihood ratio test. When we have a short sequence of nested models, we can also use the likelihood ratio test to compare each consecutive pair of models. This violates the “do only one test” dogma, but is mostly harmless when there are only a few models being compared.

But what if the models are not nested, or if there are thousands or millions of models being compared?
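For the nested case, here is a minimal sketch of the likelihood ratio test in plain Python, assuming the usual large-sample approximation in which $2[l_{\text{big}} - l_{\text{small}}]$ (both evaluated at their MLEs) is referred to a chi-square distribution with degrees of freedom equal to the difference in parameter counts. The names `lrt` and `chi2_sf` are hypothetical; the chi-square upper tail is computed from the standard series for the regularized incomplete gamma function so that only the standard library is needed. In practice a library routine such as `scipy.stats.chi2.sf` would be the usual choice.

```python
import math


def chi2_sf(x, df):
    """Upper tail P(X > x) for X ~ chi-square(df), as 1 - P(df/2, x/2).

    P(a, z) is the regularized lower incomplete gamma function, summed by its
    power series. Adequate for moderate x; very large x can overflow the sum.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a  # n = 0 term of sum_n z^n / (a (a+1) ... (a+n))
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def lrt(loglik_small, loglik_big, p_small, p_big):
    """Likelihood ratio test of a smaller model nested in a bigger one.

    Takes the maximized log likelihoods and parameter counts of both models;
    returns (test statistic, degrees of freedom, asymptotic P-value).
    """
    stat = 2.0 * (loglik_big - loglik_small)
    df = p_big - p_small
    return stat, df, chi2_sf(stat, df)
```

For example, `lrt(-50.0, -49.0, 3, 5)` gives a statistic of 2.0 on 2 degrees of freedom and a P-value of $e^{-1} \approx 0.368$, since the chi-square distribution with 2 degrees of freedom has upper tail $e^{-x/2}$. The log-likelihood values here are made up for illustration.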
Model Selection (cont.)

This subject has received much theoretical attention in recent years. It is still an area of active research. But some things seem unlikely to change.

Rudimentary efforts at model selection, so-called forward and backward selection procedures, although undeniably things to do (TTD), have no theoretical justification. They are not guaranteed to do anything sensible.

Procedures that are justified theoretically evaluate a criterion function for all models in the class of models under consideration. They “select” the model with the smallest value of the criterion.

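Model Selection (cont.)

We will look at two such procedures, involving the Akaike information criterion (AIC) and the Bayes information criterion (BIC). Suppose the log likelihood for model $m$ is denoted $l_m$, the MLE for model $m$ is denoted $\hat{\theta}_m$, the dimension of $\hat{\theta}_m$ is $p_m$, and the sample size is $n$. Then

$$
\begin{aligned}
\mathrm{AIC}(m) &= -2\, l_m(\hat{\theta}_m) + 2\, p_m \\
\mathrm{BIC}(m) &= -2\, l_m(\hat{\theta}_m) + \log(n)\, p_m
\end{aligned}
$$

It is important to understand that both $m$ and $\theta$ are parameters, so $l_m(\theta)$ retains all terms in $\log f_{m,\theta}(y)$ that contain $m$ or $\theta$. Only terms containing neither may be dropped, because such terms are the same for every model and cannot change which model has the smallest criterion.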
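Both criteria are simple to compute once the maximized log likelihood of each model is available. The sketch below assumes `loglik` is $l_m(\hat{\theta}_m)$ and that $\log$ is the natural logarithm; the function names are hypothetical, and the log-likelihood values in the usage lines are invented numbers for illustration, not results from any data.

```python
import math


def aic(loglik, p):
    """AIC(m) = -2 l_m(theta_hat_m) + 2 p_m."""
    return -2.0 * loglik + 2.0 * p


def bic(loglik, p, n):
    """BIC(m) = -2 l_m(theta_hat_m) + log(n) p_m, using the natural log."""
    return -2.0 * loglik + math.log(n) * p


# Hypothetical maximized log likelihoods for two models fit to n = 50 observations.
print(aic(-120.0, 3), aic(-117.0, 5))          # 246.0 244.0: AIC prefers the 5-parameter model
print(bic(-120.0, 3, 50), bic(-117.0, 5, 50))  # about 251.7 and 253.6: BIC prefers the 3-parameter model
```

Because $\log(n) > 2$ whenever $n \ge 8$, BIC charges more per parameter than AIC in all but tiny samples, which is why the two criteria can disagree as they do in this made-up example.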
Model Selection (cont.)

Suppose we want to select the best model (in some sense) from a class $\mathcal{M}$ which contains a model $m_{\text{sup}}$ that contains all models in the class.

For example, suppose we have a linear model with $q$ predictors and the class $\mathcal{M}$ consists of all linear models in which the mean vector $\mu$ is a linear function of some subset of these $q$ predictors

$$
\mu = \alpha + \sum_{s \in S} \beta_s x_s
$$

where $S$ is a subset, possibly empty, of these predictors. Since there are $2^q$ subsets, there are $2^q$ models in the class $\mathcal{M}$. The model $m_{\text{sup}}$ is the one containing all $q$ of the predictors.
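To make the count of $2^q$ models concrete, a minimal sketch (the function name `model_class` is hypothetical) enumerates the class $\mathcal{M}$ by reading each integer from $0$ to $2^q - 1$ as a bit mask over the predictors: bit $j$ set means predictor $j$ is in $S$. Mask $0$ gives the empty subset, that is, the intercept-only model, and the all-ones mask gives $m_{\text{sup}}$.

```python
def model_class(predictors):
    """Return every subset S of the predictors, one per model in the class M."""
    q = len(predictors)
    models = []
    for mask in range(2 ** q):
        # Bit j of mask decides whether predictor j belongs to S.
        subset = tuple(x for j, x in enumerate(predictors) if (mask >> j) & 1)
        models.append(subset)
    return models


models = model_class(["x1", "x2", "x3"])
print(len(models))            # 8
print(models[0], models[-1])  # () ('x1', 'x2', 'x3')
```

The count doubles with each added predictor: with $q = 20$ there are already $2^{20} = 1{,}048{,}576$ models, which is the "millions of models" situation raised at the start of this section.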

Model Selection (cont.)

Each model contains an intercept $\alpha$, so $m_{\text{sup}}$ has $q + 1$ parameters. A model with $k$ predictors has $k + 1$ parameters, including the intercept. The $p_m$ in AIC or BIC is the number of parameters (including the intercept).
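Putting the pieces together, here is a minimal sketch of the theoretically justified procedure: evaluate the criterion for every model in $\mathcal{M}$ and keep the one with the smallest value. It assumes a normal linear model with independent errors fitted by least squares, so that $l_m(\hat{\theta}_m) = -\tfrac{n}{2}\left[\log(2\pi\,\mathrm{RSS}_m / n) + 1\right]$ once $\sigma^2$ is replaced by its MLE $\mathrm{RSS}_m / n$. All helper names are hypothetical, the normal equations are solved by plain Gauss-Jordan elimination (adequate for small, well-conditioned designs, not a production least-squares solver), and $p_m = k + 1$ as on the slide. Counting $\sigma^2$ as one more parameter would add the same constant to every model's criterion, so it would not change which model is selected.

```python
import math
from itertools import combinations


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting.

    Assumes a is nonsingular (i.e. the design matrix has full column rank).
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def rss(columns, y):
    """Residual sum of squares from regressing y on an intercept plus the given columns."""
    n = len(y)
    x = [[1.0] + [float(col[i]) for col in columns] for i in range(n)]  # design matrix rows
    k = len(x[0])
    xtx = [[sum(x[i][r] * x[i][c] for i in range(n)) for c in range(k)] for r in range(k)]
    xty = [sum(x[i][r] * y[i] for i in range(n)) for r in range(k)]
    beta = solve(xtx, xty)  # least squares coefficients from the normal equations
    return sum((y[i] - sum(beta[j] * x[i][j] for j in range(k))) ** 2 for i in range(n))


def normal_loglik(rss_m, n):
    """Maximized normal log likelihood with sigma^2 set to its MLE rss_m / n (needs rss_m > 0)."""
    return -0.5 * n * (math.log(2.0 * math.pi * rss_m / n) + 1.0)


def select_all_subsets(predictors, y, criterion="aic"):
    """Score every subset of the named predictors by AIC or BIC; return (best subset, all scores)."""
    n = len(y)
    names = sorted(predictors)
    scores = {}
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            loglik = normal_loglik(rss([predictors[s] for s in subset], y), n)
            p = k + 1  # intercept plus k regression coefficients
            penalty = 2.0 * p if criterion == "aic" else math.log(n) * p
            scores[subset] = -2.0 * loglik + penalty
    best = min(scores, key=scores.get)  # the model with the smallest criterion value
    return best, scores
```

Here `predictors` is assumed to be a dict mapping each predictor name to its column of $n$ values. The loop fits all $2^q$ models, so this brute-force search is only feasible for modest $q$.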