# Topic 06 - Topic 6 Model Selection Selecting the Best...

Topic 6: Model Selection. Selecting the “Best” Regression Model (Chapters 9 & 16)

## Overview

We already have many of the pieces in place, and we will use them to develop algorithmic procedures. Some additional statistics can also be used to compare models; we will discuss their pros and cons. We will also look at how well different model selection procedures work on some real data sets.
## Review: Tools already in the bag

- ANOVA F test: Is any variable important?
- Variable-added-last t-tests: Identify which variables are “most” important.
- Partial F tests: Test “groups” of variables.
- Coefficients of determination: Look at the % of variation explained by variables.
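
As a reminder of how these tools relate, here is a minimal sketch (not from the slides) that computes a partial F statistic and the coefficient of determination from residual sums of squares. The synthetic data, the variable names `x1`, `x2`, `x3`, and the helper names `rss` and `partial_f` are all assumptions made for illustration.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X (X includes the intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def partial_f(X_reduced, X_full, y):
    """Partial F statistic for the predictors in X_full that are absent from X_reduced."""
    n = len(y)
    p_full = X_full.shape[1]
    q = p_full - X_reduced.shape[1]          # number of predictors being tested
    rss_r, rss_f = rss(X_reduced, y), rss(X_full, y)
    return ((rss_r - rss_f) / q) / (rss_f / (n - p_full))

# Synthetic data (an assumption for illustration): y depends on x1 only.
rng = np.random.default_rng(0)
n = 100
x1, x2, x3 = rng.normal(size=(3, n))
y = 2.0 + 1.5 * x1 + rng.normal(size=n)

ones = np.ones(n)
X_full = np.column_stack([ones, x1, x2, x3])
f_x1 = partial_f(np.column_stack([ones, x2, x3]), X_full, y)   # x1 added last
f_x2x3 = partial_f(np.column_stack([ones, x1]), X_full, y)     # the group {x2, x3}
r2_full = 1 - rss(X_full, y) / rss(ones[:, None], y)           # coefficient of determination
print(f_x1, f_x2x3, r2_full)
```

With a single tested predictor, the partial F statistic is the square of the variable-added-last t statistic, so the first call covers both tools at once.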

## Algorithms

Put the pieces together to come up with the “best” model. There are many approaches:

- Start small and add the predictor with the most significant t-test?
- Start large and remove the least significant predictor?
- Start somewhere in the middle?

As always, the approach you choose will depend on your goals.
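
The “start large, remove the least significant” idea can be sketched as follows. This is only an illustration under stated assumptions: the stopping rule `|t| >= 2`, the function names `t_stats` and `backward_eliminate`, and the synthetic data are not from the slides.

```python
import numpy as np

def t_stats(X, y):
    """OLS t statistics for each column of X (the first column is the intercept)."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)             # estimate of the error variance
    se = np.sqrt(sigma2 * np.diag(XtX_inv))
    return beta / se

def backward_eliminate(X, y, names, t_cut=2.0):
    """Drop the predictor with the smallest |t| until every remaining |t| >= t_cut."""
    keep = list(range(X.shape[1]))
    while keep:
        Xk = np.column_stack([np.ones(len(y)), X[:, keep]])
        t = np.abs(t_stats(Xk, y)[1:])           # skip the intercept
        worst = int(np.argmin(t))
        if t[worst] >= t_cut:                    # assumed stopping rule
            break
        keep.pop(worst)
    return [names[j] for j in keep]

# Synthetic data (an assumption): only x1 and x3 affect y.
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 4))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=n)
selected = backward_eliminate(X, y, ["x1", "x2", "x3", "x4"])
print(selected)
```

A “start small” version would reverse the loop: begin with the intercept only and, at each step, add the candidate whose variable-added-last t-test is most significant.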
## Reliability vs. Interpretability

Goal: Find the “best” subset of the predictor variables to model the response. We have already indicated that it is necessary to keep some balance between “fewer predictors” and “greater explained variation” (higher $R^2$). Another way to put this is that we want to achieve a balance between reliability and interpretability.

## Reliability

A reliable model is one that provides the best prediction for new observations. With reliability we are concerned about whether our model will generalize to other samples. If we “overfit” the model, then we may not be able to apply the model to other data sets (due to missing predictors).
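
One way to see the generalization concern is to fit on one sample and predict a fresh one. The sketch below is an assumption-laden illustration, not the course's example: it compares a one-predictor model with a larger model that also includes 20 irrelevant predictors, using made-up data and hypothetical helper names.

```python
import numpy as np

def fit(X, y):
    """OLS coefficients of y on X (X includes the intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def mse(X, y, beta):
    """Mean squared prediction error of the fitted coefficients on (X, y)."""
    return float(np.mean((y - X @ beta) ** 2))

rng = np.random.default_rng(2)
n_train, n_test, n_noise = 30, 200, 20

def make_data(n):
    """Synthetic sample (an assumption): y depends on x only; noise_cols are irrelevant."""
    x = rng.normal(size=n)
    noise_cols = rng.normal(size=(n, n_noise))
    y = 1.0 + 2.0 * x + rng.normal(size=n)
    X_small = np.column_stack([np.ones(n), x])
    X_big = np.column_stack([X_small, noise_cols])
    return X_small, X_big, y

Xs_tr, Xb_tr, y_tr = make_data(n_train)
Xs_te, Xb_te, y_te = make_data(n_test)

b_small, b_big = fit(Xs_tr, y_tr), fit(Xb_tr, y_tr)
train_small, train_big = mse(Xs_tr, y_tr, b_small), mse(Xb_tr, y_tr, b_big)
test_small, test_big = mse(Xs_te, y_te, b_small), mse(Xb_te, y_te, b_big)
print(train_small, train_big, test_small, test_big)
```

The larger model can never fit the training sample worse than the smaller one nested inside it, so a low training error by itself says little about reliability; the error on the new sample is the relevant quantity.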
## Interpretability

A model that is valid for interpretation allows for accurate quantification of the relationship between Y and certain X’s, while perhaps controlling for other variables. The focus is not so much on prediction as on the regression coefficients themselves: we want to interpret the coefficients. Multicollinearity is a big issue when we talk about interpretability.
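
The slides do not name a diagnostic here, but a common way to quantify multicollinearity is the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. The sketch below assumes that diagnostic and uses made-up data in which `x2` is nearly a copy of `x1`.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (predictors only, no intercept column)."""
    n, k = X.shape
    out = []
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ beta
        tss = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        r2 = 1 - resid @ resid / tss             # R^2 of predictor j on the others
        out.append(1 / (1 - r2))
    return np.array(out)

# Synthetic predictors (an assumption): x2 is nearly a copy of x1, x3 is unrelated.
rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)
```

Large values for `x1` and `x2` indicate that their coefficients are estimated imprecisely when both are in the model, which is exactly what makes the coefficients hard to interpret.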

## Balancing Act

Your goals (prediction or interpretation) will determine which of reliability and interpretability is more important. It is generally easier to attain reliability. (Why?)
## “Best” in terms of Reliability

Five-step selection process:

1. Select the maximum model that you are willing to consider.
2. Choose model selection criteria.
3. Choose model searching strategy.
4. Conduct the analysis.
5. Evaluate reliability of the chosen model (a.k.a. model validation).

## Step 1: Maximum Model

The maximum model is the largest model to be considered at any point in the analysis. Any other model is created by deleting predictors from it.
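
Because every candidate is obtained by deleting predictors from the maximum model, the full set of candidates is the set of subsets of its predictors. A minimal sketch, with hypothetical predictor names and a hypothetical helper `candidate_models`:

```python
from itertools import combinations

def candidate_models(max_predictors):
    """Yield every subset of the maximum model's predictors, smallest first."""
    for size in range(len(max_predictors) + 1):
        for subset in combinations(max_predictors, size):
            yield subset

maximum_model = ["x1", "x2", "x3"]          # hypothetical predictor names
models = list(candidate_models(maximum_model))
print(len(models))                          # 2**3 = 8, counting the intercept-only model
```

With k predictors in the maximum model there are 2^k candidates, which is why the choice of maximum model and of a search strategy matters once k grows.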

## This note was uploaded on 02/20/2012 for the course STAT 502 taught by Professor Staff during the Fall '08 term at Purdue University-West Lafayette.
