# Stat 373 – Chapter 5: Model Building

In many applications, we have a number of explanatory variates that we can choose to include in or delete from the model. For example, if the problem is to

- predict the value of the response variate for a given set of values of the explanatory variates, or
- assess the effect of a particular explanatory variate (or variates) on the response variate while controlling for a number of other explanatory variates,

we may or may not include some of the explanatory variates in the model. We want to use the data to decide which variates to include or delete. This decision is important if we want a final model that is as simple as possible, is useful, and fits the data well. If we include unnecessary terms, we add to the model's complexity and can also distort the conclusions.

We consider three strategies:

1. **Forward selection.** Start with the simple one-variate models and select the one that best explains the variation in the response variate, i.e. the one with the highest value of R², or equivalently the model with the largest F-ratio. Then add the second variate that maximizes the increase in R² and has a coefficient significantly different from 0. Continue until we can find no more important variates to add.
2. **Backward elimination.** Start by fitting the full model. If any coefficient is judged not significantly different from 0, delete the least important variate, based on the p-value for the test of the hypothesis that the corresponding coefficient is 0. Keep deleting until all of the included variates have coefficients significantly different from 0.
3. **All possible regressions.** Fit all possible models – there are 2^p − 1 of them if we have p explanatory variates. Select two or three of each size and then use a criterion that balances the value of R² against the addition of extra variates to pick the "best" model.

You might expect that strategies 1 and 2 would give the same answer, but this is not always the case.
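As a rough sketch, strategies 1 and 2 can be run in R with the built-in `step()` function. One caveat: `step()` ranks candidate models by AIC rather than by the R²/p-value rules described here, but the search pattern (add one variate at a time, or delete one at a time) is the same. The seed is an assumption added for reproducibility, and the data-generating code mimics the artificial example below.

```r
# Sketch of forward selection and backward elimination via step().
# NOTE: step() uses AIC, not the R^2 / p-value rules in these notes.
set.seed(373)  # assumed seed, for reproducibility only
u1 <- rnorm(100); u2 <- rnorm(100); u3 <- rnorm(100); u4 <- rnorm(100); r <- rnorm(100)
dat <- data.frame(x1 = u1, x2 = u1 + u2, x3 = u1 + 2*u2 - u3, x4 = u1 + u4)
dat$y <- dat$x1 + 1.2*dat$x2 - 0.5*dat$x3 + 2*r

# Strategy 1: forward selection, starting from the intercept-only model
fwd <- step(lm(y ~ 1, data = dat),
            scope = ~ x1 + x2 + x3 + x4, direction = "forward", trace = 0)

# Strategy 2: backward elimination, starting from the full model
bwd <- step(lm(y ~ x1 + x2 + x3 + x4, data = dat),
            direction = "backward", trace = 0)

# The two strategies need not end at the same model
print(formula(fwd))
print(formula(bwd))
```

Running both on the same data makes it easy to check whether the two searches agree; with correlated explanatory variates they sometimes do not.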
We create an artificial example to demonstrate this point. The data are stored in ch6example1.txt.

**Example 1** We create the data with the code:

```r
u1 <- rnorm(100); u2 <- rnorm(100); u3 <- rnorm(100); u4 <- rnorm(100); r <- rnorm(100)
x1 <- u1; x2 <- u1 + u2; x3 <- u1 + 2*u2 - u3; x4 <- u1 + u4
y <- x1 + 1.2*x2 - 0.5*x3 + 2*r
```

Adapted from Stat 371 Course Notes © R.J. MacKay, University of Waterloo, 2005
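Because x1–x4 are all built from the shared components u1 and u2, they are correlated by construction. A quick sketch to see this (the seed is an assumption; the notes do not fix one):

```r
# Check the correlations induced by the construction of x1..x4
set.seed(373)  # assumed seed, for reproducibility only
u1 <- rnorm(100); u2 <- rnorm(100); u3 <- rnorm(100); u4 <- rnorm(100)
x1 <- u1; x2 <- u1 + u2; x3 <- u1 + 2*u2 - u3; x4 <- u1 + u4
# e.g. cor(x1, x2) has theoretical value 1/sqrt(2), roughly 0.71
round(cor(cbind(x1, x2, x3, x4)), 2)
```

The off-diagonal entries are well away from 0, confirming that the columns are far from orthogonal.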

Note that the second line introduces relationships among x1, x2, x3 and x4. The corresponding vectors are not close to orthogonal. We then try strategy 1, adding significant variates. We start by fitting all of the one-variate models and picking the one with a significant coefficient (p-value less than 10%) and the highest value of R². The results are:

| Model    | Significant variates | R²     |
|----------|----------------------|--------|
| lm(y~x1) | x1                   | 0.4547 |
| lm(y~x2) | x2                   | 0.2491 |
| lm(y~x3) | x3                   | 0.0408 |
| lm(y~x4) | x4                   | 0.2439 |

We select x1 and proceed by fitting the three two-variate models that include x1.
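The four one-variate fits in the table can be produced with a short loop. This is a sketch: the seed is an assumption, so since the data are regenerated here the R² values will differ from those in the table above.

```r
# Fit each one-variate model; extract R^2 and the slope's p-value
set.seed(373)  # assumed seed; values differ from the notes' table
u1 <- rnorm(100); u2 <- rnorm(100); u3 <- rnorm(100); u4 <- rnorm(100); r <- rnorm(100)
dat <- data.frame(x1 = u1, x2 = u1 + u2, x3 = u1 + 2*u2 - u3, x4 = u1 + u4)
dat$y <- dat$x1 + 1.2*dat$x2 - 0.5*dat$x3 + 2*r

r2 <- pval <- numeric(4)
for (j in 1:4) {
  s <- summary(lm(dat$y ~ dat[[j]]))
  r2[j]   <- s$r.squared              # R^2 of the one-variate model
  pval[j] <- coef(s)[2, "Pr(>|t|)"]   # p-value for the slope coefficient
}
data.frame(model = paste0("lm(y~x", 1:4, ")"),
           R2 = round(r2, 4), p.value = signif(pval, 3))
```

Picking the row with the largest R² among those with p-value below 10% reproduces the selection rule of strategy 1.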