Chapter 10
Variable Selection
Multicollinearity
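A set of terms $X_1, X_2, \ldots, X_p$ are approximately collinear if, for constants $c_0, c_1, \ldots, c_p$ with at least one $c_j \neq 0$,
$$c_1 X_1 + c_2 X_2 + \cdots + c_p X_p \approx c_0 .$$
Solving for $X_j$ gives
$$X_j \approx \frac{c_0}{c_j} - \sum_{l \neq j} \frac{c_l}{c_j} X_l ,$$
which is similar to a linear regression mean function with intercept $c_0/c_j$ and slopes $-c_l/c_j$.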
Diagnostic method:
Step 1: Regress $X_j$ on the other $X$'s.
Step 2: Calculate $R^2$ for this regression, which we will call $R_j^2$.
If the largest $R_j^2$ is near 1, we would diagnose approximate collinearity.
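The two diagnostic steps can be sketched in Python with NumPy. This is a minimal illustration, not a reference implementation: the helper name `r_squared_per_term` and the synthetic data are assumptions made for the example, and each auxiliary regression is assumed to include an intercept.

```python
import numpy as np


def r_squared_per_term(X):
    """Return R_j^2 from regressing each column of X on the other columns.

    Hypothetical helper for illustration; every auxiliary regression
    includes an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    r2 = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Step 1: regress X_j on an intercept plus the remaining terms.
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        # Step 2: compute R_j^2 = 1 - RSS / total sum of squares.
        resid = y - A @ coef
        ss_res = resid @ resid
        ss_tot = np.sum((y - y.mean()) ** 2)
        r2[j] = 1.0 - ss_res / ss_tot
    return r2


# Synthetic data (assumed for the example): x3 is almost x1 + x2,
# while x4 is unrelated to the other terms.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + 0.01 * rng.normal(size=n)
x4 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3, x4])

r2 = r_squared_per_term(X)
print(np.round(r2, 4))
print("largest R_j^2:", r2.max())
```

In this constructed example the first three terms are tied together by one near-exact linear relation, so each of them should have $R_j^2$ close to 1, while the unrelated fourth term should not.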
VIF
When $p > 2$, the variance of the $j$th coefficient is
$$\mathrm{Var}(\hat{\beta}_j) = \frac{1}{1 - R_j^2} \cdot \frac{\sigma^2}{SX_jX_j},$$
where $SX_jX_j = \sum_i (x_{ij} - \bar{x}_j)^2$ and $\sigma^2$ is the error variance.
The quantity $1/(1 - R_j^2)$ is called the $j$th variance inflation factor, or $\mathrm{VIF}_j$.
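As a rough sketch, the variance inflation factors $1/(1 - R_j^2)$ can be obtained for all terms at once, because they equal the diagonal entries of the inverse of the predictors' correlation matrix (a standard identity when each auxiliary regression includes an intercept). The function name `vif_from_correlation` and the synthetic data are assumptions for this illustration.

```python
import numpy as np


def vif_from_correlation(X):
    """Return VIF_j = 1 / (1 - R_j^2) for every column of X.

    Hypothetical helper: uses the identity that VIF_j is the j-th
    diagonal entry of the inverse correlation matrix of the predictors.
    """
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)  # p x p correlation matrix
    return np.diag(np.linalg.inv(corr))


# Synthetic data (assumed for the example): x1 and x2 share a common
# component z, so they are strongly correlated; x3 is independent.
rng = np.random.default_rng(1)
n = 500
z = rng.normal(size=n)
x1 = z + 0.1 * rng.normal(size=n)
x2 = z + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

vif = vif_from_correlation(X)
print(np.round(vif, 2))
```

A VIF near 1 means the term is nearly uncorrelated with the others; a large VIF means the variance of that coefficient is inflated by roughly that factor relative to the uncorrelated case. This approach breaks down if the terms are exactly collinear, since the correlation matrix is then singular.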
Variable Selection
Principle of Parsimony (Occam’s razor):
Choose fewer variables with
sufficient explanatory power. This is a desirable modeling strategy.
The goal of variable selection is thus to identify the smallest subset of
covariates that provides good fit.
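One way of achieving this is to retain only the predictors that are statistically
significant in the fitted multiple regression. This may not work well if some
variables are strongly correlated among themselves, because the inflated
coefficient variances can make useful predictors appear insignificant, or if
there are too many variables (e.g., exceeding the sample size), in which case
the full regression cannot be fitted uniquely.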
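The difficulty with more variables than observations can be illustrated with a small sketch. The data below are synthetic and assumed purely for the example: the response is noise unrelated to the predictors, yet least squares reproduces it exactly, leaving no residual degrees of freedom for significance tests.

```python
import numpy as np

# Synthetic setup (assumed for the example): n = 10 observations,
# p = 20 predictors, and a response that is pure noise.
rng = np.random.default_rng(2)
n, p = 10, 20
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Design matrix with an intercept column: n rows, p + 1 columns.
A = np.column_stack([np.ones(n), X])
rank = np.linalg.matrix_rank(A)

# lstsq returns the minimum-norm solution, one of infinitely many
# coefficient vectors that fit equally well when rank < p + 1.
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ beta
residual_df = n - rank

print("rank of design matrix:", rank, "of", A.shape[1], "columns")
print("residual degrees of freedom:", residual_df)
print("fit reproduces y exactly:", np.allclose(fitted, y))
```

Because the rank of the design matrix cannot exceed the number of observations, the coefficients are not uniquely determined and the error variance cannot be estimated, so selection based on coefficient significance is unavailable in this setting.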
 Fall '08
 Ma,P
