Lecture 8. Model Selection
Model Building can be thought of as a multistep process:
1. Data collection and preparation.
2. Model estimation.
3. Model refinement and selection.
4. Model validation.
We have already discussed a few techniques for choosing important predictors, e.g. $t$ and $F$ statistics and avoiding multicollinearity. Here is a list of the most popular model selection procedures:
• R-squared
$$R_p^2 = 1 - \frac{\mathrm{SSE}_p}{\mathrm{SST}}$$
• Adjusted R-squared
$$R^2_{\mathrm{adj},p} = 1 - \frac{\mathrm{SSE}_p/(n-p-1)}{\mathrm{SST}/(n-1)} = 1 - \frac{n-1}{n-p-1}\,(1 - R_p^2).$$
The adjusted $R_p^2$ has a penalty term for each regressor and does not necessarily increase when a new regressor is added. Hence, the adjusted $R_p^2$ is preferred over $R_p^2$.
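The difference between the two measures can be seen on a small numerical sketch. The function names and the numbers below (20 observations, a 2-predictor model and a 3-predictor model) are hypothetical and chosen only for illustration: the extra regressor lowers SSE slightly, so $R_p^2$ goes up, while the adjusted version goes down.

```python
def r_squared(sse_p, sst):
    """R^2_p = 1 - SSE_p / SST."""
    return 1.0 - sse_p / sst


def adjusted_r_squared(sse_p, sst, n, p):
    """Adjusted R^2 for a model with p predictors fitted to n observations."""
    return 1.0 - (sse_p / (n - p - 1)) / (sst / (n - 1))


# Hypothetical values, not taken from any real data set.
n, sst = 20, 100.0

r2_small = r_squared(30.0, sst)                      # 2 predictors
r2_large = r_squared(29.5, sst)                      # 3 predictors
adj_small = adjusted_r_squared(30.0, sst, n, p=2)
adj_large = adjusted_r_squared(29.5, sst, n, p=3)

print(f"R^2:      {r2_small:.4f} -> {r2_large:.4f}")
print(f"adj. R^2: {adj_small:.4f} -> {adj_large:.4f}")
```

Here the drop in SSE from 30.0 to 29.5 is too small to pay for the lost degree of freedom, which is exactly the situation the penalty term is meant to catch.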
• Mallows' $C_p$ Criterion
$$C_p = \frac{\mathrm{SSE}_p}{s^2} - [n - 2(p+1)]$$
If the candidate model is adequate, $\mathrm{SSE}_p$ is an estimate of $(n-p-1)\sigma^2$. Taking $s^2$ as an estimate of $\sigma^2$, we get $C_p \approx (n-p-1) - [n - 2(p+1)] = p+1$ in this case. If the model is inadequate, then $\mathrm{SSE}_p > (n-p-1)\sigma^2$ and $C_p > p+1$.
Hence, we search for models with a small $C_p$ value and $C_p \approx p+1$. When $C_p$ is small, the mean squared error is small. Also, when $C_p \approx p+1$, the bias of the regression model is small.
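As a rough illustration of how $C_p$ behaves, the sketch below evaluates the formula for a few candidate models. It assumes that $s^2$ is the residual mean square of the full model, which is a common convention but is not stated in these notes; the function name and all numbers are hypothetical.

```python
def mallows_cp(sse_p, s2, n, p):
    """C_p = SSE_p / s^2 - [n - 2(p + 1)] for a model with p predictors."""
    return sse_p / s2 - (n - 2 * (p + 1))


# Hypothetical setting: n = 30 observations, full model with 4 predictors.
n = 30
p_full, sse_full = 4, 50.0
s2 = sse_full / (n - p_full - 1)   # assumed: s^2 = MSE of the full model

cp_full = mallows_cp(sse_full, s2, n, p_full)   # equals p_full + 1 by construction
cp_two = mallows_cp(56.0, s2, n, p=2)           # close to p + 1 = 3
cp_one = mallows_cp(90.0, s2, n, p=1)           # far above p + 1 = 2

for label, p, cp in [("full", p_full, cp_full), ("two", 2, cp_two), ("one", 1, cp_one)]:
    print(f"{label:>4}: C_p = {cp:5.1f}, p + 1 = {p + 1}")
```

Under this convention the full model always has $C_p = p + 1$ exactly, so the criterion is informative only for the smaller candidate models.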
• Akaike Information Criterion (AIC)
$$\mathrm{AIC}_p = n \log\left(\frac{\mathrm{SSE}_p}{n}\right) + 2(p+1)$$
Rule of thumb: smaller AIC is better.
• Bayesian Information Criterion (BIC)
$$\mathrm{BIC}_p = n \log\left(\frac{\mathrm{SSE}_p}{n}\right) + (p+1)\log n$$
Rule of thumb: smaller BIC is better.
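The two information criteria differ only in the penalty per parameter: 2 for AIC and $\log n$ for BIC (the standard form $(p+1)\log n$ is used here). The following sketch, with hypothetical function names and made-up SSE values, shows a case where the two criteria disagree about which model to keep.

```python
import math


def aic(sse_p, n, p):
    """AIC_p = n log(SSE_p / n) + 2 (p + 1)."""
    return n * math.log(sse_p / n) + 2 * (p + 1)


def bic(sse_p, n, p):
    """BIC_p = n log(SSE_p / n) + (p + 1) log n (standard penalty)."""
    return n * math.log(sse_p / n) + (p + 1) * math.log(n)


# Hypothetical candidates fitted to n = 50 observations.
n = 50
small = {"p": 2, "sse": 120.0}
large = {"p": 4, "sse": 108.0}

aic_small, aic_large = aic(small["sse"], n, small["p"]), aic(large["sse"], n, large["p"])
bic_small, bic_large = bic(small["sse"], n, small["p"]), bic(large["sse"], n, large["p"])

print(f"AIC: small = {aic_small:.2f}, large = {aic_large:.2f}")
print(f"BIC: small = {bic_small:.2f}, large = {bic_large:.2f}")
```

With these numbers AIC favours the 4-predictor model while BIC favours the 2-predictor model, since $\log 50 \approx 3.9$ penalises each extra parameter almost twice as heavily as AIC does.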
• Prediction Sum of Squares (PRESS)
The prediction sum of squares is a measure of how well the model can predict the observed responses $y_i$. Recall that $e_{(i)} = y_i - y^*_{(i)}$, where $y^*_{(i)} = x_i' \hat\beta_{(i)}$ is the prediction of $y_i$ from the model fitted without the $i$-th observation. Then
$$\mathrm{PRESS}_p = \sum_{i=1}^{n} e_{(i)}^2 = \sum_{i=1}^{n} \left(\frac{e_i}{1 - h_{ii}}\right)^2.$$
Rule of thumb: smaller PRESS is better.
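The identity between the deleted residuals and the leverage-adjusted ordinary residuals can be checked numerically. The sketch below does this for simple linear regression only, where the leverage has the closed form $h_{ii} = 1/n + (x_i - \bar{x})^2 / S_{xx}$; the function names and the small data set are made up for illustration.

```python
def fit_slr(x, y):
    """Least-squares intercept and slope for simple linear regression."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1


def press_leverage(x, y):
    """PRESS from one fit, using e_i / (1 - h_ii)."""
    n = len(x)
    b0, b1 = fit_slr(x, y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    total = 0.0
    for xi, yi in zip(x, y):
        e_i = yi - (b0 + b1 * xi)                # ordinary residual
        h_ii = 1.0 / n + (xi - xbar) ** 2 / sxx  # leverage in SLR
        total += (e_i / (1.0 - h_ii)) ** 2
    return total


def press_leave_one_out(x, y):
    """PRESS by refitting the model n times, each time without observation i."""
    total = 0.0
    for i in range(len(x)):
        b0, b1 = fit_slr(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - (b0 + b1 * x[i])) ** 2  # deleted residual e_(i)
    return total


# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 5.3, 5.9]

press_fast = press_leverage(x, y)
press_slow = press_leave_one_out(x, y)
print(press_fast, press_slow)
```

The two values agree up to rounding error, which is why PRESS can be computed from a single fit rather than $n$ separate ones.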
There exist a number of automatic procedures for model selection:
• Forward Stepwise (FS) Regression
This procedure goes through a step-by-step process of adding variables until the best model is produced based on your search criteria. At each step, an $F$ test, AIC or BIC is used to determine whether that variable is appropriate. In particular,
1. begin with the SLR model with the single predictor that has the highest sample correlation with the response $Y$;
2. add to the model the predictor that meets three equivalent criteria:
(a) it has the highest sample partial correlation in absolute value with the response, adjusting for the predictors already in the equation;
(b) adding the variable will increase $R^2$ more than any other single variable;
(c) the variable added would have the largest $t$- or $F$-statistic of any of the variables that are not already in the model;
3. continue until a stopping rule is met, where possible rules are:
(a) stop with a subset of a predetermined size $p^*$;
(b) stop if the absolute value of the $t$-statistic (or alternatively the $F$-statistic) is less than some predetermined number $\kappa$ (or $\kappa^2$ for the $F$-statistic);
(c) stop when multicollinearity occurs.
Note: you can use AIC, BIC or $C_p$ instead of $F$.
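The forward procedure can be sketched in a few dozen lines. This is a minimal illustration, not a reference implementation: it uses the partial $F$-statistic with a hypothetical threshold `f_to_enter` (playing the role of $\kappa^2$) and an optional `max_size` (playing the role of $p^*$), it applies the threshold to the first variable as well, and it solves the normal equations directly, which is adequate only for small, well-conditioned problems. All names and the simulated data are assumptions.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * k
    for r in reversed(range(k)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, k))
        x[r] = (m[r][k] - s) / m[r][r]
    return x


def sse(columns, y):
    """SSE of the least-squares fit of y on an intercept plus the given columns."""
    n = len(y)
    design = [[1.0] * n] + columns
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in design] for ci in design]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in design]
    beta = solve(xtx, xty)
    fitted = [sum(b * col[i] for b, col in zip(beta, design)) for i in range(n)]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def forward_select(predictors, y, f_to_enter=4.0, max_size=None):
    """Return predictor names in order of entry (predictors: name -> list of values)."""
    n = len(y)
    ybar = sum(y) / n
    current_sse = sum((yi - ybar) ** 2 for yi in y)  # intercept-only model: SST
    limit = len(predictors) if max_size is None else max_size
    selected = []
    while len(selected) < limit:
        # Try each remaining predictor; the largest drop in SSE is also the
        # largest increase in R^2 and the largest partial F-statistic.
        trial = {name: sse([predictors[s] for s in selected + [name]], y)
                 for name in predictors if name not in selected}
        best = min(trial, key=trial.get)
        df = n - (len(selected) + 1) - 1             # n - p - 1 for the enlarged model
        f_stat = (current_sse - trial[best]) / (trial[best] / df)
        if f_stat < f_to_enter:                      # stopping rule 3(b)
            break
        selected.append(best)
        current_sse = trial[best]
    return selected


# Simulated data: y depends on x1 and x2; x3 is pure noise.
random.seed(0)
n = 40
data = {name: [random.gauss(0, 1) for _ in range(n)] for name in ("x1", "x2", "x3")}
y = [3 * a - 2 * b + random.gauss(0, 0.5) for a, b in zip(data["x1"], data["x2"])]

chosen = forward_select(data, y)
chosen_one = forward_select(data, y, max_size=1)
print(chosen, chosen_one)
```

Replacing the $F$ threshold by a comparison of AIC, BIC or $C_p$ values before and after adding `best` would give the variants mentioned in the note above.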
• Backward Stepwise (BS) Regression.
This note was uploaded on 01/12/2012 for the course STAT 331 taught by Professor Yulia Gel during the Spring '08 term at Waterloo.