ISYE6414
Summer 2010
Lecture 9
Shrinkage Methods
Dr. Kobi Abayomi
July 6, 2010
1 Orthogonalization
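The best scenario for the observed data $\mathbf{x}$ in multiple regression is $x_j \perp x_l$ for every pair of predictors $j \ne l$: the observed predictors are mutually orthogonal, and hence linearly independent. Remember that a linear regression is a conditional expectation of the response variable $Y$ given the observed data $X = \mathbf{x}$. If the predictors are mutually orthogonal, they form an orthogonal basis for the space onto which $Y$ is projected, so each coefficient can be estimated separately from the others.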
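The payoff of orthogonality can be checked on simulated data: when the centered predictors are mutually orthogonal, each multiple-regression coefficient equals the slope from regressing $Y$ on that predictor alone. This is a minimal sketch, not code from the notes; the data are made up, and the orthogonal columns are manufactured with base R's `qr()` and `qr.Q()`.

```r
set.seed(1)
n <- 50
# Illustrative design: orthonormalize (1, noise, noise) so that x1 and x2 are
# orthogonal to each other and to the intercept column (i.e. centered)
raw <- cbind(1, matrix(rnorm(n * 2), n, 2))
Q   <- qr.Q(qr(raw))
x1  <- Q[, 2]
x2  <- Q[, 3]
y   <- 2 + 3 * x1 - 1.5 * x2 + rnorm(n, sd = 0.5)

# Joint fit versus one-predictor-at-a-time fits
joint  <- coef(lm(y ~ x1 + x2))
single <- c(coef(lm(y ~ x1))["x1"], coef(lm(y ~ x2))["x2"])

print(round(joint[c("x1", "x2")], 4))
print(round(single, 4))   # same slopes: the fits decouple
```

With correlated predictors the two printed vectors would generally differ, because each coefficient then depends on which other predictors are in the model.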
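Perfect collinearity is the opposite of orthogonality: some predictor is an exact linear combination of the others, so the columns of $\mathbf{x}$ are rank-deficient and form a degenerate basis. In this situation $\mathbf{x}^T\mathbf{x}$ is singular and the regression coefficient estimates $\hat\beta$ are nonidentifiable. When the dependence is only approximate (near collinearity), $\hat\beta$ can be computed, but its variance is inflated.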
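Both failure modes are easy to reproduce in R on simulated data (an illustrative sketch; the variable names are made up). With an exact dependence, `lm()` cannot identify one coefficient and reports it as `NA`; with a near dependence, every coefficient is estimable but its standard error balloons relative to an unrelated second predictor.

```r
set.seed(2)
n  <- 100
x1 <- rnorm(n)
y  <- 1 + x1 + rnorm(n)

x2_exact <- 2 * x1                     # exact linear dependence on x1
x2_near  <- x1 + rnorm(n, sd = 0.01)   # near dependence on x1
x2_indep <- rnorm(n)                   # unrelated to x1 (reference case)

fit_exact <- lm(y ~ x1 + x2_exact)
fit_near  <- lm(y ~ x1 + x2_near)
fit_indep <- lm(y ~ x1 + x2_indep)

# Perfect collinearity: lm() drops the aliased column and reports NA
print(coef(fit_exact))

# Near collinearity: estimable, but the standard error of x1 is inflated
se_x1 <- function(fit) summary(fit)$coefficients["x1", "Std. Error"]
print(c(near = se_x1(fit_near), independent = se_x1(fit_indep)))
```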
The following methods are designed to mitigate the effects of collinearity (linear dependence) between the predictors by replacing them with components: linear combinations of the predictors that are constructed to be mutually orthogonal, and therefore linearly independent.
2 Principal Components Regression (PCR)
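Recall that, when the columns of the $n \times k$ predictor matrix $\mathbf{x}$ are centered, $\hat\Sigma = \mathbf{x}^T\mathbf{x}/(n-1)$ is the estimate of the covariance matrix of the predictors. Call $\lambda = (\lambda_1, \ldots, \lambda_k)$ the eigenvalues of $\hat\Sigma$, ordered so that $\lambda_1 \ge \cdots \ge \lambda_k$, and $\mathbf{e} = (e_1, \ldots, e_k)$ the matrix whose columns are its orthonormal eigenvectors. As such
$$\hat\Sigma = \mathbf{e}\,\Lambda\,\mathbf{e}^T, \qquad \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_k).$$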
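The decomposition can be verified numerically with base R's `eigen()` applied to a sample covariance matrix. The three correlated predictors below are simulated stand-ins, not data from the notes; this is a sketch of the check, assuming centered predictors as above.

```r
set.seed(3)
n <- 200
u <- rnorm(n)                                  # shared latent signal
x <- cbind(x1 = u + rnorm(n, sd = 0.3),
           x2 = u + rnorm(n, sd = 0.3),
           x3 = rnorm(n))

xc <- scale(x, center = TRUE, scale = FALSE)   # centered predictors
Sigma_hat <- crossprod(xc) / (n - 1)           # x^T x / (n - 1)

eig    <- eigen(Sigma_hat, symmetric = TRUE)   # eigenvalues in decreasing order
lambda <- eig$values                           # lambda_1 >= ... >= lambda_k
e      <- eig$vectors                          # column j is e_j
Lambda <- diag(lambda)

print(lambda)
print(max(abs(Sigma_hat - cov(x))))                # agrees with cov()
print(max(abs(e %*% Lambda %*% t(e) - Sigma_hat))) # Sigma_hat = e Lambda e^T
print(max(abs(crossprod(e) - diag(ncol(e)))))      # e^T e = I (orthonormal)
```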
The $j$th principal component of $\mathbf{x} = (x_1, \ldots, x_k)^T$ is
$$z_j = e_j^T\mathbf{x} = e_{j1}x_1 + \cdots + e_{jk}x_k.$$
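Stacking the $n$ centered observations as rows gives every component score at once as the matrix product of the data with $\mathbf{e}$. The sketch below (simulated predictors, an illustrative assumption) computes the scores this way and compares them with base R's `prcomp()`, which obtains them from an SVD. Eigenvectors are unique only up to sign, so each column's sign is aligned before comparing.

```r
set.seed(4)
n <- 200
u <- rnorm(n)
x <- cbind(x1 = u + rnorm(n, sd = 0.3),
           x2 = u + rnorm(n, sd = 0.3),
           x3 = rnorm(n))
xc <- scale(x, center = TRUE, scale = FALSE)    # centered predictors

e <- eigen(cov(x), symmetric = TRUE)$vectors    # columns e_1, ..., e_k
Z <- xc %*% e          # row i, column j: z_j = e_j^T x for observation i

# prcomp() computes the same scores via an SVD; flip signs where needed
pc    <- prcomp(x)
signs <- sign(colSums(Z * pc$x))
print(max(abs(Z - sweep(pc$x, 2, signs, "*"))))
```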
Constructing the principal components as the inner products of the eigenvectors and the predictors yields components $z_1, \ldots, z_k$ with these properties:
$$\mathrm{Var}(z_j) = e_j^T\hat\Sigma\,e_j = \lambda_j, \qquad \mathrm{Cov}(z_j, z_l) = e_j^T\hat\Sigma\,e_l = \lambda_l\,e_j^T e_l = 0 \quad (j \ne l),$$
which follow from $\hat\Sigma e_l = \lambda_l e_l$ and the orthonormality of the eigenvectors. This is the Principal Component Analysis (PCA) procedure.
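Both properties can be confirmed numerically: the sample covariance matrix of the scores should be diagonal, with the eigenvalues on the diagonal. A short sketch on simulated predictors (not the notes' own code):

```r
set.seed(5)
n <- 200
u <- rnorm(n)
x <- cbind(u + rnorm(n, sd = 0.3), u + rnorm(n, sd = 0.3), rnorm(n))

eig <- eigen(cov(x), symmetric = TRUE)
Z   <- scale(x, center = TRUE, scale = FALSE) %*% eig$vectors   # PC scores

V <- cov(Z)                           # sample covariance of z_1, ..., z_k
print(round(V, 6))                    # off-diagonals: Cov(z_j, z_l) ~ 0
print(cbind(var_z = diag(V), lambda = eig$values))   # Var(z_j) = lambda_j
```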
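Principal Component Regression (PCR) replaces the predictors in the regression equation with these mutually orthogonal linear combinations: it replaces $\hat y = \mathbf{x}^T\hat\beta$ with $\hat y = \mathbf{z}^T\hat\beta^*$, where $\mathbf{z} = (z_1, \ldots, z_k)^T = \mathbf{e}^T\mathbf{x}$.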
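A useful sanity check on this substitution: if all $k$ components are kept, $\mathbf{z} = \mathbf{e}^T\mathbf{x}$ is only an invertible rotation of $\mathbf{x}$, so the fitted values are identical to ordinary least squares, and since $\mathbf{x}^T\hat\beta = \mathbf{x}^T\mathbf{e}\,\hat\beta^*$ the original slopes are recovered as $\hat\beta = \mathbf{e}\,\hat\beta^*$. A sketch on simulated data (the generating model is an arbitrary assumption):

```r
set.seed(6)
n <- 100
u <- rnorm(n)
x <- cbind(x1 = u + rnorm(n, sd = 0.2),
           x2 = u + rnorm(n, sd = 0.2),
           x3 = rnorm(n))
y <- 1 + 2 * x[, "x1"] - x[, "x3"] + rnorm(n)

xc <- scale(x, center = TRUE, scale = FALSE)
e  <- eigen(cov(x), symmetric = TRUE)$vectors
Z  <- xc %*% e                        # all k principal component scores

ols <- lm(y ~ x)
pcr <- lm(y ~ Z)                      # regression on the components

print(max(abs(fitted(ols) - fitted(pcr))))   # identical fitted values
beta_star <- coef(pcr)[-1]                   # drop the intercept
beta_back <- drop(e %*% beta_star)           # beta_hat = e beta_star
print(rbind(ols = coef(ols)[-1], from_pcr = beta_back))
```

Only the intercepts differ, because the components are built from centered predictors.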
The goal of PCR is to remove the linear dependence and to express $\hat y$ via a small number of components, typically the first $m < k$ (those with the largest eigenvalues).
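Keeping only the first $m$ components gives the reduced fit, and the implied coefficients on the original predictors are $\mathbf{e}_{1:m}\hat\beta^*_{1:m}$, where $\mathbf{e}_{1:m}$ holds the first $m$ eigenvectors. The helper `pcr_fit()` below is a hypothetical name used only for this sketch; the `pls` package's `pcr()` provides a fuller implementation. Because the first $m$ components are nested in the first $m+1$, the training error can only fall as $m$ grows, which is why $m$ has to be chosen on held-out data.

```r
# Hypothetical helper: PCR keeping the first m principal components
pcr_fit <- function(x, y, m) {
  centers <- colMeans(x)
  xc <- sweep(x, 2, centers)                       # center the predictors
  e  <- eigen(cov(x), symmetric = TRUE)$vectors[, 1:m, drop = FALSE]
  Z  <- xc %*% e                                   # first m component scores
  gamma <- coef(lm(y ~ Z))                         # intercept, then beta_star
  beta  <- drop(e %*% gamma[-1])                   # back to original predictors
  list(beta      = beta,
       intercept = unname(gamma[1] - sum(centers * beta)),  # raw-scale intercept
       fitted    = drop(gamma[1] + Z %*% gamma[-1]))
}

set.seed(7)
n <- 100
u <- rnorm(n)
x <- cbind(u + rnorm(n, sd = 0.05), u + rnorm(n, sd = 0.05), rnorm(n))
y <- 1 + x[, 1] + x[, 2] + rnorm(n)

for (m in 1:3) {
  fit <- pcr_fit(x, y, m)
  cat("m =", m, " beta =", round(fit$beta, 3),
      " train RMSE =", round(sqrt(mean((y - fit$fitted)^2)), 4), "\n")
}
```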
Here’s an example in R:
library(faraway)
data(meatspec)
### data on fat content of 215 samples of meat,
### with a 100-channel spectrum of absorbances
### goal: predict fat content from the spectrum data
### columns 1-100: the range of the spectrum
### training sample is the first 172 observations
model1 <- lm(fat ~ ., meatspec[1:172, ])
summary(model1)$r.squared
### use RMSE as a goodness-of-fit statistic for the training and test samples
rmse <- function(x, y) sqrt(mean((x - y)^2))
rmse(fitted(model1), meatspec$fat[1:172])
rmse(predict(model1, meatspec[173:215, ]), meatspec$fat[173:215])
### much worse performance on the test sample:
### a good fit to the training data is just that!
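The same train/test comparison can be reproduced without the faraway package on simulated "spectrum-like" data: 50 highly correlated channels driven by 3 latent factors, with all values made up for illustration. The sketch then fits PCR with $m = 3$ components estimated on the training rows only; `rmse()` mirrors the helper above, and the choice $m = 3$ is an assumption that happens to match the simulation.

```r
set.seed(8)
rmse <- function(x, y) sqrt(mean((x - y)^2))

# Simulated channels: 3 latent factors plus a little measurement noise
n <- 100; p <- 50
latent <- matrix(rnorm(n * 3), n, 3)
X <- latent %*% matrix(rnorm(3 * p), 3, p) + matrix(rnorm(n * p, sd = 0.1), n, p)
y <- drop(latent %*% c(2, -1, 0.5)) + rnorm(n, sd = 0.5)
train <- 1:60
test  <- 61:100

# OLS on all 50 channels (51 parameters from 60 training rows)
dat <- data.frame(y = y, X)
ols <- lm(y ~ ., data = dat[train, ])
ols_train <- rmse(fitted(ols), y[train])
ols_test  <- rmse(predict(ols, dat[test, ]), y[test])

# PCR with the first m components, estimated on the training rows only
m   <- 3
pc  <- prcomp(X[train, ])
Ztr <- pc$x[, 1:m]
fit <- lm(y[train] ~ Ztr)
Zte <- sweep(X[test, ], 2, pc$center) %*% pc$rotation[, 1:m]   # test scores
pcr_test <- rmse(drop(cbind(1, Zte) %*% coef(fit)), y[test])

print(c(ols_train = ols_train, ols_test = ols_test, pcr_test = pcr_test))
```

Note that the test rows are projected with the training means and eigenvectors; recomputing PCA on the test rows would leak information from the test sample.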