# chap8 - Shrinkage Methods Revisited (Ch 9 of Faraway)


## Shrinkage Methods

- Principal components regression
- Partial least squares
- Ridge regression

## Principal Components

Motivation:

- Reduce the dimensionality of the data
- Illustration

[Figure: illustration with axes $x_1$ and $x_2$]

## Definition

1. Center each variable by its mean, $x_j - \bar{x}_j$, to obtain the matrix $X_{n \times p}$.
2. Find $u_1$ such that $\mathrm{var}(Xu_1)$ is maximized subject to $u_1^T u_1 = 1$.
3. Find $u_2$ such that $\mathrm{var}(Xu_2)$ is maximized subject to $u_2^T u_2 = 1$ and $u_2^T u_1 = 0$.
4. And so on.

## Remarks

- $z_j = Xu_j$ are the projections of the data points onto $u_j$.
- The $z_j = Xu_j$ are called the principal components.
- The $u_j$ are the eigenvectors of $X^T X$.
- $\mathrm{var}(Xu_j) = \lambda_j$, the eigenvalues of $X^T X$ (up to the constant factor $1/(n-1)$ when the sample variance is used).
- Recommended: scale each variable by its standard deviation beforehand.

## Principal Components Regression

PCR replaces the regression $y \sim x$ with $y \sim z$.

Typically only a few eigenvalues will be large, so most of the variation in $X$ can be represented by a few principal components. This is dimension reduction.

## Food Analyzer Example

- Response: fat content
- Predictors: 100-channel spectrum of absorbances
- Number of data points: $n = 215$
- Number of predictors: $p = 100$

## Prediction Performance

Goal: build a model that predicts well on future data.

Divide the data into two groups: training samples and testing samples. Build the models using the training samples and evaluate them on the testing samples.
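The definition and remarks on principal components can be checked numerically. The following is a minimal sketch on simulated data, not the food analyzer data; the variable names (`X`, `U`, `Z`, `lambda`, `pc_var`, `pca`) and the simulated design are illustrative assumptions. It computes the directions $u_j$ as eigenvectors of $X^T X$ and compares the result with `prcomp()`.

```r
## Minimal sketch: principal components from the eigendecomposition of X^T X.
## The simulated data and all names below are illustrative assumptions.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.8 * x1 + 0.6 * rnorm(n)

## Step 1: center each variable by its mean
X <- scale(cbind(x1, x2), center = TRUE, scale = FALSE)

## The eigenvectors of X^T X are the directions u_j (unit length, orthogonal)
eig    <- eigen(crossprod(X), symmetric = TRUE)
U      <- eig$vectors
lambda <- eig$values

## Principal components z_j = X u_j
Z <- X %*% U

## Sample variance of each z_j; equals lambda_j / (n - 1)
pc_var <- apply(Z, 2, var)

## prcomp() reports the standard deviations of the principal components
pca <- prcomp(X, center = FALSE)
print(cbind(pc_var, lambda / (n - 1), pca$sdev^2))
```

The three printed columns should agree, and the columns of `pca$rotation` should match the columns of `U` up to sign, since an eigenvector is only determined up to its direction.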
## Food Analyzer Example Continued

```r
> library(faraway)
> data(meatspec)
> dim(meatspec)
[1] 215 101
> ## Training data
> tr <- meatspec[1:172,]
> ## Test data
> te <- meatspec[173:215,]
> ## Linear model
> g1 <- lm(fat ~ ., tr)
> ## R2
> summary(g1)$r.squared
[1] 0.9970196
> ## Root mean squared error
> rmse <- function(x, y) {
+   sqrt(mean( (x - y)^2 ))
+ }
> rmse(fitted(g1), tr$fat)
[1] 0.6903167
> ## Prediction
> rmse( predict(g1, newdata=te), te$fat )
[1] 3.814000
> ## AIC
> g2 <- step(g1)
> rmse( fitted(g2), tr$fat )
[1] 0.7095069
> rmse( predict(g2, newdata=te), te$fat )
[1] 3.590245
> ## Principal components regression
> ## prcomp() is in the stats package (the old mva package was merged into stats)
> meatpca <- prcomp(tr[,-101])
> ## Square root of the eigenvalues
> round(meatpca$sdev, 3)
 [1] 5.055 0.511 0.282 0.168 0.038 0.025 0.014
 [8] 0.011 0.005 0.003 0.002 0.002 0.001 0.001
[15] 0.001 0.000 0.000 0.000 0.000 0.000 0.000
[22] 0.000 0.000 0.000 ...
> matplot(1:100, meatpca$rotation[,1:3], type="l", xlab="Frequency", ylab="")
> plot(1:10, meatpca$sdev[1:10], type="l", xlab="PC number", ylab="SD of PC")
> ## Choose PC number
> ## Mean of each variable in the training data
> mm <- apply( tr[,-101], 2, mean )
> ## Center the test data with the training means
> tex <- as.matrix( sweep(te[, -101], 2, mm) )
> rmsmeat <- numeric(50)
> for (i in 1:50) {
+   g3 <- lm(fat ~ meatpca$x[, 1:i], tr)
+   ## Compute the PC for the test data
+   nx <- tex %*% meatpca$rotation[, 1:i]
+   ## Predicted values
+   pv <- cbind(1, nx) %*% coef(g3)
+   rmsmeat[i] <- rmse( pv, te$fat )
+ }
> plot(rmsmeat, xlab="PC number", ylab="Test RMS")
> which.min(rmsmeat)
[1] 27
> min(rmsmeat)
[1] 1.854858
```

[Figure: loadings of the first three principal components against Frequency]

[Figure: SD of PC against PC number, for the first 10 principal components]

[Figure: Test RMS against PC number, for 1 to 50 principal components]

## Cross-Validation...
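One way to choose the number of principal components without using the test samples is K-fold cross-validation on the training data. The sketch below is an assumption about a reasonable procedure, not code from the slides: the function name `cv_pcr`, its arguments, and the simulated data are all hypothetical. Within each fold it recomputes the principal components on the retained rows, so the held-out rows never influence the rotation.

```r
## Minimal sketch (not from the slides): K-fold cross-validation for the
## number of principal components. cv_pcr and its arguments are hypothetical.
rmse <- function(x, y) sqrt(mean((x - y)^2))

cv_pcr <- function(X, y, max_pc = 10, K = 5) {
  X    <- as.matrix(X)
  n    <- nrow(X)
  fold <- sample(rep(1:K, length.out = n))   # random fold labels
  pred <- matrix(NA_real_, n, max_pc)        # held-out predictions
  for (k in 1:K) {
    hold <- fold == k
    ## PCA on the retained rows only
    pca <- prcomp(X[!hold, , drop = FALSE])
    ## Center held-out rows with the retained-row means, then rotate
    Zh <- sweep(X[hold, , drop = FALSE], 2, pca$center) %*% pca$rotation
    for (i in 1:max_pc) {
      ## Regress y on an intercept and the first i principal components
      fit <- lm.fit(cbind(1, pca$x[, 1:i, drop = FALSE]), y[!hold])
      pred[hold, i] <- cbind(1, Zh[, 1:i, drop = FALSE]) %*% fit$coefficients
    }
  }
  apply(pred, 2, rmse, y = y)                # CV RMSE for each number of PCs
}

## Simulated example: 20 predictors driven by 3 latent factors
set.seed(664)
n <- 120; p <- 20
latent <- matrix(rnorm(n * 3), n, 3)
X <- latent %*% matrix(rnorm(3 * p), 3, p) + 0.1 * matrix(rnorm(n * p), n, p)
y <- as.vector(latent %*% c(2, -1, 0.5) + 0.3 * rnorm(n))

cv <- cv_pcr(X, y, max_pc = 10, K = 5)
print(round(cv, 3))
print(which.min(cv))
```

With the food analyzer data, the same function could be called as `cv_pcr(tr[, -101], tr$fat, max_pc = 50)`, leaving `te` untouched until the final evaluation.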

## This note was uploaded on 11/17/2011 for the course STOR 664 taught by Professor Staff during the Fall '11 term at UNC.
