Regularization: Ridge Regression and the LASSO
Statistics 305: Autumn Quarter 2006/2007
Wednesday, November 29, 2006

Agenda
1. The Bias-Variance Tradeoff
2. Ridge Regression
   - Solution to the $\ell_2$ problem
   - Data Augmentation Approach
   - Bayesian Interpretation
   - The SVD and Ridge Regression
3. Cross Validation
   - K-Fold Cross Validation
   - Generalized CV
4. The LASSO
5. Model Selection, Oracles, and the Dantzig Selector
6. References

Part I: The Bias-Variance Tradeoff

Estimating $\beta$

As usual, we assume the model
    $y = f(z) + \varepsilon$, with $\varepsilon \sim (0, \sigma^2)$.
In regression analysis, our major goal is to come up with some good regression function $\hat{f}(z) = z^\top \hat{\beta}$. So far, we've been dealing with $\hat{\beta}^{ls}$, the least squares solution, which has well-known properties (e.g., Gauss-Markov, ML). But can we do better?

Choosing a good regression function

Suppose we have an estimator $\hat{f}(z) = z^\top \hat{\beta}$. To see whether $\hat{f}(z) = z^\top \hat{\beta}$ is a good candidate, we can ask ourselves two questions:
1. Is $\hat{\beta}$ close to the true $\beta$?
2. Will $\hat{f}(z)$ fit future observations well?
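As a small illustrative sketch (not from the slides), the least squares solution $\hat{\beta}^{ls} = (Z^\top Z)^{-1} Z^\top y$ can be computed directly with NumPy on simulated data; the design matrix, true coefficients, and noise level below are made up for the example.

```python
import numpy as np

# Simulate data from the model y = z' beta + eps, with eps mean-zero noise.
rng = np.random.default_rng(0)
n, p = 100, 3
beta = np.array([2.0, -1.0, 0.5])   # hypothetical "true" coefficients
sigma = 0.3
Z = rng.normal(size=(n, p))
y = Z @ beta + rng.normal(scale=sigma, size=n)

# Least squares estimate: beta_ls = (Z'Z)^{-1} Z'y,
# computed by solving the normal equations rather than inverting Z'Z.
beta_ls = np.linalg.solve(Z.T @ Z, Z.T @ y)
print(beta_ls)  # should be close to the true beta
```

With $n$ much larger than $p$ and modest noise, the estimate lands close to the true coefficients, which is the baseline the ridge and LASSO estimators later in the deck are compared against.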
1. Is $\hat{\beta}$ close to the true $\beta$?

To answer this question, we might consider the mean squared error of our estimate $\hat{\beta}$, i.e., the expected squared distance of $\hat{\beta}$ from the true $\beta$:
    $\mathrm{MSE}(\hat{\beta}) = E[\|\hat{\beta} - \beta\|^2] = E[(\hat{\beta} - \beta)^\top (\hat{\beta} - \beta)]$.
Example: for least squares (LS), we know that
    $E[(\hat{\beta}^{ls} - \beta)^\top (\hat{\beta}^{ls} - \beta)] = \sigma^2 \operatorname{tr}[(Z^\top Z)^{-1}]$.

2. Will $\hat{f}(z)$ fit future observations well?

Just because $\hat{f}(z)$ fits our data well doesn't mean that it will be a good fit to new data. In fact, suppose that we take new measurements $y_i'$ at the same $z_i$'s:
    $(z_1, y_1'), (z_2, y_2'), \ldots, (z_n, y_n')$.
So if $\hat{f}(\cdot)$ is a good model, then $\hat{f}(z_i)$ should also be close to the new target $y_i'$. This is the notion of prediction error (PE).

Prediction error and the bias-variance tradeoff

Good estimators should, on average, have small prediction error. Let's consider the PE at a particular target point $z$ (see the board for a derivation):
    $\mathrm{PE}(z) = E_{Y \mid Z = z}\{(Y - \hat{f}(Z))^2 \mid Z = z\}$ ...
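The LS identity $\mathrm{MSE}(\hat{\beta}^{ls}) = \sigma^2 \operatorname{tr}[(Z^\top Z)^{-1}]$ can be checked numerically. The sketch below (my own illustration, with an arbitrary fixed design and Gaussian noise) compares the closed-form value against a Monte Carlo average of the squared estimation error over repeated draws of $y$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 4
sigma = 1.0
beta = np.ones(p)            # hypothetical true coefficients
Z = rng.normal(size=(n, p))  # design matrix held fixed across replications

# Closed-form MSE of the LS estimator: sigma^2 * tr[(Z'Z)^{-1}]
theory = sigma**2 * np.trace(np.linalg.inv(Z.T @ Z))

# Monte Carlo estimate of E[(beta_hat - beta)'(beta_hat - beta)]:
# redraw the noise, refit by least squares, and average the squared error.
reps = 2000
total_sq_err = 0.0
for _ in range(reps):
    y = Z @ beta + rng.normal(scale=sigma, size=n)
    beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)
    total_sq_err += np.sum((beta_hat - beta) ** 2)
mc = total_sq_err / reps

print(theory, mc)  # the two values should agree closely
```

The agreement holds because, under the fixed design, $\hat{\beta}^{ls} - \beta = (Z^\top Z)^{-1} Z^\top \varepsilon$ has covariance $\sigma^2 (Z^\top Z)^{-1}$, and the expected squared norm of such a vector is the trace of its covariance.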
Course page details: Spring '10 — TIBSHIRANI, R — Statistics, Variance