Regularization and Variable Selection via the Elastic Net
Hui Zou and Trevor Hastie
Journal of Royal Statistical Society, B, 2005
Presenter: Minhua Chen, Nov. 07, 2008

Agenda
- Introduction to Regression Models.
- Motivation for Elastic Net.
- Naive Elastic Net and its grouping effect.
- Elastic Net.
- Experiments and Conclusions.

Introduction to Regression Models

Consider the following regression model with $p$ predictors and $n$ samples:

$$y = X\beta + \epsilon$$

where $X_{n \times p} = [x_1, x_2, \cdots, x_p]$, $\beta = [\beta_1, \beta_2, \cdots, \beta_p]^T$ and $y = [y_1, y_2, \cdots, y_n]^T$. $\epsilon$ is the additive noise with dimension $n \times 1$.

Suppose the predictors ($x_i$) are normalized to mean zero and variance one, and the regression output $y$ sums to zero.

Ordinary Least Squares (OLS):

$$\hat{\beta}(\text{OLS}) = \arg\min_{\beta} \|y - X\beta\|^2$$

Ridge Regression:

$$\hat{\beta}(\text{Ridge}) = \arg\min_{\beta} \|y - X\beta\|^2 + \lambda \|\beta\|^2$$

LASSO:

$$\hat{\beta}(\text{LASSO}) = \arg\min_{\beta} \|y - X\beta\|^2 + \lambda |\beta|_1$$

where $|\beta|_1 = \sum_{j=1}^{p} |\beta_j|$.

Elastic Net:

$$\hat{\beta}(\text{Naive ENet}) = \arg\min_{\beta} \|y - X\beta\|^2 + \lambda_1 |\beta|_1 + \lambda_2 \|\beta\|^2$$

$$\hat{\beta}(\text{ENet}) = (1 + \lambda_2) \cdot \hat{\beta}(\text{Naive ENet}) \qquad (1)$$
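The four estimators above differ only in their penalty terms, so one routine can cover all of them. The following is a minimal sketch, not the authors' algorithm: it minimizes the naive elastic net criterion by cyclic coordinate descent, a technique chosen here for brevity. The function names (`soft_threshold`, `naive_elastic_net`, `elastic_net`) and the toy data are hypothetical, and the example assumes columns scaled to unit Euclidean norm so that the numbers come out cleanly.

```python
def soft_threshold(z, t):
    """Return sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def naive_elastic_net(X, y, lam1, lam2, n_sweeps=100):
    """Minimize ||y - X b||^2 + lam1 * |b|_1 + lam2 * ||b||^2.

    X is a list of n rows with p entries each; y is a list of n responses.
    Assumes no column of X is identically zero.
    lam1 = lam2 = 0 gives OLS, lam1 = 0 gives ridge, lam2 = 0 gives LASSO.
    """
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    col_sq = [sum(c * c for c in col) for col in cols]
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta = 0 at the start
    for _ in range(n_sweeps):
        for j in range(p):
            # Inner product of column j with the residual that excludes beta_j.
            rho = sum(c * r for c, r in zip(cols[j], resid)) + col_sq[j] * beta[j]
            # Exact one-dimensional minimizer of the penalized criterion.
            new_bj = soft_threshold(rho, lam1 / 2.0) / (col_sq[j] + lam2)
            delta = new_bj - beta[j]
            resid = [r - c * delta for r, c in zip(resid, cols[j])]
            beta[j] = new_bj
    return beta


def elastic_net(X, y, lam1, lam2):
    """Rescale the naive solution by (1 + lam2), as in equation (1)."""
    return [(1.0 + lam2) * b for b in naive_elastic_net(X, y, lam1, lam2)]


# Toy data (hypothetical): two orthogonal, zero-mean, unit-norm columns.
X = [[0.5, 0.5], [0.5, -0.5], [-0.5, 0.5], [-0.5, -0.5]]
y = [3.0, 1.0, -1.0, -3.0]

print("OLS       ", naive_elastic_net(X, y, 0.0, 0.0))
print("Ridge     ", naive_elastic_net(X, y, 0.0, 1.0))
print("LASSO     ", naive_elastic_net(X, y, 4.0, 0.0))
print("Naive ENet", naive_elastic_net(X, y, 4.0, 1.0))
print("ENet      ", elastic_net(X, y, 4.0, 1.0))
```

On this orthonormal toy design the naive elastic net shrinks the surviving coefficient twice (once by soft-thresholding, once by the ridge term), and the $(1 + \lambda_2)$ rescaling removes the second shrinkage, so the ENet result coincides with the LASSO one. That coincidence is specific to orthonormal columns and should not be expected in general.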

Motivation for Elastic Net

Prediction accuracy and model interpretation are two important aspects of regression models.

LASSO is a penalized regression method that improves on OLS and Ridge regression. It performs shrinkage and variable selection simultaneously, for better prediction and model interpretation.

Disadvantages of LASSO:
- LASSO selects at most n variables before it saturates, a limitation when p > n.
- LASSO cannot do group selection. If there is a group of variables among which the pairwise correlations are very high, LASSO tends to arbitrarily select only one variable from the group.
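A small worked case may help show where the arbitrary choice within a correlated group comes from. It is an illustration added here, under the extreme assumption of two exactly identical predictors, $x_1 = x_2 = x$. The fit then depends only on the sum $s = \beta_1 + \beta_2$:

$$
\|y - x_1\beta_1 - x_2\beta_2\|^2 + \lambda\,(|\beta_1| + |\beta_2|)
= \|y - x s\|^2 + \lambda\,(|\beta_1| + |\beta_2|),
\qquad |\beta_1| + |\beta_2| \ge |s|,
$$

with equality whenever $\beta_1$ and $\beta_2$ share a sign. If $s^*$ minimizes $\|y - x s\|^2 + \lambda |s|$, then every split $(\alpha s^*, (1-\alpha) s^*)$ with $0 \le \alpha \le 1$ attains the same LASSO objective, including $(s^*, 0)$ and $(0, s^*)$, so the criterion itself gives no reason to prefer one predictor over the other. A quadratic penalty breaks the tie, because for fixed $s$

$$
\beta_1^2 + \beta_2^2 = \frac{s^2}{2} + \frac{(\beta_1 - \beta_2)^2}{2}
$$

is smallest only at $\beta_1 = \beta_2 = s/2$.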
