# 18 High-Dimensional Problems: p ≫ N


## 18.1 When p is Much Bigger than N

In this chapter we discuss prediction problems in which the number of features p is much larger than the number of observations N, often written p ≫ N. Such problems have become of increasing importance, especially in genomics and other areas of computational biology. We will see that high variance and overfitting are a major concern in this setting. As a result, simple, highly regularized approaches often become the methods of choice. The first part of the chapter focuses on prediction in both the classification and regression settings, while the second part discusses the more basic problem of feature selection and assessment.

To get us started, Figure 18.1 summarizes a small simulation study that demonstrates the "less fitting is better" principle that applies when p ≫ N. For each of N = 100 samples, we generated p standard Gaussian features X with pairwise correlation 0.2. The outcome Y was generated according to a linear model

$$Y = \sum_{j=1}^{p} X_j \beta_j + \sigma \varepsilon, \qquad (18.1)$$

where ε was generated from a standard Gaussian distribution. For each dataset, the coefficients β_j were also generated from a standard Gaussian distribution. We investigated three cases: p = 20, 100, and 1000. The standard deviation σ was chosen in each case so that the signal-to-noise ratio Var[E(Y | X)]/σ² equaled 2. As a result, the number of significant univariate regression coefficients was 9, 33, and 331, respectively, averaged over the 100 simulation runs. The p = 1000 case is designed to mimic the kind of data that we might see in a high-dimensional genomic or proteomic dataset, for example.

[FIGURE 18.1. Test-error results for simulation experiments. Shown are boxplots of the relative test errors over 100 simulations, for three different values of p, the number of features. The relative error is the test error divided by the Bayes error, σ². From left to right, results are shown for ridge regression with three different values of the regularization parameter λ: 0.001, 100, and 1000. The (average) effective degrees of freedom in the fit is indicated below each plot.]

We fit a ridge regression to the data, with three different values for the regularization parameter λ: 0.001, 100, and 1000. When λ = 0.001, the fit is nearly the same as least squares regression, with a little regularization just to ensure that the problem is non-singular when p > N. Figure 18.1 shows boxplots of the relative test error achieved by the different estimators in each scenario. The corresponding average degrees of freedom used in each ridge-regression fit is indicated below each plot (computed using formula (3.50) on page 68); the degrees of freedom is a more interpretable parameter than λ. We see that ridge regression with λ = 0.001 (20 df) wins when p = 20; λ = 100 (35 df) wins when p = 100; and λ = 1000 (43 df) wins when p = 1000.

© Springer Science+Business Media, LLC 2009. T. Hastie et al., *The Elements of Statistical Learning*, Second Edition, DOI: 10.1007/b94608_18.
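The simulation above can be sketched in a few lines of NumPy. This is a minimal reconstruction under stated assumptions, not the authors' code: equicorrelated Gaussian features, coefficients drawn from a standard Gaussian, σ set so that Var[E(Y | X)]/σ² = 2, and ridge fits computed via the SVD so that the effective degrees of freedom df(λ) = Σ_j d_j²/(d_j² + λ) of formula (3.50) fall out as a by-product. The random seed and the test-set size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, rho, snr = 100, 20, 0.2, 2.0   # p = 20 case; p = 100 or 1000 analogous

# Equicorrelated Gaussian features: unit variances, pairwise correlation rho
Sigma = rho * np.ones((p, p)) + (1 - rho) * np.eye(p)

def simulate(n):
    """Draw n feature vectors from N(0, Sigma)."""
    return rng.multivariate_normal(np.zeros(p), Sigma, size=n)

beta = rng.standard_normal(p)
signal_var = beta @ Sigma @ beta        # Var[E(Y | X)] under model (18.1)
sigma = np.sqrt(signal_var / snr)       # sets the signal-to-noise ratio to 2

X_train = simulate(N)
y_train = X_train @ beta + sigma * rng.standard_normal(N)
X_test = simulate(10_000)
y_test = X_test @ beta + sigma * rng.standard_normal(10_000)

def ridge(X, y, lam):
    """Ridge coefficients via the SVD, plus effective df = sum d_j^2/(d_j^2 + lam)."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    coef = Vt.T @ ((d / (d**2 + lam)) * (U.T @ y))
    df = np.sum(d**2 / (d**2 + lam))    # formula (3.50)
    return coef, df

for lam in (0.001, 100.0, 1000.0):
    coef, df = ridge(X_train, y_train, lam)
    rel_err = np.mean((y_test - X_test @ coef) ** 2) / sigma**2  # relative to Bayes error
    print(f"lambda = {lam:7.3f}   df = {df:5.1f}   relative test error = {rel_err:.2f}")
```

A single run only illustrates the mechanics; reproducing the boxplots in Figure 18.1 would mean repeating this loop over 100 simulated datasets and over p = 20, 100, and 1000.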

## This note was uploaded on 07/14/2010 for the course STAT 132 taught by Professor Haulk during the Spring '10 term at The University of British Columbia.

