Chp18 - Copy - 18 High-Dimensional Problems: p ≫ N 18.1 When...

18 High-Dimensional Problems: p ≫ N

© Springer Science+Business Media, LLC 2009. T. Hastie et al., The Elements of Statistical Learning, Second Edition. DOI: 10.1007/b94608_18

18.1 When p is Much Bigger than N

In this chapter we discuss prediction problems in which the number of features p is much larger than the number of observations N, often written p ≫ N. Such problems have become increasingly important, especially in genomics and other areas of computational biology. We will see that high variance and overfitting are a major concern in this setting. As a result, simple, highly regularized approaches often become the methods of choice. The first part of the chapter focuses on prediction in both the classification and regression settings, while the second part discusses the more basic problem of feature selection and assessment.

To get us started, Figure 18.1 summarizes a small simulation study that demonstrates the "less fitting is better" principle that applies when p ≫ N. For each of N = 100 samples, we generated p standard Gaussian features X with pairwise correlation 0.2. The outcome Y was generated according to a linear model

$$ Y = \sum_{j=1}^{p} X_j \beta_j + \sigma\varepsilon \tag{18.1} $$

where ε was generated from a standard Gaussian distribution. For each dataset, the coefficients β_j were also generated from a standard Gaussian distribution. We investigated three cases: p = 20, 100, and 1000. The standard deviation σ was chosen in each case so that the signal-to-noise ratio Var[E(Y | X)]/σ² equaled 2. As a result, the number of significant
univariate regression coefficients¹ was 9, 33 and 331, respectively, averaged over the 100 simulation runs. The p = 1000 case is designed to mimic the kind of data we might see in, for example, a high-dimensional genomic or proteomic dataset.

We fit a ridge regression to the data with three values of the regularization parameter λ: 0.001, 100, and 1000. When λ = 0.001, this is nearly the same as least squares regression, with just enough regularization to ensure that the problem is non-singular when p > N. Figure 18.1 shows boxplots of the relative test error achieved by the different estimators in each scenario. The corresponding average degrees of freedom used in each ridge-regression fit is indicated (computed using formula (3.50) on page 68²). The degrees of freedom is a more interpretable parameter than λ. We see that ridge regression with λ = 0.001 (20 df) wins when p = 20, λ = 100 (35 df) wins when p = 100, and λ = 1000 (43 df) wins when p = 1000.

[Figure 18.1: three panels (20, 100 and 1000 features), each showing boxplots of relative test error (vertical axis, roughly 1.0 to 3.0) against effective degrees of freedom. Average df for λ = 0.001, 100, 1000: 20 features: 20, 9, 2; 100 features: 99, 35, 7; 1000 features: 99, 87, 43.]

FIGURE 18.1. Test-error results for simulation experiments. Shown are boxplots of the relative test errors over 100 simulations, for three different values of p, the number of features. The relative error is the test error divided by the Bayes error, σ². From left to right, results are shown for ridge regression with three different values of the regularization parameter λ: 0.001, 100 and 1000. The (average) effective degrees of freedom in the fit is indicated below each plot.
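One run of the simulation above can be sketched in NumPy as follows. This is a minimal illustration under our own assumptions, not the authors' code: features are made equicorrelated through a shared Gaussian factor, σ is set from the population signal variance βᵀΣβ (with Σ = 0.8·I + 0.2·11ᵀ), the model has no intercept, and the test set has 1000 draws. The helper names `draw_xy`, `ridge_fit_and_df` and `one_run` are hypothetical. Both the ridge fit and the effective degrees of freedom come from the SVD X = UDVᵀ, using the standard ridge formula df(λ) = tr[X(XᵀX + λI)⁻¹Xᵀ] = Σ_j d_j²/(d_j² + λ).

```python
import numpy as np


def draw_xy(n, beta, sigma, rho, rng):
    """Draw n observations from model (18.1) with equicorrelated Gaussian features."""
    p = beta.shape[0]
    # Assumption: X_j = sqrt(rho) * Z0 + sqrt(1 - rho) * Z_j, which gives
    # unit variance and pairwise correlation rho between any two features.
    shared = rng.standard_normal((n, 1))
    X = np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * rng.standard_normal((n, p))
    y = X @ beta + sigma * rng.standard_normal(n)
    return X, y


def ridge_fit_and_df(X, y, lam):
    """Ridge coefficients (no intercept) and effective df via the SVD of X."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    # beta_hat = V diag(d / (d^2 + lam)) U^T y; works for p > N as well.
    beta_hat = Vt.T @ ((d / (d**2 + lam)) * (U.T @ y))
    # df(lam) = sum_j d_j^2 / (d_j^2 + lam), the trace of X (X'X + lam I)^{-1} X'.
    df = float(np.sum(d**2 / (d**2 + lam)))
    return beta_hat, df


def one_run(p, n=100, n_test=1000, rho=0.2, snr=2.0,
            lams=(0.001, 100.0, 1000.0), seed=0):
    """Return {lam: (effective df, relative test error)} for one simulated dataset."""
    rng = np.random.default_rng(seed)
    beta = rng.standard_normal(p)
    # Assumption: Var[E(Y|X)] = beta' Sigma beta, Sigma = (1 - rho) I + rho 11'.
    signal_var = (1.0 - rho) * beta @ beta + rho * beta.sum() ** 2
    sigma = np.sqrt(signal_var / snr)
    X, y = draw_xy(n, beta, sigma, rho, rng)
    X_test, y_test = draw_xy(n_test, beta, sigma, rho, rng)
    results = {}
    for lam in lams:
        beta_hat, df = ridge_fit_and_df(X, y, lam)
        test_mse = np.mean((y_test - X_test @ beta_hat) ** 2)
        # Relative error: test error divided by the Bayes error sigma^2.
        results[lam] = (df, test_mse / sigma**2)
    return results


if __name__ == "__main__":
    for p in (20, 100, 1000):
        res = one_run(p)
        print(p, {lam: (round(df, 1), round(rel, 2)) for lam, (df, rel) in res.items()})
```

A single seed gives a noisy picture; the boxplots summarize 100 such runs, so comparing against Figure 18.1 would mean averaging `one_run` over many seeds. Computing both the fit and the df from the thin SVD keeps the cost governed by min(N, p), which matters when p = 1000.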

This note was uploaded on 07/14/2010 for the course STAT 132 taught by Professor Haulk during the Spring '10 term at The University of British Columbia.
