A Solution Manual and Notes for the Text:
The Elements of Statistical Learning
by Jerome Friedman, Trevor Hastie, and Robert Tibshirani

John L. Weatherwax
wax@alum.mit.edu
December 15, 2009

Chapter 2 (Overview of Supervised Learning)

Notes on the Text

Statistical Decision Theory

Our expected prediction error (EPE) under the squared error loss, assuming a linear model for y, i.e. y = f(x) \approx x^T \beta, is given by

    EPE(\beta) = \int (y - x^T \beta)^2 \Pr(dx, dy) .   (1)

Considering this as a function of the components of \beta, i.e. \beta_i, to minimize this expression with respect to \beta_i we take the derivative with respect to \beta_i, set the resulting expression equal to zero, and solve for \beta_i. Taking the vector derivative with respect to the vector \beta we obtain

    \frac{\partial EPE}{\partial \beta} = \int 2 (y - x^T \beta)(-1) \, x \, \Pr(dx, dy) = -2 \int (y - x^T \beta) \, x \, \Pr(dx, dy) .   (2)

Now this expression contains two parts: the first has the integrand y x and the second has the integrand (x^T \beta) x. This latter expression, written out in components, is

    (x^T \beta) x = (x_1 \beta_1 + x_2 \beta_2 + \cdots + x_p \beta_p)
    \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{pmatrix}
    =
    \begin{pmatrix}
      x_1 x_1 \beta_1 + x_1 x_2 \beta_2 + \cdots + x_1 x_p \beta_p \\
      x_2 x_1 \beta_1 + x_2 x_2 \beta_2 + \cdots + x_2 x_p \beta_p \\
      \vdots \\
      x_p x_1 \beta_1 + x_p x_2 \beta_2 + \cdots + x_p x_p \beta_p
    \end{pmatrix}
    = x x^T \beta .

With this recognition, that we can write (x^T \beta) x as x x^T \beta, setting \frac{\partial EPE}{\partial \beta} = 0 gives

    E[y x] - E[x x^T \beta] = 0 .   (3)

Since \beta is a constant, it can be taken out of the expectation to give

    \beta = E[x x^T]^{-1} E[y x] ,   (4)

which gives a very simple derivation of equation 2.16 in the book (a short numerical check of this identity is sketched after Ex. 2.1 below). Note that since y \in \mathbb{R} and x \in \mathbb{R}^p, the scalar y commutes with the vector x, i.e. x y = y x.

Exercise Solutions

Ex. 2.1 (target coding)

Each of our samples from the K classes is coded as a target vector t_k that has a one in the k-th spot and zeros elsewhere. One way of developing a classifier is then to regress the independent variables onto these target vectors. Our classification procedure becomes the following: given the measurement vector X, predict a target vector y via linear regression and select the class k corresponding to the component of y with the largest value, that is, k = \mathrm{argmax}_i \, y_i. Now consider the expression \mathrm{argmin}_k \, \| y - t_k \|, which finds the index of the target vector closest to the produced regression output y. By expanding the quadratic we find that

    \mathrm{argmin}_k \, \| y - t_k \| = \mathrm{argmin}_k \, \| y - t_k \|^2
    = \mathrm{argmin}_k \sum_{i=1}^{K} ( y_i - (t_k)_i )^2
    = \mathrm{argmin}_k \sum_{i=1}^{K} \left( y_i^2 - 2 y_i (t_k)_i + (t_k)_i^2 \right)
    = \mathrm{argmin}_k \sum_{i=1}^{K} \left( -2 y_i (t_k)_i + (t_k)_i^2 \right) ,

since the sum \sum_{i=1}^{K} y_i^2 is the same for all classes k. Here (t_k)_i denotes the i-th component of the target vector t_k.
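To make the equivalence in Ex. 2.1 concrete, here is a minimal numerical sketch. It is not part of the manual; the use of numpy, the random test vectors, and the names targets, by_max, and by_dist are my own illustration. It checks that, for one-hot targets t_k, the nearest-target rule argmin_k ||y - t_k|| picks the same class as argmax_i y_i.

# Minimal sketch (my own, not from the manual): for one-hot targets t_k,
# argmin_k ||y - t_k|| should select the same class as argmax_i y_i.
import numpy as np

rng = np.random.default_rng(0)
K = 5
targets = np.eye(K)          # t_k is the k-th standard basis vector (one in the k-th spot)

for _ in range(1000):
    y = rng.normal(size=K)   # an arbitrary regression output vector
    by_max = np.argmax(y)                                      # argmax_i y_i
    by_dist = np.argmin(np.linalg.norm(y - targets, axis=1))   # argmin_k ||y - t_k||
    assert by_max == by_dist

print("nearest one-hot target and largest component always agreed")

The agreement is exactly the algebra above: with (t_k)_i equal to one only when i = k, the remaining sum reduces to -2 y_k + 1, which is minimized by taking the largest y_k.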
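Similarly, the closed form in equation (4) can be sanity-checked by replacing the expectations with sample averages over simulated data. This is only an illustrative sketch under assumptions of my own (Gaussian features, the coefficients beta_true, the sample size n); it is not taken from the manual or the book.

# Sketch (my own assumptions): approximate beta = E[x x^T]^{-1} E[y x] with
# sample averages and compare against the coefficients used to simulate the data.
import numpy as np

rng = np.random.default_rng(1)
n, p = 200_000, 3
beta_true = np.array([2.0, -1.0, 0.5])

X = rng.normal(size=(n, p))                         # each row is a draw of the vector x
y = X @ beta_true + rng.normal(scale=0.1, size=n)   # y = x^T beta + small noise

Exx = X.T @ X / n                      # sample estimate of E[x x^T]
Exy = X.T @ y / n                      # sample estimate of E[y x]
beta_hat = np.linalg.solve(Exx, Exy)   # equation (4): beta = E[x x^T]^{-1} E[y x]

print(beta_hat)                        # should be close to beta_true

Dividing X.T @ X and X.T @ y by n cancels inside the solve, so this is the same computation as the familiar finite-sample least squares estimate (X^T X)^{-1} X^T y.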