**Unformatted text preview: **CS229 Problem Set #3 Solutions 1 CS 229, Public Course Problem Set #3 Solutions: Learning Theory and Unsupervised Learning 1. Uniform convergence and Model Selection In this problem, we will prove a bound on the error of a simple model selection procedure. Let there be a binary classification problem with labels y { , 1 } , and let H 1 H 2 . . . H k be k different finite hypothesis classes ( |H i | < ). Given a dataset S of m iid training examples, we will divide it into a training set S train consisting of the first (1 ) m examples, and a hold-out cross validation set S cv consisting of the remaining m examples. Here, (0 , 1). Let h i = arg min h H i S train ( h ) be the hypothesis in H i with the lowest training error (on S train ). Thus, h i would be the hypothesis returned by training (with empirical risk minimization) using hypothesis class H i and dataset S train . Also let h i = arg min h H i ( h ) be the hypothesis in H i with the lowest generalization error. Suppose that our algorithm first finds all the h i s using empirical risk minimization then uses the hold-out cross validation set to select a hypothesis from this the { h 1 , . . . , h k } with minimum training error. That is, the algorithm will output h = arg min h { h 1 ,..., h k } S cv ( h ) . For this question you will prove the following bound. Let any > 0 be fixed. Then with probability at least 1 , we have that ( h ) min i =1 ,...,k parenleftBigg ( h i ) + radicalBigg 2 (1 ) m log 4 |H i | parenrightBigg + radicalBigg 2 2 m log 4 k (a) Prove that with probability at least 1 2 , for all h i , | ( h i ) S cv ( h i ) | radicalBigg 1 2 m log 4 k . Answer: For each h i , the empirical error on the cross-validation set, ( h i ) represents the average of m random variables with mean ( h i ) , so by the Hoeffding inequality for any h i , P ( | ( h i ) S cv ( h i ) | ) 2exp( 2 2 m ) . 
As in the class notes, to ensure that this holds for all $\hat{h}_i$, we take the union bound over all $k$ of the $\hat{h}_i$'s:

$$P\left( \exists\, i \text{ s.t. } \left| \varepsilon(\hat{h}_i) - \hat{\varepsilon}_{S_{\text{cv}}}(\hat{h}_i) \right| \ge \gamma \right) \le 2k \exp\!\left( -2\gamma^2 \beta m \right).$$

Setting this term equal to $\delta/2$ and solving for $\gamma$ yields

$$\gamma = \sqrt{\frac{1}{2\beta m} \log \frac{4k}{\delta}},$$

proving the desired bound.

(b) Use part (a) to show that with probability $1 - \frac{\delta}{2}$,

$$\varepsilon(\hat{h}) \le \min_{i=1,\ldots,k} \varepsilon(\hat{h}_i) + \sqrt{\frac{2}{\beta m} \log \frac{4k}{\delta}}.$$

**Answer:** Let $j = \arg\min_i \varepsilon(\hat{h}_i)$. Using part (a) twice (once for $\hat{h}$ and once for $\hat{h}_j$), with probability at least $1 - \frac{\delta}{2}$,

$$
\begin{aligned}
\varepsilon(\hat{h}) &\le \hat{\varepsilon}_{S_{\text{cv}}}(\hat{h}) + \sqrt{\frac{1}{2\beta m} \log \frac{4k}{\delta}} \\
&= \min_i \hat{\varepsilon}_{S_{\text{cv}}}(\hat{h}_i) + \sqrt{\frac{1}{2\beta m} \log \frac{4k}{\delta}} \\
&\le \hat{\varepsilon}_{S_{\text{cv}}}(\hat{h}_j) + \sqrt{\frac{1}{2\beta m} \log \frac{4k}{\delta}} \\
&\le \varepsilon(\hat{h}_j) + 2\sqrt{\frac{1}{2\beta m} \log \frac{4k}{\delta}} \\
&= \min_{i=1,\ldots,k} \varepsilon(\hat{h}_i) + \sqrt{\frac{2}{\beta m} \log \frac{4k}{\delta}},
\end{aligned}
$$

which is the bound we set out to prove.
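The deviation bound $\gamma = \sqrt{\frac{1}{2\beta m} \log \frac{4k}{\delta}}$ from part (a) is easy to evaluate numerically; the point of doing so is to see that the hold-out penalty grows only logarithmically in the number of classes $k$. The sample values of $m$, $\beta$, and $\delta$ below are arbitrary choices for illustration.

```python
import math

def cv_penalty(k, beta, m, delta):
    """Part (a) deviation bound: sqrt(log(4k/delta) / (2*beta*m))."""
    return math.sqrt(math.log(4 * k / delta) / (2 * beta * m))

# With m = 1000 examples, beta = 0.3 held out, and delta = 0.05,
# multiplying k by 1000 barely moves the penalty.
m, beta, delta = 1000, 0.3, 0.05
for k in (1, 10, 100, 1000):
    print(f"k = {k:4d}  penalty = {cv_penalty(k, beta, m, delta):.4f}")
```

This logarithmic dependence on $k$ is what makes hold-out model selection cheap: comparing many candidate classes costs almost nothing in the bound, while the per-class term still pays for $|\mathcal{H}_i|$.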
