CS 229, Public Course — Problem Set #3: Learning Theory and Unsupervised Learning

1. Uniform convergence and model selection

In this problem, we will prove a bound on the error of a simple model selection procedure.

Let there be a binary classification problem with labels $y \in \{0, 1\}$, and let $\mathcal{H}_1 \subseteq \mathcal{H}_2 \subseteq \cdots \subseteq \mathcal{H}_k$ be $k$ different finite hypothesis classes ($|\mathcal{H}_i| < \infty$). Given a dataset $S$ of $m$ i.i.d. training examples, we will divide it into a training set $S_{\mathrm{train}}$ consisting of the first $(1-\beta)m$ examples, and a hold-out cross-validation set $S_{\mathrm{cv}}$ consisting of the remaining $\beta m$ examples. Here, $\beta \in (0, 1)$.

Let $\hat{h}_i = \arg\min_{h \in \mathcal{H}_i} \hat{\varepsilon}_{S_{\mathrm{train}}}(h)$ be the hypothesis in $\mathcal{H}_i$ with the lowest training error (on $S_{\mathrm{train}}$). Thus, $\hat{h}_i$ would be the hypothesis returned by training (with empirical risk minimization) using hypothesis class $\mathcal{H}_i$ and dataset $S_{\mathrm{train}}$. Also let $h^\star_i = \arg\min_{h \in \mathcal{H}_i} \varepsilon(h)$ be the hypothesis in $\mathcal{H}_i$ with the lowest generalization error.

Suppose that our algorithm first finds all the $\hat{h}_i$'s using empirical risk minimization, and then uses the hold-out cross-validation set to select from $\{\hat{h}_1, \ldots, \hat{h}_k\}$ the hypothesis with minimum cross-validation error. That is, the algorithm will output
$$\hat{h} = \arg\min_{h \in \{\hat{h}_1, \ldots, \hat{h}_k\}} \hat{\varepsilon}_{S_{\mathrm{cv}}}(h).$$

For this question you will prove the following bound. Let any $\delta > 0$ be fixed. Then with probability at least $1 - \delta$, we have that
$$\varepsilon(\hat{h}) \le \min_{i=1,\ldots,k}\left(\varepsilon(h^\star_i) + \sqrt{\frac{2}{(1-\beta)m}\log\frac{4|\mathcal{H}_i|}{\delta}}\right) + \sqrt{\frac{2}{\beta m}\log\frac{4k}{\delta}}.$$

(a) Prove that with probability at least $1 - \frac{\delta}{2}$, for all $\hat{h}_i$,
$$\left|\varepsilon(\hat{h}_i) - \hat{\varepsilon}_{S_{\mathrm{cv}}}(\hat{h}_i)\right| \le \sqrt{\frac{1}{2\beta m}\log\frac{4k}{\delta}}.$$

(b) Use part (a) to show that with probability $1 - \frac{\delta}{2}$,
$$\varepsilon(\hat{h}) \le \min_{i=1,\ldots,k} \varepsilon(\hat{h}_i) + \sqrt{\frac{2}{\beta m}\log\frac{4k}{\delta}}.$$

(c) Let $j = \arg\min_i \varepsilon(\hat{h}_i)$.
We know from class that for $\mathcal{H}_j$, with probability $1 - \frac{\delta}{2}$,
$$\left|\varepsilon(h_j) - \hat{\varepsilon}_{S_{\mathrm{train}}}(h_j)\right| \le \sqrt{\frac{2}{(1-\beta)m}\log\frac{4|\mathcal{H}_j|}{\delta}}, \quad \forall\, h_j \in \mathcal{H}_j. \ldots
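As a non-authoritative hint for part (a): the standard route is Hoeffding's inequality plus a union bound. The key observation is that each $\hat{h}_i$ was chosen using $S_{\mathrm{train}}$ only, so it is fixed with respect to the $\beta m$ hold-out examples and Hoeffding applies directly:

```latex
% Hoeffding's inequality for one hypothesis \hat{h}_i evaluated on the
% \beta m independent hold-out examples:
P\left(\left|\varepsilon(\hat{h}_i) - \hat{\varepsilon}_{S_{\mathrm{cv}}}(\hat{h}_i)\right| > \gamma\right)
  \le 2\exp\!\left(-2\gamma^2 \beta m\right)
% Union bound over the k candidates \hat{h}_1, \ldots, \hat{h}_k, then set
% the total failure probability to \delta/2 and solve for \gamma:
2k\exp\!\left(-2\gamma^2 \beta m\right) = \frac{\delta}{2}
  \quad\Longrightarrow\quad
\gamma = \sqrt{\frac{1}{2\beta m}\log\frac{4k}{\delta}}
```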
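For part (b), one possible chaining sketch (a hint under the event of part (a), writing $\gamma = \sqrt{\frac{1}{2\beta m}\log\frac{4k}{\delta}}$ for the deviation bound derived there):

```latex
% On the event of part (a), for every i = 1, \ldots, k:
\varepsilon(\hat{h})
  \le \hat{\varepsilon}_{S_{\mathrm{cv}}}(\hat{h}) + \gamma     % part (a) applied to \hat{h}
  \le \hat{\varepsilon}_{S_{\mathrm{cv}}}(\hat{h}_i) + \gamma   % \hat{h} minimizes CV error
  \le \varepsilon(\hat{h}_i) + 2\gamma                          % part (a) applied to \hat{h}_i
% Minimizing over i, and using 2\sqrt{1/(2\beta m)} = \sqrt{2/(\beta m)}:
\varepsilon(\hat{h})
  \le \min_{i=1,\ldots,k} \varepsilon(\hat{h}_i)
    + \sqrt{\frac{2}{\beta m}\log\frac{4k}{\delta}}
```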
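To get a feel for how the two terms of the final bound trade off against $\beta$ and the class sizes, the following script evaluates them numerically. All concrete values ($m$, $k$, $|\mathcal{H}_i|$, $\delta$, $\beta$) are assumptions chosen purely for illustration; they do not come from the problem.

```python
import math

def cv_term(m, beta, k, delta):
    # Hold-out deviation term from part (b): sqrt(2/(beta*m) * log(4k/delta)).
    return math.sqrt(2.0 / (beta * m) * math.log(4.0 * k / delta))

def train_term(m, beta, H_size, delta):
    # Uniform-convergence term for ERM over a class of H_size hypotheses
    # on the (1 - beta)*m training examples.
    return math.sqrt(2.0 / ((1.0 - beta) * m) * math.log(4.0 * H_size / delta))

# Assumed illustrative setting: m = 10000 examples, k = 5 nested classes
# with |H_i| = 10^i, confidence delta = 0.05, and a 70/30 train/hold-out split.
m, k, delta, beta = 10000, 5, 0.05, 0.3
for i in range(1, k + 1):
    gap = train_term(m, beta, 10 ** i, delta) + cv_term(m, beta, k, delta)
    print(f"H_{i}: eps(h_hat) - eps(h_star_{i}) <= {gap:.4f}")
```

Note the tension the bound captures: increasing $\beta$ shrinks the cross-validation term but grows every training term, since fewer examples remain for empirical risk minimization.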
