CS229 Problem Set #3 Solutions

CS 229, Public Course
Problem Set #3 Solutions: Learning Theory and Unsupervised Learning

1. Uniform convergence and Model Selection

In this problem, we will prove a bound on the error of a simple model selection procedure. Consider a binary classification problem with labels $y \in \{0, 1\}$, and let $\mathcal{H}_1 \subseteq \mathcal{H}_2 \subseteq \dots \subseteq \mathcal{H}_k$ be $k$ different finite hypothesis classes ($|\mathcal{H}_i| < \infty$). Given a dataset $S$ of $m$ i.i.d. training examples, we will divide it into a training set $S_{\text{train}}$ consisting of the first $(1 - \beta)m$ examples, and a hold-out cross-validation set $S_{\text{cv}}$ consisting of the remaining $\beta m$ examples. Here, $\beta \in (0, 1)$.

Let $\hat h_i = \arg\min_{h \in \mathcal{H}_i} \hat\varepsilon_{S_{\text{train}}}(h)$ be the hypothesis in $\mathcal{H}_i$ with the lowest training error (on $S_{\text{train}}$). Thus, $\hat h_i$ would be the hypothesis returned by training (with empirical risk minimization) using hypothesis class $\mathcal{H}_i$ and dataset $S_{\text{train}}$. Also let $h_i^\star = \arg\min_{h \in \mathcal{H}_i} \varepsilon(h)$ be the hypothesis in $\mathcal{H}_i$ with the lowest generalization error.

Suppose that our algorithm first finds all the $\hat h_i$'s using empirical risk minimization, and then uses the hold-out cross-validation set to select from $\{\hat h_1, \dots, \hat h_k\}$ the hypothesis with minimum cross-validation error. That is, the algorithm will output
$$\hat h = \arg\min_{h \in \{\hat h_1, \dots, \hat h_k\}} \hat\varepsilon_{S_{\text{cv}}}(h).$$

For this question you will prove the following bound. Let any $\delta > 0$ be fixed. Then with probability at least $1 - \delta$, we have that
$$\varepsilon(\hat h) \le \min_{i=1,\dots,k} \left( \varepsilon(h_i^\star) + \sqrt{\frac{2}{(1-\beta)m} \log\frac{4|\mathcal{H}_i|}{\delta}} \right) + \sqrt{\frac{2}{\beta m} \log\frac{4k}{\delta}}.$$

(a) Prove that with probability at least $1 - \frac{\delta}{2}$, for all $\hat h_i$,
$$\left| \varepsilon(\hat h_i) - \hat\varepsilon_{S_{\text{cv}}}(\hat h_i) \right| \le \sqrt{\frac{1}{2\beta m} \log\frac{4k}{\delta}}.$$
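The selection procedure above can be sketched in code. This is a minimal illustration, not part of the original solution: the hypothesis classes are hypothetical lists of candidate classifiers $h(x) \to \{0, 1\}$, and all names (`holdout_select`, `empirical_error`, the toy threshold classifiers) are invented for this sketch.

```python
import numpy as np

def empirical_error(h, xs, ys):
    """Fraction of examples that classifier h misclassifies."""
    return np.mean([h(x) != y for x, y in zip(xs, ys)])

def holdout_select(xs, ys, hypothesis_classes, beta=0.3):
    """ERM within each H_i on S_train, then pick the h_hat_i with the
    lowest error on the hold-out set S_cv (the procedure described above)."""
    m = len(ys)
    n_train = int(round((1 - beta) * m))
    x_tr, y_tr = xs[:n_train], ys[:n_train]   # S_train: first (1-beta)m examples
    x_cv, y_cv = xs[n_train:], ys[n_train:]   # S_cv: remaining beta*m examples
    # h_hat_i = argmin over H_i of the training error
    h_hats = [min(H, key=lambda h: empirical_error(h, x_tr, y_tr))
              for H in hypothesis_classes]
    # h_hat = argmin over {h_hat_1, ..., h_hat_k} of the cross-validation error
    return min(h_hats, key=lambda h: empirical_error(h, x_cv, y_cv))

# Toy data: threshold classification on [0, 1] with true threshold 0.5.
rng = np.random.default_rng(0)
xs = rng.uniform(0, 1, size=200)
ys = (xs > 0.5).astype(int)
H1 = [lambda x, t=t: int(x > t) for t in (0.2, 0.8)]
H2 = H1 + [lambda x: int(x > 0.5)]  # H1 is a subset of H2
h_hat = holdout_select(xs, ys, [H1, H2])
```

Here the correct threshold lives only in $\mathcal{H}_2$, so its ERM hypothesis achieves zero hold-out error and is selected.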
Answer: For each $\hat h_i$, the empirical error on the cross-validation set, $\hat\varepsilon_{S_{\text{cv}}}(\hat h_i)$, is the average of $\beta m$ random variables with mean $\varepsilon(\hat h_i)$, so by the Hoeffding inequality, for any $\hat h_i$,
$$P\left( \left| \varepsilon(\hat h_i) - \hat\varepsilon_{S_{\text{cv}}}(\hat h_i) \right| \ge \gamma \right) \le 2\exp(-2\gamma^2 \beta m).$$
As in the class notes, to ensure that this holds for all $\hat h_i$ simultaneously, we take a union bound over all $k$ of the $\hat h_i$'s:
$$P\left( \exists i \text{ s.t. } \left| \varepsilon(\hat h_i) - \hat\varepsilon_{S_{\text{cv}}}(\hat h_i) \right| \ge \gamma \right) \le 2k\exp(-2\gamma^2 \beta m).$$
Setting this bound equal to $\delta/2$ and solving for $\gamma$ yields
$$\gamma = \sqrt{\frac{1}{2\beta m} \log\frac{4k}{\delta}},$$
proving the desired bound.

(b) Use part (a) to show that with probability $1 - \frac{\delta}{2}$,
$$\varepsilon(\hat h) \le \min_{i=1,\dots,k} \varepsilon(\hat h_i) + \sqrt{\frac{2}{\beta m} \log\frac{4k}{\delta}}.$$

Answer: Let $j = \arg\min_i \varepsilon(\hat h_i)$. Using part (a), with probability at least $1 - \frac{\delta}{2}$,
$$\begin{aligned}
\varepsilon(\hat h) &\le \hat\varepsilon_{S_{\text{cv}}}(\hat h) + \sqrt{\frac{1}{2\beta m}\log\frac{4k}{\delta}} \\
&= \min_i \hat\varepsilon_{S_{\text{cv}}}(\hat h_i) + \sqrt{\frac{1}{2\beta m}\log\frac{4k}{\delta}} \\
&\le \hat\varepsilon_{S_{\text{cv}}}(\hat h_j) + \sqrt{\frac{1}{2\beta m}\log\frac{4k}{\delta}} \\
&\le \varepsilon(\hat h_j) + 2\sqrt{\frac{1}{2\beta m}\log\frac{4k}{\delta}} \\
&= \min_{i=1,\dots,k} \varepsilon(\hat h_i) + \sqrt{\frac{2}{\beta m}\log\frac{4k}{\delta}}.
\end{aligned}$$
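A quick numeric sanity check of the $\gamma$ from part (a), and of the identity $2\gamma = \sqrt{\tfrac{2}{\beta m}\log\tfrac{4k}{\delta}}$ used in the last step of part (b). The parameter values ($\beta = 0.3$, $m = 10{,}000$, $k = 5$, $\delta = 0.05$) are arbitrary illustrations, not from the problem set.

```python
import math

def cv_gamma(beta, m, k, delta):
    """Part (a): with probability >= 1 - delta/2, every
    |eps(h_hat_i) - eps_hat_Scv(h_hat_i)| is at most this gamma
    (Hoeffding bound plus a union bound over the k hypotheses)."""
    return math.sqrt(math.log(4 * k / delta) / (2 * beta * m))

# Arbitrary illustrative values.
beta, m, k, delta = 0.3, 10_000, 5, 0.05
gamma = cv_gamma(beta, m, k, delta)

# Part (b)'s penalty is 2*gamma, which simplifies to
# sqrt(2/(beta*m) * log(4k/delta)), matching the stated bound.
penalty = math.sqrt(2 / (beta * m) * math.log(4 * k / delta))
assert abs(2 * gamma - penalty) < 1e-12
```

With these values, a hold-out set of $\beta m = 3000$ examples already pins each hypothesis's generalization error to within about $\pm 0.032$ of its cross-validation error.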
