# CS229 Problem Set #3: Learning Theory and Unsupervised Learning

**1. Uniform convergence and model selection.** In this problem, we will prove a bound on the error of a simple model selection procedure. Consider a binary classification problem with labels $y \in \{0, 1\}$, and let $\mathcal{H}_1 \subseteq \mathcal{H}_2 \subseteq \cdots \subseteq \mathcal{H}_k$ be $k$ different finite hypothesis classes ($|\mathcal{H}_i| < \infty$). Given a dataset $S$ of $m$ iid training examples, we will divide it into a training set $S_{\text{train}}$ consisting of the first $(1-\beta)m$ examples and a hold-out cross-validation set $S_{\text{cv}}$ consisting of the remaining $\beta m$ examples, where $\beta \in (0, 1)$.

Let $\hat{h}_i = \arg\min_{h \in \mathcal{H}_i} \hat{\varepsilon}_{S_{\text{train}}}(h)$ be the hypothesis in $\mathcal{H}_i$ with the lowest training error (on $S_{\text{train}}$). Thus, $\hat{h}_i$ would be the hypothesis returned by training (with empirical risk minimization) using hypothesis class $\mathcal{H}_i$ and dataset $S_{\text{train}}$. Also let $h^\star_i = \arg\min_{h \in \mathcal{H}_i} \varepsilon(h)$ be the hypothesis in $\mathcal{H}_i$ with the lowest generalization error.

Suppose that our algorithm first finds all the $\hat{h}_i$'s using empirical risk minimization, and then uses the hold-out cross-validation set to select from $\{\hat{h}_1, \dots, \hat{h}_k\}$ the hypothesis with minimum cross-validation error. That is, the algorithm will output

$$\hat{h} = \arg\min_{h \in \{\hat{h}_1, \dots, \hat{h}_k\}} \hat{\varepsilon}_{S_{\text{cv}}}(h).$$

For this question you will prove the following bound. Let any $\delta > 0$ be fixed. Then with probability at least $1 - \delta$, we have that

$$\varepsilon(\hat{h}) \le \min_{i=1,\dots,k} \left( \varepsilon(h^\star_i) + \sqrt{\frac{2}{(1-\beta)m} \log \frac{4|\mathcal{H}_i|}{\delta}} \right) + \sqrt{\frac{2}{\beta m} \log \frac{4k}{\delta}}.$$

(a) Prove that with probability at least $1 - \frac{\delta}{2}$, for all $\hat{h}_i$,

$$\left| \varepsilon(\hat{h}_i) - \hat{\varepsilon}_{S_{\text{cv}}}(\hat{h}_i) \right| \le \sqrt{\frac{1}{2\beta m} \log \frac{4k}{\delta}}.$$

(b) Use part (a) to show that with probability $1 - \frac{\delta}{2}$,

$$\varepsilon(\hat{h}) \le \min_{i=1,\dots,k} \varepsilon(\hat{h}_i) + \sqrt{\frac{2}{\beta m} \log \frac{4k}{\delta}}.$$

(c) Let $j = \arg\min_i \varepsilon(\hat{h}_i)$. We know from class that for $\mathcal{H}_j$, with probability $1 - \frac{\delta}{2}$,

$$\left| \varepsilon(\hat{h}_j) - \hat{\varepsilon}_{S_{\text{train}}}(h^\star_j) \right| \le \sqrt{\frac{2}{(1-\beta)m} \log \frac{4|\mathcal{H}_j|}{\delta}}, \quad \forall h_j \in \mathcal{H}_j. \; \dots$$
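The two-stage procedure in the problem (ERM within each nested class on $S_{\text{train}}$, then selection on $S_{\text{cv}}$) can be sketched numerically. The following is a minimal illustration, not part of the problem set: it uses 1-D threshold classifiers, with $\mathcal{H}_i$ the finite set of thresholds at multiples of $2^{-i}$ (so the classes are nested), on synthetic noisy data. The data distribution, noise rate, and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data (assumed): x ~ Uniform(0,1), y = 1{x > 0.3} with 10% label noise.
m = 200
beta = 0.25
x = rng.uniform(0, 1, m)
y = (x > 0.3).astype(int) ^ (rng.uniform(0, 1, m) < 0.1).astype(int)

# Split: first (1 - beta) * m examples for S_train, remaining beta * m for S_cv.
n_train = int((1 - beta) * m)
x_tr, y_tr = x[:n_train], y[:n_train]
x_cv, y_cv = x[n_train:], y[n_train:]

def err(theta, xs, ys):
    """Empirical 0-1 error of the threshold classifier h(x) = 1{x > theta}."""
    return float(np.mean((xs > theta).astype(int) != ys))

# Nested finite classes: H_i = thresholds at multiples of 2^{-i}, so H_1 ⊆ H_2 ⊆ ...
k = 5
classes = [np.linspace(0, 1, 2**i + 1) for i in range(1, k + 1)]

# Stage 1: ERM within each class on S_train gives hat h_i.
h_hat = [min(H, key=lambda t: err(t, x_tr, y_tr)) for H in classes]

# Stage 2: output the hat h_i with minimum cross-validation error on S_cv.
h_final = min(h_hat, key=lambda t: err(t, x_cv, y_cv))
print(f"selected threshold: {h_final:.3f}, cv error: {err(h_final, x_cv, y_cv):.3f}")
```

Note that because the classes are nested, training error can only decrease with $i$, which is exactly why the hold-out set, rather than training error, must arbitrate the final choice.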
*This note was uploaded on 01/24/2010 for the course CS 229 at Stanford.*
