CS 229, Public Course
Problem Set #3: Learning Theory and Unsupervised Learning

1. Uniform convergence and Model Selection

In this problem, we will prove a bound on the error of a simple model selection procedure.

Let there be a binary classification problem with labels $y \in \{0, 1\}$, and let $\mathcal{H}_1 \subseteq \mathcal{H}_2 \subseteq \cdots \subseteq \mathcal{H}_k$ be $k$ different finite hypothesis classes ($|\mathcal{H}_i| < \infty$). Given a dataset $S$ of $m$ iid training examples, we will divide it into a training set $S_{\mathrm{train}}$ consisting of the first $(1-\beta)m$ examples, and a hold-out cross validation set $S_{\mathrm{cv}}$ consisting of the remaining $\beta m$ examples. Here, $\beta \in (0, 1)$.

Let $\hat{h}_i = \arg\min_{h \in \mathcal{H}_i} \hat{\varepsilon}_{S_{\mathrm{train}}}(h)$ be the hypothesis in $\mathcal{H}_i$ with the lowest training error (on $S_{\mathrm{train}}$). Thus, $\hat{h}_i$ would be the hypothesis returned by training (with empirical risk minimization) using hypothesis class $\mathcal{H}_i$ and dataset $S_{\mathrm{train}}$. Also let $h_i^\star = \arg\min_{h \in \mathcal{H}_i} \varepsilon(h)$ be the hypothesis in $\mathcal{H}_i$ with the lowest generalization error.

Suppose that our algorithm first finds all the $\hat{h}_i$'s using empirical risk minimization, and then uses the hold-out cross validation set to select the hypothesis from $\{\hat{h}_1, \ldots, \hat{h}_k\}$ with the lowest error on $S_{\mathrm{cv}}$. That is, the algorithm will output

    \hat{h} = \arg\min_{h \in \{\hat{h}_1, \ldots, \hat{h}_k\}} \hat{\varepsilon}_{S_{\mathrm{cv}}}(h).

For this question you will prove the following bound. Let any $\delta > 0$ be fixed. Then with probability at least $1 - \delta$, we have that

    \varepsilon(\hat{h}) \le \min_{i=1,\ldots,k} \left( \varepsilon(h_i^\star) + \sqrt{\frac{2}{(1-\beta)m} \log\frac{4|\mathcal{H}_i|}{\delta}} \right) + \sqrt{\frac{2}{\beta m} \log\frac{4k}{\delta}}.

(a) Prove that with probability at least $1 - \frac{\delta}{2}$, for all $\hat{h}_i$,

    \left| \varepsilon(\hat{h}_i) - \hat{\varepsilon}_{S_{\mathrm{cv}}}(\hat{h}_i) \right| \le \sqrt{\frac{1}{2\beta m} \log\frac{4k}{\delta}}.

(b) Use part (a) to show that with probability $1 - \frac{\delta}{2}$,

    \varepsilon(\hat{h}) \le \min_{i=1,\ldots,k} \varepsilon(\hat{h}_i) + \sqrt{\frac{2}{\beta m} \log\frac{4k}{\delta}}.

(c) Let $j = \arg\min_i \varepsilon(h_i^\star)$. We know from class that for $\mathcal{H}_j$, with probability $1 - \frac{\delta}{2}$,

    \left| \varepsilon(h_j) - \hat{\varepsilon}_{S_{\mathrm{train}}}(h_j) \right| \le \sqrt{\frac{2}{(1-\beta)m} \log\frac{4|\mathcal{H}_j|}{\delta}}, \quad \forall h_j \in \mathcal{H}_j. ...
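The selection procedure above is short enough to state as code. Below is a minimal sketch in Python, assuming each finite class $\mathcal{H}_i$ is given explicitly as a list of callable classifiers; the names `empirical_error` and `select_hypothesis` are hypothetical helpers introduced here, not part of the problem set.

```python
import numpy as np

def empirical_error(h, X, y):
    """Fraction of the examples that hypothesis h misclassifies."""
    return np.mean([h(x) != label for x, label in zip(X, y)])

def select_hypothesis(hypothesis_classes, X, y, beta):
    """ERM within each class on S_train, then hold-out selection on S_cv.

    hypothesis_classes -- list [H_1, ..., H_k]; each H_i is a finite
        list of candidate classifiers h(x) -> {0, 1}.
    beta -- fraction of the m examples held out for cross validation.
    """
    m = len(y)
    split = int(round((1 - beta) * m))   # first (1 - beta)m examples train
    X_train, y_train = X[:split], y[:split]
    X_cv, y_cv = X[split:], y[split:]

    # h_hat_i = argmin over H_i of training error (empirical risk minimization)
    erm_outputs = [min(H_i, key=lambda h: empirical_error(h, X_train, y_train))
                   for H_i in hypothesis_classes]

    # h_hat = argmin over {h_hat_1, ..., h_hat_k} of the error on S_cv
    return min(erm_outputs, key=lambda h: empirical_error(h, X_cv, y_cv))
```

Splitting by position matches the problem statement (the first $(1-\beta)m$ examples train, the rest validate); since the examples are iid, no shuffling is needed.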
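As a hint toward part (a), the standard route is Hoeffding's inequality plus a union bound. The sketch below is one way to fill in the steps, assuming the two-sided Hoeffding bound stated in the lecture notes; it is not text from the problem set.

```latex
% Each \hat{h}_i is chosen using S_train only, so it is independent of S_cv.
% For a single fixed \hat{h}_i, Hoeffding's inequality applied to the
% \beta m hold-out examples gives
\[
P\left( \left| \varepsilon(\hat{h}_i) - \hat{\varepsilon}_{S_{\mathrm{cv}}}(\hat{h}_i) \right| > \gamma \right)
  \le 2 \exp\left( -2 \gamma^2 \beta m \right).
\]
% A union bound over the k hypotheses \hat{h}_1, \ldots, \hat{h}_k multiplies
% the right-hand side by k; setting it equal to \delta/2 and solving for \gamma:
\[
2k \exp\left( -2 \gamma^2 \beta m \right) = \frac{\delta}{2}
  \quad \Longrightarrow \quad
  \gamma = \sqrt{ \frac{1}{2 \beta m} \log \frac{4k}{\delta} }.
\]
```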
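For part (b), the usual argument chains the event from part (a) with the fact that $\hat{h}$ minimizes the cross-validation error. The sketch below is my reading of where the factor of 2 inside the square root comes from, again not text taken from the problem set.

```latex
% On the event of part (a), write \gamma = \sqrt{\frac{1}{2\beta m}\log\frac{4k}{\delta}}.
% Then for every i = 1, \ldots, k:
\[
\varepsilon(\hat{h})
  \le \hat{\varepsilon}_{S_{\mathrm{cv}}}(\hat{h}) + \gamma       % part (a), applied to \hat{h}
  \le \hat{\varepsilon}_{S_{\mathrm{cv}}}(\hat{h}_i) + \gamma     % \hat{h} minimizes the S_cv error
  \le \varepsilon(\hat{h}_i) + 2\gamma.                           % part (a), applied to \hat{h}_i
\]
% Minimizing over i and using 2\gamma = \sqrt{\frac{2}{\beta m}\log\frac{4k}{\delta}}
% gives exactly the bound claimed in part (b).
```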