CS 229, Public Course — Problem Set #3 Solutions: Learning Theory and Unsupervised Learning

1. Uniform convergence and Model Selection

In this problem, we will prove a bound on the error of a simple model selection procedure. Consider a binary classification problem with labels $y \in \{0, 1\}$, and let $\mathcal{H}_1 \subseteq \mathcal{H}_2 \subseteq \cdots \subseteq \mathcal{H}_k$ be $k$ different finite hypothesis classes ($|\mathcal{H}_i| < \infty$). Given a dataset $S$ of $m$ iid training examples, we divide it into a training set $S_{\text{train}}$ consisting of the first $(1-\beta)m$ examples and a hold-out cross-validation set $S_{\text{cv}}$ consisting of the remaining $\beta m$ examples. Here, $\beta \in (0, 1)$. (As in the class notes, $\hat{\varepsilon}_S(h)$ denotes the empirical error of $h$ on a sample $S$, and $\varepsilon(h)$ its generalization error.)

Let $\hat{h}_i = \arg\min_{h \in \mathcal{H}_i} \hat{\varepsilon}_{S_{\text{train}}}(h)$ be the hypothesis in $\mathcal{H}_i$ with the lowest training error (on $S_{\text{train}}$). Thus, $\hat{h}_i$ is the hypothesis returned by training (with empirical risk minimization) using hypothesis class $\mathcal{H}_i$ and dataset $S_{\text{train}}$. Also let $h_i^\star = \arg\min_{h \in \mathcal{H}_i} \varepsilon(h)$ be the hypothesis in $\mathcal{H}_i$ with the lowest generalization error.

Suppose that our algorithm first finds all the $\hat{h}_i$'s using empirical risk minimization, and then uses the hold-out cross-validation set to select the hypothesis from the set $\{\hat{h}_1, \ldots, \hat{h}_k\}$ with minimum cross-validation error. That is, the algorithm outputs

$$\hat{h} = \arg\min_{h \in \{\hat{h}_1, \ldots, \hat{h}_k\}} \hat{\varepsilon}_{S_{\text{cv}}}(h).$$

For this question you will prove the following bound. Let any $\delta > 0$ be fixed. Then with probability at least $1 - \delta$, we have that

$$\varepsilon(\hat{h}) \leq \min_{i=1,\ldots,k} \left( \varepsilon(h_i^\star) + \sqrt{\frac{2}{(1-\beta)m} \log \frac{4|\mathcal{H}_i|}{\delta}} \right) + \sqrt{\frac{2}{\beta m} \log \frac{4k}{\delta}}.$$

(a) Prove that with probability at least $1 - \frac{\delta}{2}$, for all $\hat{h}_i$,

$$\left| \varepsilon(\hat{h}_i) - \hat{\varepsilon}_{S_{\text{cv}}}(\hat{h}_i) \right| \leq \sqrt{\frac{1}{2\beta m} \log \frac{4k}{\delta}}.$$

Answer: For each $\hat{h}_i$, the empirical error on the cross-validation set, $\hat{\varepsilon}_{S_{\text{cv}}}(\hat{h}_i)$, is the average of $\beta m$ random variables with mean $\varepsilon(\hat{h}_i)$, so by the Hoeffding inequality, for any $\hat{h}_i$,

$$P\left( \left| \varepsilon(\hat{h}_i) - \hat{\varepsilon}_{S_{\text{cv}}}(\hat{h}_i) \right| \geq \gamma \right) \leq 2\exp(-2\gamma^2 \beta m).$$

As in the class notes, to ensure that this holds simultaneously for all $\hat{h}_i$, we take the union bound over all $k$ of the $\hat{h}_i$'s:

$$P\left( \exists\, i \text{ s.t. } \left| \varepsilon(\hat{h}_i) - \hat{\varepsilon}_{S_{\text{cv}}}(\hat{h}_i) \right| \geq \gamma \right) \leq 2k\exp(-2\gamma^2 \beta m).$$

Setting this bound equal to $\delta/2$ and solving for $\gamma$ yields

$$\gamma = \sqrt{\frac{1}{2\beta m} \log \frac{4k}{\delta}},$$

proving the desired bound.

(b) Use part (a) to show that with probability $1 - \frac{\delta}{2}$,

$$\varepsilon(\hat{h}) \leq \min_{i=1,\ldots,k} \varepsilon(\hat{h}_i) + \sqrt{\frac{2}{\beta m} \log \frac{4k}{\delta}}.$$

Answer: Let $j = \arg\min_i \varepsilon(\hat{h}_i)$, and abbreviate $\gamma = \sqrt{\frac{1}{2\beta m} \log \frac{4k}{\delta}}$. Using part (a), with probability at least $1 - \frac{\delta}{2}$,

$$\begin{aligned}
\varepsilon(\hat{h}) &\leq \hat{\varepsilon}_{S_{\text{cv}}}(\hat{h}) + \gamma \\
&= \min_i \hat{\varepsilon}_{S_{\text{cv}}}(\hat{h}_i) + \gamma \\
&\leq \hat{\varepsilon}_{S_{\text{cv}}}(\hat{h}_j) + \gamma \\
&\leq \varepsilon(\hat{h}_j) + 2\gamma \\
&= \min_{i=1,\ldots,k} \varepsilon(\hat{h}_i) + \sqrt{\frac{2}{\beta m} \log \frac{4k}{\delta}},
\end{aligned}$$

where the first and fourth steps apply part (a) (to $\hat{h}$ and $\hat{h}_j$ respectively), the second uses the definition of $\hat{h}$, and the last equality uses $2\gamma = \sqrt{\frac{4}{2\beta m} \log \frac{4k}{\delta}} = \sqrt{\frac{2}{\beta m} \log \frac{4k}{\delta}}$. (The preview ends here.)
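Although the preview cuts off after part (b), the selection procedure the bound analyzes is easy to make concrete. Below is a minimal sketch in Python, not part of the original solutions: the data, the threshold-classifier hypothesis classes, and all parameter values are hypothetical, chosen only so that $\mathcal{H}_1 \subseteq \cdots \subseteq \mathcal{H}_k$ holds with each $|\mathcal{H}_i|$ finite; the last lines simply evaluate the hold-out penalty term $\sqrt{\frac{2}{\beta m} \log \frac{4k}{\delta}}$ from part (b) numerically.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D binary classification data: y = 1{x >= 0.6} with 10% label noise.
m = 1000
x = rng.uniform(0.0, 1.0, size=m)
y = ((x >= 0.6) ^ (rng.uniform(size=m) < 0.1)).astype(int)

# Split S into S_train (first (1 - beta) m examples) and S_cv (remaining beta m).
beta = 0.3
m_train = int((1 - beta) * m)
x_tr, y_tr = x[:m_train], y[:m_train]
x_cv, y_cv = x[m_train:], y[m_train:]

def err(t, xs, ys):
    """Empirical 0/1 error of the threshold classifier h_t(x) = 1{x >= t}."""
    return np.mean((xs >= t).astype(int) != ys)

# Nested finite hypothesis classes: H_i = thresholds from the first i points of
# a fixed grid, so H_1 ⊆ H_2 ⊆ ... ⊆ H_k and |H_i| = i < ∞.
k = 10
grid = np.linspace(0.05, 0.95, k)

# Step 1: empirical risk minimization within each H_i on S_train gives hat{h}_i.
h_hat = [min(grid[: i + 1], key=lambda t: err(t, x_tr, y_tr)) for i in range(k)]

# Step 2: select hat{h} from {hat{h}_1, ..., hat{h}_k} by cross-validation error.
h_selected = min(h_hat, key=lambda t: err(t, x_cv, y_cv))

# Part (b) penalty for the hold-out step: with probability at least 1 - delta/2,
# eps(hat{h}) <= min_i eps(hat{h}_i) + sqrt((2 / (beta * m)) * log(4 * k / delta)).
delta = 0.05
cv_penalty = np.sqrt(2.0 / (beta * m) * np.log(4 * k / delta))

print(f"selected threshold:    {h_selected:.2f}")
print(f"cv error of selection: {err(h_selected, x_cv, y_cv):.3f}")
print(f"part (b) penalty term: {cv_penalty:.3f}")
```

With these (hypothetical) values, the penalty evaluates to $\sqrt{\frac{2}{300} \log 800} \approx 0.21$, illustrating how a small hold-out set ($\beta m = 300$ examples) makes the cross-validation term a nontrivial part of the bound.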