CS142: Machine Learning, Spring 2017
Lecture 14
Instructor: Pedro Felzenszwalb
Scribes: Dan Xiang, Tyler Dae Devlin

PAC learning (continued)

The setup

Once again, we recall the setup for binary classification. For some feature space $\mathcal{X}$, we want to approximate a target function $f : \mathcal{X} \to \{-1, +1\}$ that maps inputs to binary outputs. We assume that there is some distribution $D$ (which need not be known) according to which samples from $\mathcal{X}$ are drawn. We choose our estimate $h$ of the target function $f$ from a class of functions known as the hypothesis set $\mathcal{H}$. We make this choice based on labeled examples $x_1, \dots, x_n$ that are sampled i.i.d. according to $D$. Each label in this training set is assumed to be given by the true target function, so that $y_i = f(x_i)$.

We now consider two meta-algorithms for learning $f$.

Algorithm A

Take a training set $T$. Select any hypothesis $h \in \mathcal{H}$ that is consistent with the training data $T$. If the number of samples $n$ satisfies

$$n \ge \frac{1}{\varepsilon} \ln \frac{|\mathcal{H}|}{\delta},$$

then we saw in the last lecture that the following bound on the probability of error holds:

$$P\big(\mathrm{error}_D(h) < \varepsilon\big) > 1 - \delta.$$

When we can establish a bound of the above form, PAC learning is said to have occurred. Last time we went through a specific example where our hypothesis space $\mathcal{H}$ was the set of logical disjunctions (see the Lecture 13 notes). Recall that the three problems with this approach to classification are:

(1) We could have $|\mathcal{H}| = \infty$, in which case the inequality involving $n$ is meaningless.

