6.080/6.089 GITCS    Feb 5, 2008
Lecture 21
Lecturer: Scott Aaronson    Scribe: Scott Aaronson / Chris Granade

1 Recap and Discussion of Previous Lecture

Theorem 1 (Valiant) $m = O\left(\frac{1}{\varepsilon}\log\frac{|C|}{\delta}\right)$ samples suffice for $(\varepsilon, \delta)$-learning.

Theorem 2 (Blumer et al.) $m = O\left(\frac{1}{\varepsilon}\,\mathrm{VCdim}(C)\log\frac{1}{\delta}\right)$ samples suffice.

In both cases, the learning algorithm that achieves the bound is just “find any hypothesis h compatible with all the sample data, and output it.”

You asked great, probing questions last time about what these theorems really mean. For example, “why can’t I just draw little circles around the ‘yes’ points, and expect that I can therefore predict the future?” It’s unfortunately a bit hidden in the formalism, but what these theorems are “really” saying is that to predict the future, it suffices to find a succinct description of the past: a description that takes many fewer bits to write down than the past data itself. Hence the dependence on $|C|$ or $\mathrm{VCdim}(C)$: the size or dimension of the concept class from which our hypothesis is drawn.

We also talked about the computational problem of finding a small hypothesis that agrees with the data. Certainly we can always solve this problem in polynomial time if P = NP. But what if P ≠ NP? Can we show that “learning is NP-hard”? Here we saw that we need to distinguish two cases:

• Proper learning problems (where the hypothesis has to have a certain form): sometimes we can show these are NP-hard. Example: finding a DNF expression that agrees with the data.
• Improper learning problems (where the hypothesis can be any Boolean circuit): it’s an open problem whether any of these are NP-hard. (Incidentally, why do we restrict the hypothesis to be a Boolean circuit? It’s equivalent to saying that we should be able to compute in polynomial time what a given hypothesis predicts.)

So, if we can’t show that improper (or “representation-independent”) learning is NP-complete, what other evidence might there be for its hardness?
The teaser from last time: we could try to show that finding a hypothesis that explains past data is at least as hard as breaking some cryptographic code!...
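To make Theorem 1 concrete, here is a minimal Python sketch; it is an illustration, not something from the lecture. It assumes a finite concept class of thresholds $h_t(x) = [x \ge t]$ on the domain $\{0, \dots, N-1\}$, a uniform sample distribution, and the standard union-bound argument: a fixed hypothesis with error more than $\varepsilon$ survives $m$ independent samples with probability at most $(1-\varepsilon)^m \le e^{-\varepsilon m}$, so a union bound over at most $|C|$ such hypotheses shows that $m \ge \frac{1}{\varepsilon}\ln\frac{|C|}{\delta}$ samples suffice. The learner is the one described above: output any hypothesis consistent with all the samples. The function names and the hidden target threshold 637 are hypothetical.

```python
import math
import random


def occam_sample_bound(eps: float, delta: float, class_size: int) -> int:
    """Sample size from the union-bound argument behind Theorem 1:
    m >= (1/eps) * ln(|C| / delta) makes every hypothesis with error > eps
    inconsistent with the samples, except with probability <= delta."""
    return math.ceil(math.log(class_size / delta) / eps)


def consistent_threshold(samples, thresholds):
    """Return any threshold t whose hypothesis h_t(x) = (x >= t) agrees with
    every labeled sample (x, label), or None if no threshold does."""
    for t in thresholds:
        if all((x >= t) == label for x, label in samples):
            return t
    return None


if __name__ == "__main__":
    N = 1000
    thresholds = range(N + 1)  # concept class of size |C| = N + 1
    eps, delta = 0.05, 0.01
    m = occam_sample_bound(eps, delta, len(thresholds))

    rng = random.Random(0)  # fixed seed so the run is reproducible
    target = 637  # hypothetical hidden concept, used only to label samples
    xs = [rng.randrange(N) for _ in range(m)]
    samples = [(x, x >= target) for x in xs]

    h = consistent_threshold(samples, thresholds)
    # True error of h under the uniform distribution on {0, ..., N-1}
    err = sum((x >= h) != (x >= target) for x in range(N)) / N
    print("samples:", m, "hypothesis:", h, "error:", err)
```

With $\varepsilon = 0.05$, $\delta = 0.01$ and $|C| = 1001$, the bound asks for 231 samples, and naming any hypothesis in the class takes only $\log_2 |C| \approx 10$ bits. By contrast, the class of all “little circles” hypotheses on the same domain has $2^N$ members, so the same bound would demand on the order of $N \ln 2 / \varepsilon$ samples, more than the size of the domain itself.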
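The remark about restricting hypotheses to Boolean circuits can also be illustrated. The sketch below assumes a hypothetical encoding of a circuit as a list of AND/OR/NOT gates in topological order, where each gate reads only wires computed earlier. Evaluating the hypothesis on an input is then a single pass over the gates, i.e., time linear in the circuit size, which is why a polynomial-size circuit always yields predictions in polynomial time. The encoding and the name `eval_circuit` are illustrative assumptions, not a standard format.

```python
from typing import List, Tuple

# Hypothetical encoding: a gate is (op, a, b) with op in {"AND", "OR", "NOT"}.
# Wires 0..n-1 hold the n input bits; gate i writes wire n + i.
# NOT reads only wire a; its b field is ignored but must still be a valid wire.
Gate = Tuple[str, int, int]


def eval_circuit(circuit: List[Gate], x: List[bool]) -> bool:
    """Evaluate the circuit on input x in one pass; returns the last wire."""
    wires = list(x)
    for op, a, b in circuit:
        # Topological order: each gate may only read wires already computed.
        if not (0 <= a < len(wires) and 0 <= b < len(wires)):
            raise ValueError("gate inputs must refer to earlier wires")
        if op == "AND":
            wires.append(wires[a] and wires[b])
        elif op == "OR":
            wires.append(wires[a] or wires[b])
        elif op == "NOT":
            wires.append(not wires[a])
        else:
            raise ValueError(f"unknown gate {op!r}")
    return wires[-1]


if __name__ == "__main__":
    # XOR(x0, x1) = (x0 OR x1) AND NOT (x0 AND x1)
    xor = [("OR", 0, 1), ("AND", 0, 1), ("NOT", 3, 3), ("AND", 2, 4)]
    for x0 in (False, True):
        for x1 in (False, True):
            print(x0, x1, eval_circuit(xor, [x0, x1]))
```

The four-gate example computes XOR of its two inputs: wire 2 is the OR, wire 3 the AND, wire 4 negates wire 3, and wire 5 combines wires 2 and 4.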