6.080/6.089 GITCS                                                    Feb 5, 2008

Lecture 21

Lecturer: Scott Aaronson        Scribe: Scott Aaronson / Chris Granade

1 Recap and Discussion of Previous Lecture

Theorem 1 (Valiant) $m = O\left(\frac{1}{\epsilon} \log \frac{|C|}{\delta}\right)$ samples suffice for $(\epsilon, \delta)$-learning.

Theorem 2 (Blumer et al.) $m = O\left(\frac{1}{\epsilon} \, \mathrm{VCdim}(C) \log \frac{1}{\delta}\right)$ samples suffice.

In both cases, the learning algorithm that achieves the bound is just "find any hypothesis $h$ compatible with all the sample data, and output it." (A toy implementation of this algorithm appears at the end of these notes.)

You asked great, probing questions last time about what these theorems really mean. For example, "why can't I just draw little circles around the 'yes' points, and expect that I can therefore predict the future?" It's unfortunately a bit hidden in the formalism, but what these theorems are "really" saying is that to predict the future, it suffices to find a succinct description of the past: a description that takes many fewer bits to write down than the past data itself. Hence the dependence on $|C|$ or $\mathrm{VCdim}(C)$, the size or dimension of the concept class from which our hypothesis is drawn. Drawing little circles around the 'yes' points merely memorizes the data rather than compressing it, which is why it gives no guarantee about future points.

We also talked about the computational problem of finding a small hypothesis that agrees with the data. Certainly we can always solve this problem in polynomial time if $P = NP$. But what if $P \neq NP$? Can we show that "learning is NP-hard"? Here we saw that we need to distinguish two cases:

- Proper learning problems (where the hypothesis has to have a certain form): sometimes we can show these are NP-hard. Example: finding a DNF expression that agrees with the data.

- Improper learning problems (where the hypothesis can be any Boolean circuit): it's an open problem whether any of these are NP-hard. (Incidentally, why do we restrict the hypothesis to be a Boolean circuit? It's equivalent to saying that we should be able to compute in polynomial time what a given hypothesis predicts.)

So, if we can't show that improper (or "representation-independent") learning is NP-complete, what other evidence might there be for its hardness? The teaser from last time: we could try to show that finding a hypothesis that explains past data is at least as hard as breaking some cryptographic code!
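To make the theorems and the consistent-hypothesis algorithm concrete, here is a minimal Python sketch for a finite concept class, using Valiant's bound to decide how many samples to draw. Everything here (the helper names, the toy threshold class, the parameter values) is an illustrative assumption, not part of the lecture.

```python
import math
import random

def pac_sample_bound(num_concepts, epsilon, delta):
    """Valiant's bound: m = O((1/epsilon) * log(|C| / delta)) samples suffice."""
    return math.ceil((1.0 / epsilon) * math.log(num_concepts / delta))

def learn(concepts, labeled_samples):
    """Output any hypothesis in the class compatible with all the sample data."""
    for h in concepts:
        if all(h(x) == y for x, y in labeled_samples):
            return h
    return None  # cannot happen if the target concept is in the class

# Toy concept class: the 101 threshold functions x >= t over {0, ..., 99}.
concepts = [lambda x, t=t: x >= t for t in range(101)]
target = concepts[37]  # the "unknown" concept generating the labels
epsilon, delta = 0.1, 0.05

m = pac_sample_bound(len(concepts), epsilon, delta)
samples = [(x, target(x)) for x in (random.randrange(100) for _ in range(m))]
h = learn(concepts, samples)

agreement = sum(h(x) == target(x) for x in range(100)) / 100
print(f"{m} samples drawn; hypothesis agrees with target on {agreement:.0%} of the domain")
```

With $\epsilon = 0.1$ and $\delta = 0.05$, the bound asks for about 77 samples for this 101-concept class; the guarantee is that, with probability at least $1 - \delta$, any hypothesis consistent with that many random samples errs on at most an $\epsilon$ fraction of future points.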
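The quantity $\mathrm{VCdim}(C)$ in Blumer et al.'s bound can likewise be computed by brute force for toy classes, by checking which point sets the class shatters. This is exponential in the domain size, so it is purely an intuition-building sketch; the function names are again my own.

```python
from itertools import combinations

def shatters(concepts, points):
    """True iff the class realizes all 2^|points| labelings of `points`."""
    labelings = {tuple(h(x) for x in points) for h in concepts}
    return len(labelings) == 2 ** len(points)

def vc_dimension(concepts, domain):
    """Largest d such that some d-point subset of `domain` is shattered."""
    d = 0
    for size in range(1, len(domain) + 1):
        if any(shatters(concepts, s) for s in combinations(domain, size)):
            d = size
        else:
            break  # if no set of this size is shattered, none larger is either
    return d

# Threshold functions x >= t over {0,...,9}: any one point is shattered, but no
# two points are (the smaller can never be labeled 'yes' while the larger is 'no').
thresholds = [lambda x, t=t: x >= t for t in range(11)]
print(vc_dimension(thresholds, range(10)))  # -> 1
```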
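Finally, the proper learning example, finding a DNF expression with a bounded number of terms that agrees with the data, is at bottom a search problem. The hedged sketch below solves it by exhaustive search over the roughly $3^n$ candidate terms, which is fine for toy instances but scales exponentially; the NP-hardness mentioned above is the formal statement that, assuming $P \neq NP$, no algorithm does fundamentally better. All names are illustrative.

```python
from itertools import product, combinations

def term_satisfied(term, x):
    # term: tuple of (variable index, required value) pairs, i.e. a conjunction of literals
    return all(x[i] == v for i, v in term)

def dnf_value(terms, x):
    # A DNF formula is true iff at least one of its terms is satisfied.
    return any(term_satisfied(t, x) for t in terms)

def find_consistent_dnf(data, n, k):
    """Exhaustively search every DNF with at most k terms over n variables
    for one that agrees with all labeled examples in `data`."""
    # Each variable is absent (None), negated (0), or positive (1): 3^n terms.
    all_terms = []
    for choice in product([None, 0, 1], repeat=n):
        all_terms.append(tuple((i, v) for i, v in enumerate(choice) if v is not None))
    for r in range(1, k + 1):
        for terms in combinations(all_terms, r):
            if all(dnf_value(terms, x) == y for x, y in data):
                return terms
    return None  # no DNF with <= k terms fits the data

# Toy data over 3 variables, labeled by the target function x0 AND (NOT x2).
data = [(x, x[0] == 1 and x[2] == 0) for x in product([0, 1], repeat=3)]
print(find_consistent_dnf(data, n=3, k=2))  # -> (((0, 1), (2, 0)),)
```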