© Eric Xing @ CMU, 2006-2008
Machine Learning
10-701/15-781, Fall 2008

Computational Learning Theory

Eric Xing
Lecture 10, October 8, 2008

Reading: Chap. 7, T. Mitchell book
Generalizability of Learning
- In machine learning it is really the generalization error that we care about, but most learning algorithms fit their models to the training set.
- Why should doing well on the training set tell us anything about generalization error? Specifically, can we relate error on the training set to generalization error?
- Are there conditions under which we can actually prove that learning algorithms will work well?
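As a concrete illustration of the gap between training and generalization error, the sketch below fits a simple threshold classifier by empirical risk minimization on a small noisy sample and compares its training error with its error on a large fresh sample. The data, concept, and class of threshold hypotheses are all invented for illustration, not taken from the lecture.

```python
import random

random.seed(0)

def sample(n, noise=0.1):
    """Draw n labeled points; true concept is x > 0.5, labels flipped with prob `noise`."""
    data = []
    for _ in range(n):
        x = random.random()
        y = 1 if x > 0.5 else 0
        if random.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data

def error(threshold, data):
    """Fraction of points misclassified by the threshold hypothesis x > threshold."""
    return sum((1 if x > threshold else 0) != y for x, y in data) / len(data)

train = sample(20)
# Empirical risk minimization over a grid of candidate threshold hypotheses.
best_t = min((t / 100 for t in range(101)), key=lambda t: error(t, train))

train_err = error(best_t, train)
gen_err = error(best_t, sample(100_000))  # large fresh sample approximates generalization error
print(train_err, gen_err)
```

Because the threshold was chosen to minimize error on the 20 training points (including their label noise), the training error typically understates the error on fresh data, which is exactly the gap the slide asks about.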
Complexity of Learning
- The complexity of learning is measured mainly along two axes: information and computation.
- Information complexity is concerned with the generalization performance of learning:
  - How many training examples are needed?
  - How fast does the learner's estimate converge to the true population parameters? etc.
- Computational complexity concerns the computational resources applied to the training data to extract the learner's predictions.
- It seems that when an algorithm improves with respect to one of these measures, it deteriorates with respect to the other.
What General Laws Constrain Inductive Learning?
These results only follow O(…)!
- Sample complexity
  - How many training examples are sufficient to learn the target concept?
- Computational complexity
  - What resources are required to learn the target concept?
- We want a theory that relates:
  - Training examples: quantity (m), quality, and how they are presented
  - Complexity of the hypothesis/concept space H
  - Accuracy ε of the approximation to the target concept
  - Probability of successful learning (at least 1 − δ)
Prototypical Concept Learning Task
Binary classification.

- Everything we will say here generalizes to other problems, including regression and multi-class classification.
- Given:
  - Instances X: possible days, each described by the attributes Sky, AirTemp, Humidity, Wind, Water, Forecast
  - Target function c: EnjoySport: X → {0, 1}
  - Hypothesis space H: conjunctions of literals, e.g. (?, Cold, High, ?, ?, EnjoySport)
  - Training examples S: i.i.d. positive and negative examples of the target function: (x1, c(x1)), ..., (xm, c(xm))
- Determine:
  - A hypothesis h in H such that h(x) is "good" w.r.t. c(x) for all x in S?
  - A hypothesis h in H such that h(x) is "good" w.r.t. c(x) for all x under the true distribution D?
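The conjunction-of-literals hypotheses above can be sketched directly. The representation below (a tuple of per-attribute constraints in the order Sky, AirTemp, Humidity, Wind, Water, Forecast, with "?" meaning "any value") is one plausible encoding, and the attribute values in the example instances are invented for illustration.

```python
# One hypothesis = one constraint per attribute, in the order
# (Sky, AirTemp, Humidity, Wind, Water, Forecast); '?' matches any value.
def h(hypothesis, instance):
    """Conjunction of literals: 1 iff every non-'?' constraint equals the attribute value."""
    return int(all(c == "?" or c == v for c, v in zip(hypothesis, instance)))

hyp = ("?", "Cold", "High", "?", "?", "?")
x1 = ("Sunny", "Cold", "High", "Strong", "Warm", "Same")   # satisfies both literals
x2 = ("Sunny", "Warm", "High", "Strong", "Warm", "Same")   # violates AirTemp = Cold
print(h(hyp, x1), h(hyp, x2))  # → 1 0
```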
Two Basic Competing Models

PAC framework:
- Sample labels are consistent with some h in H
- The learner's hypothesis is required to meet an absolute upper bound on its error

Agnostic framework:
- No prior restriction on the sample labels
- The required upper bound on the hypothesis error is only relative (to the best hypothesis in the class)
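The agnostic side of this comparison can be sketched as plain empirical risk minimization: no hypothesis is assumed consistent with the labels, and the learner simply returns the member of a finite class with the lowest empirical error. The tiny threshold class and the random labels below are made up for illustration.

```python
import random

random.seed(1)

# A tiny finite hypothesis class: thresholds in both directions, so the
# class always contains a hypothesis with empirical error <= 0.5.
H = [lambda x, t=t, s=s: int((x > t) == s)
     for t in (0.2, 0.4, 0.6, 0.8) for s in (True, False)]

# Arbitrary (possibly unrealizable) labels: the agnostic setting.
S = [(random.random(), random.randint(0, 1)) for _ in range(50)]

def emp_error(h, S):
    return sum(h(x) != y for x, y in S) / len(S)

best = min(H, key=lambda h: emp_error(h, S))
print(emp_error(best, S))  # low *relative* to the class; need not be near 0
```

The returned error is only guaranteed to be good relative to the best hypothesis in H, which is exactly the relative-error guarantee of the agnostic framework.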
Sample Complexity
- How many training examples are sufficient to learn the target concept?
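For a finite hypothesis space and a consistent learner, the bound from the assigned reading (Mitchell, Chap. 7), m ≥ (1/ε)(ln|H| + ln(1/δ)), answers this question. A minimal calculator, with |H| = 1000 chosen purely as an illustrative size:

```python
import math

def pac_sample_bound(h_size, epsilon, delta):
    """Sufficient m for a consistent learner over a finite hypothesis space H:
    m >= (1/epsilon) * (ln|H| + ln(1/delta))."""
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / epsilon)

# Example: |H| = 1000 (illustrative), epsilon = 0.1, delta = 0.05
m = pac_sample_bound(1000, 0.1, 0.05)
print(m)  # → 100
```

Note that m grows only logarithmically in |H| and 1/δ but linearly in 1/ε, which is why even large finite hypothesis spaces can be learnable from modest samples.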