Machine Learning
10-701/15-781, Fall 2008

Computational Learning Theory II

Eric Xing
Lecture 11, October 13, 2008

© Eric Xing @ CMU, 2006-2008
Reading: Chap. 7 of T. Mitchell's book, and outline material
Last time: PAC and Agnostic Learning

- Finite H, assume target function c ∈ H.
  Suppose we want the probability that a consistent learner outputs a hypothesis with true error greater than ε to be at most δ. Then m examples suffice:

      m ≥ (1/ε) (ln|H| + ln(1/δ))

- Finite H, agnostic learning: perhaps c not in H.
  ⇒ with probability at least (1 − δ), every h in H satisfies

      error_true(h) ≤ error_train(h) + sqrt( (ln|H| + ln(1/δ)) / (2m) )
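The two finite-H bounds above are easy to evaluate numerically. The helper names below (`pac_sample_bound`, `agnostic_error_gap`) are made up for illustration; this is a minimal sketch of the formulas, not code from the lecture:

```python
import math

def pac_sample_bound(H_size, epsilon, delta):
    """Consistent-learner bound for finite H: with probability >= 1 - delta,
    this many examples suffice for every consistent h to have true error <= epsilon."""
    return math.ceil((math.log(H_size) + math.log(1.0 / delta)) / epsilon)

def agnostic_error_gap(H_size, delta, m):
    """Agnostic bound for finite H: with probability >= 1 - delta, every h
    satisfies error_true(h) <= error_train(h) + the gap returned here."""
    return math.sqrt((math.log(H_size) + math.log(1.0 / delta)) / (2.0 * m))

# The required m grows only logarithmically in |H|:
print(pac_sample_bound(1000, 0.1, 0.05))     # 100 examples
print(agnostic_error_gap(1000, 0.05, 500))   # gap shrinks as m grows
```

Note how a 1000-fold increase in |H| raises the sample bound only additively, via ln|H|.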
What if H is not finite?

- Can't use our result for infinite H
- Need some other measure of complexity for H
  → the Vapnik-Chervonenkis (VC) dimension!
What if H is not finite?

Some informal derivation:

- Suppose we have an H that is parameterized by d real numbers. Since we are using a computer to represent real numbers, and IEEE double-precision floating point (doubles in C) uses 64 bits to represent a floating-point number, our learning algorithm, assuming we're using double-precision floating point, is parameterized by 64d bits.
- So this parameterization gives |H| ≤ 2^(64d), and plugging that into the finite-H bound yields a sample complexity linear in d.
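Under this bit-counting argument, ln|H| ≤ 64d·ln 2, so the finite-H sample bound grows linearly in the number of parameters d. A rough sketch (the function name is invented for illustration):

```python
import math

def sample_bound_for_d_params(d, epsilon, delta, bits_per_param=64):
    """Finite-H bound applied to an H parameterized by d doubles:
    |H| <= 2^(bits_per_param * d), so ln|H| <= bits_per_param * d * ln 2."""
    ln_H = bits_per_param * d * math.log(2.0)
    return math.ceil((ln_H + math.log(1.0 / delta)) / epsilon)

# Sample complexity is (roughly) linear in d:
print(sample_bound_for_d_params(1, 0.1, 0.05))  # 474
print(sample_bound_for_d_params(2, 0.1, 0.05))  # 918
```

Doubling d roughly doubles the bound, which is the informal takeaway of the slide.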
How do we characterize "power"?

- Different machines have different amounts of "power".
- Tradeoff between:
  - More power: can model more complex classifiers, but might overfit.
  - Less power: not going to overfit, but restricted in what it can model.
- How do we characterize the amount of power?
Shattering a Set of Instances

- Definition: Given a set S = {x(1), …, x(d)} (no relation to the training set) of points x(i) ∈ X, we say that H shatters S if H can realize any labeling on S. That is, if for any set of labels {y(1), …, y(d)}, there exists some h ∈ H so that h(x(i)) = y(i) for all i = 1, …, d.
- There are 2^L different ways to separate a sample of L points into two sub-samples (a dichotomy).
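For a finite pool of hypotheses, this definition can be checked directly by enumerating which labelings the pool realizes. `shatters` is a hypothetical helper written only to illustrate the definition:

```python
def shatters(hypotheses, points):
    """H shatters the set iff the hypotheses realize all 2^d labelings of it."""
    realized = {tuple(h(x) for x in points) for h in hypotheses}
    return len(realized) == 2 ** len(points)

# Thresholds h_a(x) = 1 if x > a, sampled on a small grid of a values:
thresholds = [lambda x, a=a: int(x > a) for a in (-0.5, 0.5, 1.5)]
print(shatters(thresholds, [0.0]))        # True: a single point is shattered
print(shatters(thresholds, [0.0, 1.0]))   # False: the labeling (1, 0) is unrealizable
```

The second call fails because a threshold can never label a smaller point 1 and a larger point 0.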
Three Instances Shattered

[Figure: three instances in instance space X, shattered.]
The Vapnik-Chervonenkis Dimension

- Definition: The Vapnik-Chervonenkis dimension, VC(H), of hypothesis space H defined over instance space X is the size of the largest finite subset of X shattered by H. If arbitrarily large finite sets of X can be shattered by H, then VC(H) ≡ ∞.
VC dimension: examples

Consider X = R, want to learn c: X → {0,1}.

What is the VC dimension of:
- Open intervals: H1: if x > a, then y = 1, else y = 0
- Closed intervals: H2: if a < x < b, then y = 1, else y = 0
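Both examples can be probed by brute force over finite parameter grids. This is a sketch with invented helper names; since only finitely many points and hypotheses are tried, it yields a lower bound on the VC dimension, which here matches the true values (VC(H1) = 1, VC(H2) = 2):

```python
from itertools import combinations

def shatters(hypotheses, points):
    """H shatters the set iff the hypotheses realize all 2^d labelings of it."""
    realized = {tuple(h(x) for x in points) for h in hypotheses}
    return len(realized) == 2 ** len(points)

def max_shattered(hypotheses, candidates, up_to):
    """Size of the largest shattered subset of `candidates` found by search."""
    best = 0
    for k in range(1, up_to + 1):
        if any(shatters(hypotheses, list(s)) for s in combinations(candidates, k)):
            best = k
    return best

points = [0.0, 1.0, 2.0, 3.0]
# H1: thresholds (y = 1 iff x > a), on a grid of a values
H1 = [lambda x, a=a: int(x > a) for a in (-0.5, 0.5, 1.5, 2.5, 3.5)]
# H2: intervals (y = 1 iff a < x < b), on a grid of (a, b) pairs
H2 = [lambda x, a=a, b=b: int(a < x < b)
      for a in (-0.5, 0.5, 1.5, 2.5) for b in (0.5, 1.5, 2.5, 3.5) if a < b]

print(max_shattered(H1, points, 3))  # 1: thresholds shatter one point, never two
print(max_shattered(H2, points, 3))  # 2: intervals shatter two points, never three
```

Intuitively: a threshold can never realize the labeling (1, 0) on an ordered pair, and an interval can never realize (1, 0, 1) on an ordered triple.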
VC dimension: examples

Consider X = R^2, want to learn c: X → {0,1}.

- What is the VC dimension of lines in a plane?
  H = { h_{w,b} : h(x) = 1 if (w·x + b) > 0, else 0 }
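A quick numeric check of the positive direction (the points and parameter grid below are arbitrary choices for illustration): three non-collinear points in R^2 can be shattered by halfplanes.

```python
def halfplane(w1, w2, b):
    """Hypothesis h(x) = 1 if w1*x1 + w2*x2 + b > 0, else 0."""
    return lambda x: int(w1 * x[0] + w2 * x[1] + b > 0)

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]   # three non-collinear points
grid = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]        # coarse grid of parameter values
hypotheses = [halfplane(w1, w2, b) for w1 in grid for w2 in grid for b in grid]

realized = {tuple(h(p) for p in points) for p_unused in [None] for h in hypotheses}
print(len(realized))  # 8: every labeling of the three points is realized
```

Since no set of four points in the plane can be shattered by halfplanes, the VC dimension of lines in a plane is 3.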
This note was uploaded on 01/26/2010 for the course Machine Learning 10-701, taught by Professor Eric P. Xing during the Fall '08 term at Carnegie Mellon.