Lecture 02
Voting classifiers, training error of boosting.
18.465

In this lecture we consider the classification problem, i.e. $Y = \{-1, +1\}$.
Consider a family of weak classifiers
$$\mathcal{H} = \{h : \mathcal{X} \to \{-1, +1\}\}.$$
Let the empirical minimizer be
$$h_0 = \mathop{\mathrm{argmin}}_{h \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^{n} I(h(X_i) \neq Y_i).$$
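The lecture title refers to the training error of boosting, which combines such weak classifiers by weighted voting. As a concrete reference point, here is a minimal AdaBoost-style sketch over a finite pool of weak classifiers; the one-dimensional data and threshold stumps are invented for illustration and are not from the notes.

```python
import math

def adaboost(X, Y, weak_learners, T):
    """Minimal AdaBoost over a finite pool of weak classifiers
    h : x -> {-1, +1}; labels Y are in {-1, +1}."""
    n = len(X)
    D = [1.0 / n] * n                      # sample weights
    votes = []                             # (alpha_t, h_t) pairs
    for _ in range(T):
        # weak classifier with smallest weighted training error
        h, err = min(((h, sum(D[i] for i in range(n) if h(X[i]) != Y[i]))
                      for h in weak_learners), key=lambda p: p[1])
        err = min(max(err, 1e-12), 1 - 1e-12)
        alpha = 0.5 * math.log((1 - err) / err)
        votes.append((alpha, h))
        # reweight: misclassified points gain mass, then normalize
        D = [d * math.exp(-alpha * Y[i] * h(X[i])) for i, d in enumerate(D)]
        Z = sum(D)
        D = [d / Z for d in D]
    # weighted-majority vote of the selected weak classifiers
    return lambda x: 1 if sum(a * h(x) for a, h in votes) >= 0 else -1

# toy 1-d data; threshold stumps play the role of the weak class H
X = [0.1, 0.3, 0.5, 0.7, 0.9]
Y = [1, 1, -1, 1, -1]
stumps = [lambda x, t=t, s=s: s if x <= t else -s
          for t in (0.2, 0.4, 0.6, 0.8) for s in (1, -1)]
F = adaboost(X, Y, stumps, T=2)
train_err = sum(F(x) != y for x, y in zip(X, Y)) / len(X)
```

Two rounds already beat any single stump on this toy sample; the point of the lecture's analysis is how fast the training error of the vote decays with the number of rounds.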
Lecture 12
VC subgraph classes of functions. Packing and covering numbers.

VC-subgraph classes of functions

Let $\mathcal{F} = \{f : \mathcal{X} \to \mathbb{R}\}$ and
$$C_f = \{(x, t) \in \mathcal{X} \times \mathbb{R} : 0 \le t \le f(x) \text{ or } f(x) \le t \le 0\}.$$
Define the class of sets $\mathcal{C} = \{C_f : f \in \mathcal{F}\}$.

Definition 12.1. If $\mathcal{C}$ is a VC class of sets, then $\mathcal{F}$ is called a VC-subgraph class of functions, and we set $VC(\mathcal{F}) = VC(\mathcal{C})$.
Lecture 15
More symmetrization. Generalized VC inequality.

Lemma 15.1. Let $\xi, \eta$ be random variables. Assume that
$$P(\eta \ge t) \le \gamma e^{-\beta t}$$
where $\gamma \ge 1$, $t \ge 0$, and $\beta > 0$. Furthermore, for all $a > 0$ assume that
$$\mathbb{E}\,\phi(\xi) \le \mathbb{E}\,\phi(\eta)$$
where $\phi(x) = (x - a)_+$. Then
$$P(\xi \ge t) \le \gamma e \cdot e^{-\beta t}.$$

Proof.
Lecture 14
Kolmogorov's chaining method. Dudley's entropy integral.

For $f \in \mathcal{F} \subseteq [-1, 1]^n$, define $R(f) = \frac{1}{n}\sum_{i=1}^{n} \varepsilon_i f_i$.
Let $d(f, g) := \left(\frac{1}{n}\sum_{i=1}^{n}(f_i - g_i)^2\right)^{1/2}$.

Theorem 14.1.
$$P\left(\forall f \in \mathcal{F},\ R(f) \le \frac{2^{9/2}}{\sqrt{n}}\int_0^{d(0,f)} \log^{1/2} D(\mathcal{F}, \varepsilon, d)\, d\varepsilon + 2^{7/2}\, d(0, f)\sqrt{\frac{u}{n}}\right) \ge 1 - e^{-u}.$$
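Both quantities in the definitions above are directly computable from the coordinate representation of $f$. A small sketch (the vector $f$ and the sign draw are invented for illustration):

```python
import math
import random

def rademacher_average(f, eps):
    """R(f) = (1/n) * sum_i eps_i * f_i for one draw of signs eps_i."""
    return sum(e * fi for e, fi in zip(eps, f)) / len(f)

def dist(f, g):
    """d(f, g) = ( (1/n) * sum_i (f_i - g_i)^2 )^(1/2)."""
    return math.sqrt(sum((fi - gi) ** 2 for fi, gi in zip(f, g)) / len(f))

random.seed(0)
f = [0.5, -1.0, 0.25, 1.0]          # one function, viewed as a point in [-1,1]^n
zero = [0.0, 0.0, 0.0, 0.0]
eps = [random.choice([-1, 1]) for _ in f]

r = rademacher_average(f, eps)
d0f = dist(zero, f)                 # d(0, f): the radius in the entropy integral
```

Note that $|R(f)| \le \frac{1}{n}\sum_i |f_i|$ for every sign draw, while $d(0,f)$ is fixed; the theorem controls $R(f)$ uniformly in $f$ via the packing numbers at scales up to $d(0,f)$.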
Lecture 13
Covering numbers of the VC subgraph classes.

Theorem 13.1. Assume $\mathcal{F}$ is a VC-subgraph class and $VC(\mathcal{F}) = V$. Suppose $-1 \le f(x) \le 1$ for all $f \in \mathcal{F}$ and $x \in \mathcal{X}$. Let $x_1, \ldots, x_n \in \mathcal{X}$ and define $d(f, g) = \frac{1}{n}\sum_{i=1}^{n} |f(x_i) - g(x_i)|$. Then
$$D(\mathcal{F}, \varepsilon, d) \le \cdots$$
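Packing numbers with respect to this empirical $L_1$ distance can be lower-bounded greedily: keep adding functions that are $\varepsilon$-far from everything kept so far. A sketch, with a toy finite class of $[-1,1]$-valued functions (represented by their value vectors on $n = 4$ points) invented for illustration:

```python
def l1_dist(f, g):
    """Empirical L1 distance d(f, g) = (1/n) * sum_i |f(x_i) - g(x_i)|,
    with each function represented by its value vector on x_1..x_n."""
    return sum(abs(a - b) for a, b in zip(f, g)) / len(f)

def greedy_packing(F, eps):
    """Greedily build an eps-separated subset of F; its size is a
    lower bound on the packing number D(F, eps, d)."""
    packed = []
    for f in F:
        if all(l1_dist(f, g) >= eps for g in packed):
            packed.append(f)
    return packed

# a toy finite class on n = 4 points, values in {-1, +1}
F = [(1, 1, 1, 1), (1, 1, 1, -1), (1, 1, -1, -1),
     (1, -1, -1, -1), (-1, -1, -1, -1)]
packed = greedy_packing(F, eps=1.0)   # pairwise d >= 1.0
```

Each kept pair differs in at least two of the four coordinates, so the pairwise distance is at least $2 \cdot 2/4 = 1$; Theorem 13.1 says that for a VC-subgraph class no $\varepsilon$-packing can be larger than a bound depending only on $V$ and $\varepsilon$.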
Lecture 10
Symmetrization. Pessimistic VC inequality.

We are interested in bounding
$$P\left(\sup_{C \in \mathcal{C}} \left(\frac{1}{n}\sum_{i=1}^{n} I(X_i \in C) - P(C)\right) \ge t\right).$$
In Lecture 7 we hinted at symmetrization as a way to deal with the unknown $P(C)$.

Lemma 10.1. [Symmetrization] If $t \ge \sqrt{2/n}$,
Lecture 09
Properties of VC classes of sets.

Recall the definition of VC-dimension. Consider some examples:

$\mathcal{C} = \{(-\infty, a) \text{ and } (a, \infty) : a \in \mathbb{R}\}$. $VC(\mathcal{C}) = 2$.

$\mathcal{C} = \{(a, b) \cup (c, d)\}$. $VC(\mathcal{C}) = 4$.

$f_1, \ldots, f_d : \mathcal{X} \to \mathbb{R}$, $\mathcal{C} = \{\{x : \sum_{k=1}^{d} \lambda_k f_k(x) > 0\} : \lambda_1,
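These VC-dimension claims can be checked mechanically: $n$ points are shattered if the class realizes all $2^n$ labelings on them. A brute-force sketch for the first example, where a finite grid of thresholds stands in for all $a \in \mathbb{R}$ (which suffices once the points are fixed):

```python
def shatters(classifiers, points):
    """True iff the class realizes all 2^n membership patterns on the
    given points, i.e. the points are shattered."""
    realized = {tuple(c(p) for p in points) for c in classifiers}
    return len(realized) == 2 ** len(points)

# C = {(-inf, a)} and {(a, inf)}; a grid of thresholds a
grid = (0.5, 1.5, 2.5, 3.5)
half_lines = ([lambda x, a=a: x < a for a in grid] +
              [lambda x, a=a: x > a for a in grid])

two_shattered = shatters(half_lines, [1.0, 2.0])        # expect: shattered
three_shattered = shatters(half_lines, [1.0, 2.0, 3.0]) # expect: not
```

Three points fail because the alternating pattern (in, out, in) is not a half-line, which is exactly why $VC(\mathcal{C}) = 2$ for this class.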
Lecture 08
Vapnik-Chervonenkis classes of sets.

Assume $f \in \mathcal{F} = \{f : \mathcal{X} \to \mathbb{R}\}$ and $x_1, \ldots, x_n$ are i.i.d. Denote
$$P_n f = \frac{1}{n}\sum_{i=1}^{n} f(x_i) \quad \text{and} \quad Pf = \int f\, dP = \mathbb{E}f.$$
We are interested in bounding $\frac{1}{n}\sum_{i=1}^{n} f(x_i) - \mathbb{E}f$. The worst-case scenario is the value
$$\sup_{f \in \mathcal{F}} \left(P_n f - Pf\right).$$
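The gap $P_n f - Pf$ and its worst case over a finite class are easy to simulate; in the sketch below the distribution ($P$ uniform on $[0,1]$) and the small class of functions are invented for illustration:

```python
import random

random.seed(1)

# a small finite class: each entry is (f, Ef) with Ef known in closed form
F = [
    (lambda x: x,                         0.5),    # E x   = 1/2
    (lambda x: x * x,                     1 / 3),  # E x^2 = 1/3
    (lambda x: 1.0 if x > 0.5 else 0.0,   0.5),    # P(x > 1/2) = 1/2
]

n = 10_000
xs = [random.random() for _ in range(n)]  # i.i.d. sample from P = U[0,1]

def Pn(f):
    """Empirical mean P_n f = (1/n) * sum_i f(x_i)."""
    return sum(f(x) for x in xs) / n

# worst deviation over the class: sup_f |P_n f - Pf|
worst = max(abs(Pn(f) - mean) for f, mean in F)
```

For a finite class a union bound over the three functions suffices; the VC theory developed here is what replaces that union bound when $\mathcal{F}$ is infinite.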
Lecture 07
Hoeffding, Hoeffding-Chernoff, and Khinchine inequalities.

Let $a_1, \ldots, a_n \in \mathbb{R}$ and let $\varepsilon_1, \ldots, \varepsilon_n$ be i.i.d. Rademacher random variables: $P(\varepsilon_i = 1) = P(\varepsilon_i = -1) = 0.5$.

Theorem 7.1. [Hoeffding] For $t \ge 0$,
$$P\left(\sum_{i=1}^{n} \varepsilon_i a_i \ge t\right) \le \exp\left(-\frac{t^2}{2\sum_{i=1}^{n} a_i^2}\right).$$

Proof.
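Theorem 7.1 can be sanity-checked by Monte Carlo: the empirical tail frequency should sit below the bound. A sketch (the coefficients $a_i$ and the level $t$ are arbitrary choices for illustration):

```python
import math
import random

random.seed(0)

a = [0.3, -1.2, 0.7, 2.0, -0.5, 1.1]   # arbitrary real coefficients
t = 2.0
trials = 20_000

# empirical frequency of the event sum_i eps_i * a_i >= t
hits = 0
for _ in range(trials):
    s = sum(random.choice([-1, 1]) * ai for ai in a)
    if s >= t:
        hits += 1
empirical = hits / trials

# Hoeffding bound exp(-t^2 / (2 * sum a_i^2))
bound = math.exp(-t * t / (2 * sum(ai * ai for ai in a)))
```

With these coefficients $\sum a_i^2 = 7.48$, so the bound is about $0.77$, comfortably above the simulated tail; the bound is loose here because it only uses the sub-Gaussian moment generating function, not the exact distribution.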
Lecture 11
Optimistic VC inequality.

Last time we proved the Pessimistic VC inequality:
$$P\left(\sup_{C \in \mathcal{C}} \left(\frac{1}{n}\sum_{i=1}^{n} I(X_i \in C) - P(C)\right) \ge t\right) \le 4\left(\frac{2en}{V}\right)^{V} e^{-nt^2/8},$$
which can be rewritten with
$$t = \sqrt{\frac{8}{n}\left(\log 4 + V \log \frac{2en}{V} + u\right)}$$
as
$$P\left(\sup_{C \in \mathcal{C}} \left(\frac{1}{n}\sum_{i=1}^{n} I(X_i \in C) - P(C)\right) \ge \sqrt{\frac{8}{n}\left(\log 4 + V \log \frac{2en}{V} + u\right)}\right) \le e^{-u}.$$
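The substitution trades the free deviation level $t$ for a confidence parameter $u$: the right-hand side becomes exactly $e^{-u}$. Numerically (the values of $n$, $V$, $u$ below are illustrative):

```python
import math

def vc_threshold(n, V, u):
    """t = sqrt( (8/n) * (log 4 + V*log(2*e*n/V) + u) ): the deviation
    level at which the pessimistic VC bound equals e^{-u}."""
    return math.sqrt(8 / n * (math.log(4)
                              + V * math.log(2 * math.e * n / V)
                              + u))

t1 = vc_threshold(n=1_000,   V=5, u=1.0)
t2 = vc_threshold(n=100_000, V=5, u=1.0)   # larger sample -> smaller t
```

The threshold decays like $\sqrt{V \log n / n}$, which is the characteristic rate of the pessimistic bound; the "optimistic" inequality of this lecture improves this when the supremum is achieved at sets of small measure.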
Lecture 06
Bernstein's inequality.

Last time we proved Bennett's inequality: $\mathbb{E}X = 0$, $\mathbb{E}X^2 = \sigma^2$, $|X| < M = \text{const}$, $X_1, \ldots, X_n$ independent copies of $X$, and $t \ge 0$. Then
$$P\left(\sum_{i=1}^{n} X_i \ge t\right) \le \exp\left(-\frac{n\sigma^2}{M^2}\,\phi\!\left(\frac{tM}{n\sigma^2}\right)\right),$$
where $\phi(x) = (1 + x)\log(1 + x) - x$.
If $x$ is small, $\phi(x) \approx x^2/2$.
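The small-$x$ approximation $\phi(x) \approx x^2/2$ is the step that leads from Bennett's inequality toward Bernstein's. A quick numerical check of the approximation:

```python
import math

def phi(x):
    """phi(x) = (1 + x) * log(1 + x) - x, from Bennett's inequality."""
    return (1 + x) * math.log(1 + x) - x

# for small x > 0, phi(x) is close to (and slightly below) x^2 / 2
small = [0.001, 0.01, 0.1]
ratios = [phi(x) / (x * x / 2) for x in small]
```

The Taylor expansion $\phi(x) = x^2/2 - x^3/6 + \cdots$ shows the ratio tends to $1$ from below as $x \to 0$, so for small deviations the Bennett exponent behaves like the Gaussian one.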
Lecture 05
One dimensional concentration inequalities. Bennett's inequality.

For a fixed $f \in \mathcal{F}$, if we observe that $\frac{1}{n}\sum_{i=1}^{n} I(f(X_i) \neq Y_i)$ is small, can we say that $P(f(X) \neq Y)$ is small? By the Law of Large Numbers,
$$\frac{1}{n}\sum_{i=1}^{n} I(f(X_i) \neq Y_i) \to \mathbb{E}\, I(f(X) \neq Y) = P(f(X) \neq Y).$$
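The LLN statement is easy to visualize by simulation; the joint distribution of $(X, Y)$ (uniform $X$ with a noisy threshold label) and the fixed classifier $f$ below are invented for the sketch:

```python
import random

random.seed(2)

def f(x):
    """A fixed classifier: threshold at 0.4 (deliberately misplaced)."""
    return 1 if x > 0.4 else -1

def sample():
    """X uniform on [0,1]; Y = sign(X - 0.5), flipped with prob 0.1."""
    x = random.random()
    y = 1 if x > 0.5 else -1
    if random.random() < 0.1:
        y = -y
    return x, y

# true error P(f(X) != Y): on (0.4, 0.5] (mass 0.1) f disagrees with the
# clean label, so it errs with prob 0.9; elsewhere it errs with prob 0.1.
# Total: 0.1 * 0.9 + 0.9 * 0.1 = 0.18.
n = 50_000
errs = sum(f(x) != y for x, y in (sample() for _ in range(n))) / n
```

The empirical misclassification rate settles near $0.18$; concentration inequalities like Bennett's quantify how fast and with what probability, for a single fixed $f$.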
Lecture 04
Generalization error of SVM.

Assume we have samples $z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)$ as well as a new sample $z_{n+1}$. The classifier trained on the data $z_1, \ldots, z_n$ is $f_{z_1, \ldots, z_n}$.
The error of this classifier is
$$\mathrm{Error}(z_1, \ldots, z_n) = \mathbb{E}_{z_{n+1}}\, I\left(f_{z_1, \ldots, z_n}(x_{n+1}) \neq y_{n+1}\right).$$
Lecture 03
Support vector machines (SVM).

As in the previous lecture, consider the classification setting. Let $\mathcal{X} = \mathbb{R}^d$, $\mathcal{Y} = \{+1, -1\}$, and
$$\mathcal{H} = \{\langle w, x \rangle + b : w \in \mathbb{R}^d,\ b \in \mathbb{R}\}$$
where $|w| = 1$.
We would like to maximize, over the choice of hyperplanes, the minimal distance from the sample points to the hyperplane.
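Since $|w| = 1$, the signed distance from a point $x$ to the hyperplane $\{x : \langle w, x \rangle + b = 0\}$ is simply $\langle w, x \rangle + b$, so the quantity being maximized is $\min_i y_i(\langle w, x_i \rangle + b)$. A sketch comparing two candidate hyperplanes (the data and candidates are made up for illustration):

```python
import math

def margin(w, b, data):
    """min_i y_i * (<w, x_i> + b) for unit-norm w: the margin of the
    hyperplane (w, b) on the sample."""
    assert abs(sum(wi * wi for wi in w) - 1.0) < 1e-9   # |w| = 1
    return min(y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
               for x, y in data)

data = [((0.0, 2.0), 1), ((0.0, 1.0), 1),
        ((0.0, -1.0), -1), ((1.0, -2.0), -1)]

m1 = margin((0.0, 1.0), 0.0, data)                        # horizontal separator
m2 = margin((1 / math.sqrt(2), 1 / math.sqrt(2)), 0.0, data)  # tilted separator
```

Both candidates separate the sample (positive margin), but the first achieves the larger minimal distance; the SVM is the hyperplane maximizing this quantity.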
Lecture 16
Consequences of the generalized VC inequality.

In Lecture 15, we proved the following Generalized VC inequality:
$$P\left(\forall f \in \mathcal{F},\ \mathbb{E}f \le \frac{1}{n}\sum_{i=1}^{n} f(x_i) + \frac{2^{9/2}}{\sqrt{n}}\,\mathbb{E}_{x'}\int_0^{d(0,f)} \log^{1/2} D(\mathcal{F}, \varepsilon, d)\, d\varepsilon + 2^{7/2} \cdots\right)$$
where
$$d(f, g) = \left(\frac{1}{2n}\sum_{i=1}^{n}\left(f(x_i) - f(x_i') - g(x_i) + g(x_i')\right)^2\right)^{1/2}.$$