Lecture 1 Notes: Decision Trees
In this lecture we consider the classification problem, i.e. Y = \{-1, +1\}.
Consider a family of weak classifiers
H = \{ h : X \to \{-1, +1\} \}.
Let the empirical minimizer be
h_0 = \arg\min_{h \in H} \frac{1}{n} \sum_{i=1}^{n} I(h(X_i) \neq Y_i)
and assume its exp
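The empirical risk minimization above can be sketched numerically. The following is a toy example only: the data model and the choice of decision stumps as the weak class H are illustrative assumptions, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D data: label +1 when x > 0.3, with 10% label noise.
n = 200
X = rng.uniform(-1, 1, size=n)
Y = np.where(X > 0.3, 1, -1)
flip = rng.random(n) < 0.1
Y[flip] = -Y[flip]

# Weak classifiers: decision stumps h_s(x) = sign(x - s) on a threshold grid.
thresholds = np.linspace(-1, 1, 101)

def empirical_risk(s):
    # (1/n) * sum_i I(h_s(X_i) != Y_i)
    preds = np.where(X > s, 1, -1)
    return np.mean(preds != Y)

risks = np.array([empirical_risk(s) for s in thresholds])
h0 = thresholds[np.argmin(risks)]  # the empirical minimizer over this H
print(h0, risks.min())
```

With 10% label noise, the minimal empirical risk comes out close to 0.1, and the selected threshold sits near the true boundary 0.3.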
Lecture 2 Notes: Maximizing Margins
As in the previous lecture, consider the classification setting. Let X = R^d, Y = \{+1, -1\}, and

H = \{ x \mapsto \langle w, x \rangle + b : w \in R^d, b \in R \},

where \|w\| = 1.
We would like to maximize, over the choice of hyperplanes, the minimal distance from t
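Since \|w\| = 1, the signed distance from a point x to the hyperplane \{x : \langle w, x \rangle + b = 0\} is simply \langle w, x \rangle + b, and the margin of a labeled point (x, y) is y(\langle w, x \rangle + b). A small sketch; the particular w, b, and data points are made up for illustration:

```python
import numpy as np

# Normalizing w makes <w, x> + b the geometric signed distance.
w = np.array([3.0, 4.0])
w = w / np.linalg.norm(w)   # now ||w|| = 1
b = -1.0

X = np.array([[2.0, 1.0], [0.0, 0.0], [1.0, -1.0]])
y = np.array([1, -1, -1])

margins = y * (X @ w + b)    # y_i * (<w, x_i> + b)
min_margin = margins.min()   # the quantity the optimal hyperplane maximizes
print(margins, min_margin)
```

Here the minimal margin is 1.0; maximizing this minimum over (w, b) is exactly the hyperplane-selection problem described above.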
Lecture 4 Notes: Concentrations
For a fixed f \in F, if we observe that

\frac{1}{n} \sum_{i=1}^{n} I(f(X_i) \neq Y_i)

is small, can we say that P(f(X) \neq Y) is small? By the Law of Large Numbers,

\frac{1}{n} \sum_{i=1}^{n} I(f(X_i) \neq Y_i) \to E\, I(f(X) \neq Y) = P(f(X) \neq Y).
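This convergence is easy to see in simulation. The model below is an illustrative assumption: labels agree with f(x) = sign(x) except for flips with probability 0.15, so P(f(X) != Y) = 0.15 by construction.

```python
import numpy as np

rng = np.random.default_rng(1)

p = 0.15  # true misclassification probability of f, by construction
for n in (100, 10_000, 1_000_000):
    X = rng.normal(size=n)
    Y = np.sign(X)
    flip = rng.random(n) < p
    Y[flip] = -Y[flip]
    # empirical error (1/n) * sum_i I(f(X_i) != Y_i)
    emp = np.mean(np.sign(X) != Y)
    print(n, emp)
```

As n grows the empirical error settles onto 0.15, exactly as the Law of Large Numbers predicts.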
The Central Limit Theorem says
Lecture 5 Notes: Bennett's Inequality
Last time we proved Bennett's inequality: let EX = 0, EX^2 = \sigma^2, |X| \leq M = \text{const}, let X_1, \dots, X_n be independent copies of X, and let t \geq 0. Then

P\left( \sum_{i=1}^{n} X_i \geq t \right) \leq \exp\left( -\frac{n\sigma^2}{M^2}\, \varphi\!\left( \frac{tM}{n\sigma^2} \right) \right),

where \varphi(x) = (1 + x)\log(1 + x) - x.
For small x, \log(1 + x) \approx x - \frac{x^2}{2}, so

\varphi(x) \approx (1 + x)\left( x - \frac{x^2}{2} \right) - x = \frac{x^2}{2} - \frac{x^3}{2} \approx \frac{x^2}{2}.
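A quick numerical check of this approximation, comparing \varphi(x) with x^2/2 for a few small values of x:

```python
import math

def phi(x):
    # phi from Bennett's inequality
    return (1 + x) * math.log(1 + x) - x

# For small x, phi(x) is close to x**2 / 2 (and lies below it for x > 0,
# since log(1 + x) <= x).
for x in (0.01, 0.1, 0.5):
    print(x, phi(x), x * x / 2)
```

The two columns agree closely for x = 0.01 and drift apart as x grows, which is why the sub-Gaussian simplification of Bennett's bound is only accurate in the small-deviation regime.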
Lecture 3 Notes: SVM Errors
Assume we have samples z_1 = (x_1, y_1), \dots, z_n = (x_n, y_n) as well as a new sample z_{n+1}. The classifier trained on the data z_1, \dots, z_n is f_{z_1, \dots, z_n}.
The error of this classifier is
\text{Error}(z_1, \dots, z_n) = E_{z_{n+1}} I\left( f_{z_1, \dots, z_n}(x_{n+1}) \neq y_{n+1} \right).
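This expectation over the fresh sample z_{n+1} can be estimated by Monte Carlo. The sketch below assumes a made-up one-dimensional data model and a simple midpoint-threshold training rule; both are hypothetical choices just to make the definition concrete.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample(n):
    # Hypothetical data model: x uniform on [0, 1], y = +1 iff x > 0.5.
    x = rng.random(n)
    y = np.where(x > 0.5, 1, -1)
    return x, y

# "Train" on z_1, ..., z_n: threshold midway between the two classes.
x_tr, y_tr = sample(50)
s = (x_tr[y_tr == -1].max() + x_tr[y_tr == 1].min()) / 2

def f(x):
    return np.where(x > s, 1, -1)

# Error(z_1, ..., z_n) = E_{z_{n+1}} I(f(x_{n+1}) != y_{n+1}),
# estimated by averaging over many independent draws of z_{n+1}.
x_te, y_te = sample(100_000)
err = np.mean(f(x_te) != y_te)
print(err)
```

Note that the error is a function of the training set: rerunning with a different draw of z_1, ..., z_n gives a (slightly) different threshold and hence a different error.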
Lecture 6 Notes: KL-Divergence
Let a_1, \dots, a_n \in R and let \varepsilon_1, \dots, \varepsilon_n be i.i.d. Rademacher random variables: P(\varepsilon_i = 1) = P(\varepsilon_i = -1) = 1/2.
Theorem 7.1. [Hoeffding] For t \geq 0,

P\left( \sum_{i=1}^{n} \varepsilon_i a_i \geq t \right) \leq \exp\left( -\frac{t^2}{2 \sum_{i=1}^{n} a_i^2} \right).
Proof. Similarly to the proof of Bennett's inequ
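The bound can be sanity-checked by simulation before proving it; the coefficients a_i below are drawn arbitrarily for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

a = rng.normal(size=20)   # fixed coefficients a_1, ..., a_n
t = 3.0
n_trials = 200_000

# Rademacher signs, one row per trial.
eps = rng.choice([-1, 1], size=(n_trials, 20))
sums = eps @ a

empirical = np.mean(sums >= t)                   # estimate of P(sum eps_i a_i >= t)
bound = np.exp(-t**2 / (2 * np.sum(a**2)))       # Hoeffding's bound
print(empirical, bound)
```

The empirical tail probability sits well below the Hoeffding bound, as the theorem guarantees.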
Lecture 10 Notes: Random Variables
Lemma 15.1. Let \xi, \eta be random variables. Assume that

P(\eta \geq t) \leq \Gamma e^{-\gamma t},

where \Gamma \geq 1, t \geq 0, and \gamma > 0. Furthermore, for all a > 0 assume that

E\, \varphi_a(\xi) \leq E\, \varphi_a(\eta),

where \varphi_a(x) = (x - a)_+. Then

P(\xi \geq t) \leq \Gamma e \cdot e^{-\gamma t}.

Proof. Since \varphi_a(x) = (x - a)_+, we have \varphi_a(\xi) \geq t - a whenever \xi \geq t
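A sketch of how the argument typically concludes, writing \xi, \eta for the two random variables and \Gamma, \gamma for the tail constants in the lemma (assumed notation), valid when t \geq 1/\gamma:

```latex
P(\xi \ge t)\,(t - a)
  \le E\,\varphi_a(\xi)
  \le E\,\varphi_a(\eta)
  = \int_0^\infty P(\eta \ge a + s)\,ds
  \le \int_0^\infty \Gamma e^{-\gamma(a + s)}\,ds
  = \frac{\Gamma}{\gamma}\, e^{-\gamma a}.
```

Choosing a = t - 1/\gamma (so that t - a = 1/\gamma) gives P(\xi \geq t) \leq \Gamma e \cdot e^{-\gamma t}, which is the stated conclusion.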
Lecture 8 Notes: VC Dimensions
Recall the definition of VC-dimension. Consider some examples:
C = \{ (-\infty, a) \text{ and } (a, \infty) : a \in R \}. \; VC(C) = 2.

C = \{ (a, b) \cup (c, d) \}. \; VC(C) = 4.

f_1, \dots, f_d : X \to R, \; C = \{ \{ x : \sum_{k=1}^{d} \lambda_k f_k(x) > 0 \} : \lambda_1, \dots, \lambda_d \in R \}.
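The first example (VC(C) = 2 for half-lines) can be verified by brute force: enumerate every labeling the class realizes on a point set and compare with 2^n. The helper names below are made up for this sketch.

```python
# Class C: all half-lines (-inf, a) or (a, inf), a in R.  On a finite set of
# points only thresholds between consecutive points (plus the two ends) matter.
def achievable_labelings(points):
    points = sorted(points)
    mids = [(p + q) / 2 for p, q in zip(points, points[1:])]
    cuts = [points[0] - 1.0] + mids + [points[-1] + 1.0]
    labelings = set()
    for a in cuts:
        labelings.add(tuple(p < a for p in points))  # the set (-inf, a)
        labelings.add(tuple(p > a for p in points))  # the set (a, inf)
    return labelings

def shattered(points):
    return len(achievable_labelings(points)) == 2 ** len(points)

print(shattered([0.0, 1.0]))        # True: a 2-point set is shattered
print(shattered([0.0, 1.0, 2.0]))   # False: e.g. (in, out, in) is impossible
```

Any 3-point set behaves like the one tested here, since achievable labelings of half-lines are exactly the monotone prefix/suffix patterns; hence VC(C) = 2.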
Theorem 9.1. VC(C) in
Lecture 9 Notes: Bounding
We are interested in bounding
P\left( \sup_{C} \left| \frac{1}{n} \sum_{i=1}^{n} I(X_i \in C) - P(C) \right| \geq t \right).

In Lecture 7 we hinted at symmetrization as a way to deal with the unknown P(C).
Lemma 10.1. [Symmetrization] If t \geq \sqrt{2/n}, then

P\left( \sup_{C} \left| \frac{1}{n} \sum_{i=1}^{n} I(X_i \in C) - P(C) \right| \geq t \right) \leq 2\, P\left( \sup_{C} \left| \frac{1}{n} \sum_{i=1}^{n} I(X_i \in C) - \frac{1}{n} \sum_{i=1}^{n} I(X_i' \in C) \right| \geq \frac{t}{2} \right),

where X_1', \dots, X_n' is an independent copy of the sample X_1, \dots, X_n.
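A Monte Carlo illustration of the symmetrization idea: take C to be half-lines (-\infty, a] with X_i uniform on [0, 1], so that \sup_C |P_n(C) - P(C)| is the Kolmogorov-Smirnov statistic. All numerical choices (n, t, the grid of thresholds) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

n, t, trials = 50, 0.25, 5_000     # note t >= sqrt(2/n) = 0.2
grid = np.linspace(0.0, 1.0, 101)  # thresholds a for the sets (-inf, a]

def emp_cdf(sample):
    # P_n((-inf, a]) evaluated at every grid threshold a
    return (sample[:, None] <= grid).mean(axis=0)

lhs_hits = rhs_hits = 0
for _ in range(trials):
    x, x2 = rng.random(n), rng.random(n)   # sample and an independent copy
    # LHS event: deviation from the true measure P(C) = a exceeds t.
    lhs_hits += np.abs(emp_cdf(x) - grid).max() >= t
    # RHS event: deviation between the two empirical measures exceeds t/2.
    rhs_hits += np.abs(emp_cdf(x) - emp_cdf(x2)).max() >= t / 2
lhs_p, rhs_p = lhs_hits / trials, rhs_hits / trials
print(lhs_p, 2 * rhs_p)
```

The estimated left-hand probability comes out far below twice the symmetrized one, consistent with the lemma; the point of the bound is that the right-hand side involves only the samples, not the unknown P(C).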
Lecture 7 Notes: VC Classes
Assume f \in F = \{ f : X \to R \} and x_1, \dots, x_n are i.i.d. Denote

P_n f = \frac{1}{n} \sum_{i=1}^{n} f(x_i) \quad \text{and} \quad P f = \int f \, dP = E f.

We are interested in bounding P_n f - P f. The worst-case scenario is the value

\sup_{f \in F} | P_n f - P f |.
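For a concrete instance, take F to be indicators of half-lines with uniform samples; then \sup_f |P_n f - P f| is the Kolmogorov-Smirnov distance between the empirical and true CDFs, and it shrinks as n grows. The sample sizes and threshold grid are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)

# F = indicators f_a = I((-inf, a]): P_n f_a is the empirical CDF at a,
# and with X_i uniform on [0, 1], P f_a = a.
grid = np.linspace(0.0, 1.0, 201)

def sup_dev(n):
    x = np.sort(rng.random(n))
    emp = np.searchsorted(x, grid, side="right") / n  # empirical CDF on the grid
    return np.abs(emp - grid).max()                   # sup_f |P_n f - P f|

devs = {n: sup_dev(n) for n in (100, 10_000, 1_000_000)}
print(devs)
```

The supremum deviation decays roughly like 1/sqrt(n), which is the uniform convergence phenomenon the Glivenko-Cantelli discussion below is after.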
The Glivenko-Cantelli pro