Massachusetts
Institute
of
Technology
6.867
Machine
Learning,
Fall
2006
Problem
Set
1
Solutions
Section
B
1.
(a)
The
functions
are
on
the
course
website
hw1/solutions
.
The
major
cause
of
variation
among
the
solutions
was
the
choice
of
the
order
in
which
points
were
picked
for
the
update
step.
This
was
also
the
cause
of
a
few
subtle
mistakes
as
well.
The
most
straightforward
strategy
for
picking
points
(in
the
order
of
their
occurrence
in
the
dataset),
produces
θ
= 2
.
3799
radians
or
θ
=
136
.
3582
◦
.
The
number
of
updates,
as
per
the
above
strategy,
was
10.
The
analysis
described
in
the
lectures
indicated
that
the
number
of
perceptron
updates
(mistakes)
was
necessarily
±
2
bounded
by
R/γ
∗
where
γ
∗
is
the
maximum
geometric
margin
for
this
problem.
geom
geom
A
small
number
of
updates
therefore
suggests
that
γ
∗
is
reasonably
large
in
comparison
geom
to
the
radius
R
of
the
enclosing
sphere.
In
other
words,
the
two
class
populations
appear
to
be
wellseparated.
(b)
The
most
straightforward
strategy
for
picking
points
(in
the
order
of
their
occurrence
in
the
dataset),
produces
θ
= 2
.
3566
radians
or
θ
=
136
.
3582
◦
.
We
will
accept
answers
in
the
range
(2
.
3552
,
2
.
3592)
radians
or
(134
.
9454
◦
,
135
.
1730
◦
).
The
latter
ranges
corresponds
to
decision
boundaries
going
through
points
at
the
margins
of
the
maxmargin
classiFer
(see
Prob
2).
The
number
of
updates,
as
per
the
above
strategy,
was
152.
The
bounding
sphere
is
about
the
same
for
these
points,
however.
would
therefore
expect
that
the
geometric
margin
γ
∗
is
larger
for
this
problem.
geom
(c)
can
also
evaluate
γ
geom
,
the
margin
actually
achieved
by
the
perceptron
algorithm.
This
is
not
the
maximum
margin
but
may
nevertheless
be
indicative
of
how
hard
the
problem
is.
Given
X
and
θ
,
γ
geom
can
be
calculated
in
MATLAB
as
follows:
gamma_geom
=
min(abs(X*theta
/
norm(theta)))
get
γ
a
= 1
.
6405
and
γ
b
= 0
.
0493,
again
with
some
variation
due
to
the
order
geom
geom
in
which
one
selects
the
training
examples.
These
margins
appear
to
be
consistent
with
our
analysis,
at
least
in
terms
of
their
relative
magnitude.
The
bound
on
the
number
of
updates
holds
for
any
margin,
maximum
or
not,
but
gives
the
tightest
guarantee
with
the
maximum
margin.
This preview has intentionally blurred sections. Sign up to view the full version.
View Full Document
This is the end of the preview.
Sign up
to
access the rest of the document.
 Fall '06
 TommiJaakkola
 Machine Learning, MIT OpenCourseWare, Massachusetts Institute of Technology, training error

Click to edit the document details