lecture14-SVMs-handout-6-per



In a perfect classification, only the diagonal has non-zero entries.
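A confusion matrix counts, for each true class, how many documents were assigned to each class, so correct decisions land on the diagonal. Below is a minimal Python sketch of that idea. It assumes plain lists of string labels and uses the convention that rows are true classes and columns are assigned classes. The class names and label sequences are hypothetical and for illustration only.

```python
def confusion_matrix(true_labels, assigned_labels, classes):
    """Cell [i][j] counts documents whose true class is classes[i]
    and whose assigned class is classes[j]."""
    index = {c: k for k, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, a in zip(true_labels, assigned_labels):
        matrix[index[t]][index[a]] += 1
    return matrix


def is_perfect(matrix):
    """True when every off-diagonal entry is zero."""
    n = len(matrix)
    return all(matrix[i][j] == 0 for i in range(n) for j in range(n) if i != j)


# Hypothetical labels for illustration only
classes = ["grain", "trade", "interest"]
gold = ["grain", "trade", "grain", "interest"]
perfect = list(gold)                               # every document correct
noisy = ["grain", "grain", "grain", "interest"]    # one "trade" doc misfiled

print(confusion_matrix(gold, perfect, classes))  # [[2, 0, 0], [0, 1, 0], [0, 0, 1]]
print(is_perfect(confusion_matrix(gold, noisy, classes)))  # False
```

In the noisy case the misfiled "trade" document shows up as an off-diagonal count in row "trade", column "grain".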
Introduction to Information Retrieval

Sec. 15.3
The Real World
P. Jackson and I. Moulinier. 2002. Natural Language Processing for Online Applications.
- “There is no question concerning the commercial value of being able to classify documents automatically by content. There are myriad potential applications of such a capability for corporate Intranets, government departments, and Internet publishers.”
- “Understanding the data is one of the keys to successful categorization, yet this is an area in which most categorization tool vendors are extremely weak. Many of the ‘one size fits all’ tools on the market have not been tested on a wide range of content types.”

Sec. 15.3.1
The Real World
- Gee, I’m building a text classifier for real, now!
- What should I do?
- How much training data do you have?
  - None
  - Very little
  - Quite a lot
  - A huge amount and it’s growing

Sec. 15.3.1
Manually written rules
- No training data, adequate editorial staff?
- Never forget the hand-written rules solution!

Very little data?
- If you’re just doing supervised classification, you should stick to something with high bias (a simple, low-variance model).
- There are theoretical results that Naïve Bayes should do well in such circumstances (Ng and Jordan 2002 NIPS).
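To illustrate a high-bias classifier suited to small training sets, here is a minimal multinomial Naïve Bayes sketch with add-one smoothing. This is one common variant, not necessarily the one the slides have in mind. Whitespace tokenisation, the class names and the toy training documents are all hypothetical assumptions.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs):
    """docs: list of (tokens, label) pairs.
    Returns (log_prior, log_likelihood, vocab)."""
    vocab = {t for tokens, _ in docs for t in tokens}
    class_counts = Counter(label for _, label in docs)
    term_counts = defaultdict(Counter)
    for tokens, label in docs:
        term_counts[label].update(tokens)

    n_docs = len(docs)
    log_prior = {c: math.log(n / n_docs) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        total = sum(term_counts[c].values())
        # Add-one (Laplace) smoothing: an unseen term never zeroes out a class
        log_likelihood[c] = {
            t: math.log((term_counts[c][t] + 1) / (total + len(vocab)))
            for t in vocab
        }
    return log_prior, log_likelihood, vocab


def classify_nb(model, tokens):
    """Return the class with the highest log-posterior score;
    tokens outside the training vocabulary are ignored."""
    log_prior, log_likelihood, vocab = model
    scores = {
        c: log_prior[c] + sum(log_likelihood[c][t] for t in tokens if t in vocab)
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Hypothetical toy training set (a "very little data" situation)
train = [
    ("wheat harvest grain prices".split(), "grain"),
    ("grain exports rise".split(), "grain"),
    ("central bank raises interest rates".split(), "interest"),
    ("interest rates fall".split(), "interest"),
]
model = train_multinomial_nb(train)
print(classify_nb(model, "wheat grain exports".split()))  # grain
print(classify_nb(model, "interest rates".split()))       # interest
```

Working in log space avoids underflow when many small probabilities are multiplied.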

Example hand-written rule (Manually written rules slide):
- If (wheat or grain) and not (whole or bread) then
    Categorize ...
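The hand-written rule above can be encoded as a simple boolean test. Below is a minimal Python sketch. It assumes that matching is done on lower-cased alphabetic words, which is a hypothetical choice the slide does not specify. The target category is elided in this preview, so the hypothetical function `rule_fires` only reports whether the rule's condition holds.

```python
import re


def rule_fires(text: str) -> bool:
    """True if the text matches: (wheat or grain) and not (whole or bread)."""
    # Assumed tokenisation: lower-cased runs of letters, so "Whole-grain"
    # yields the words "whole" and "grain".
    words = set(re.findall(r"[a-z]+", text.lower()))
    has_topic_word = bool(words & {"wheat", "grain"})
    has_excluded_word = bool(words & {"whole", "bread"})
    return has_topic_word and not has_excluded_word


print(rule_fires("Wheat exports rose sharply."))  # True
print(rule_fires("Whole-grain bread recipes"))    # False
```

Rules like this need no training data, but every exception (here, baking-related uses of "grain") has to be anticipated by the editor writing them.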