lecture14-SVMs-handout-6-per

Macroaveraging: compute performance for each class, then average.
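A hedged sketch of the idea (the class names and per-class counts below are invented purely for illustration): macroaveraging averages a per-class metric so every class counts equally, whereas microaveraging pools the counts across classes first, so large classes dominate.

```python
# Sketch: macro- vs. micro-averaged precision from per-class
# true-positive / false-positive counts (counts are made up).
per_class = {
    "earn":  {"tp": 90, "fp": 10},
    "acq":   {"tp": 40, "fp": 20},
    "wheat": {"tp": 2,  "fp": 8},
}

# Macroaveraging: compute precision per class, then average the values.
macro_p = sum(c["tp"] / (c["tp"] + c["fp"]) for c in per_class.values()) / len(per_class)

# Microaveraging: pool the counts first, then compute a single precision.
tp = sum(c["tp"] for c in per_class.values())
fp = sum(c["fp"] for c in per_class.values())
micro_p = tp / (tp + fp)

print(f"macro-averaged precision: {macro_p:.3f}")   # ~0.589
print(f"micro-averaged precision: {micro_p:.3f}")   # ~0.776
```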

- General idea: the original feature space can always be mapped to some higher-dimensional feature space where the training set is separable.
- The linear classifier relies on an inner product between vectors: K(x_i, x_j) = x_i^T x_j (a code sketch of this point follows the worked example below).
- If every datapoint is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the inner product becomes: K(x_i, x_j) = φ(x_i)^T φ(x_j).
- A kernel function is some function that corresponds to an inner product in some expanded feature space.
- Example: 2-dimensional vectors x = [x_1  x_2]; let K(x_i, x_j) = (1 + x_i^T x_j)^2.
  Need to show that K(x_i, x_j) = φ(x_i)^T φ(x_j) for some Φ: x → φ(x):

  K(x_i, x_j) = (1 + x_i^T x_j)^2
              = (1 + x_{i1} x_{j1} + x_{i2} x_{j2})^2
              = 1 + x_{i1}^2 x_{j1}^2 + 2 x_{i1} x_{j1} x_{i2} x_{j2} + x_{i2}^2 x_{j2}^2 + 2 x_{i1} x_{j1} + 2 x_{i2} x_{j2}
              = [1, x_{i1}^2, √2 x_{i1} x_{i2}, x_{i2}^2, √2 x_{i1}, √2 x_{i2}]^T [1, x_{j1}^2, √2 x_{j1} x_{j2}, x_{j2}^2, √2 x_{j1}, √2 x_{j2}]
              = φ(x_i)^T φ(x_j),

  where φ(x) = [1, x_1^2, √2 x_1 x_2, x_2^2, √2 x_1, √2 x_2].
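The identity above is easy to check numerically. The sketch below is not part of the handout; the vector values and the names phi, quadratic_kernel, decision_function, alpha, y, and b are invented for illustration. It verifies that (1 + x_i^T x_j)^2 equals φ(x_i)^T φ(x_j) for the explicit map φ, and shows a dual-form decision function that touches the data only through kernel evaluations, which is why any such K can replace the plain inner product.

```python
import numpy as np

def phi(x):
    """Explicit feature map for the quadratic kernel:
    phi(x) = [1, x1^2, sqrt(2) x1 x2, x2^2, sqrt(2) x1, sqrt(2) x2]."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, x1**2, s * x1 * x2, x2**2, s * x1, s * x2])

def quadratic_kernel(xi, xj):
    """K(x_i, x_j) = (1 + x_i^T x_j)^2."""
    return (1.0 + float(np.dot(xi, xj))) ** 2

# Numeric check of K(x_i, x_j) = phi(x_i)^T phi(x_j) on arbitrary vectors.
xi, xj = np.array([0.5, -1.2]), np.array([2.0, 0.3])
assert np.isclose(quadratic_kernel(xi, xj), phi(xi) @ phi(xj))

def decision_function(x, support_vectors, alpha, y, b, kernel):
    """Dual-form SVM score f(x) = sum_i alpha_i * y_i * K(x_i, x) + b.
    The data enter only through kernel evaluations, so swapping `kernel`
    changes the feature space without changing this code."""
    return sum(a * yi * kernel(sv, x)
               for a, yi, sv in zip(alpha, y, support_vectors)) + b
```

Here alpha, y, b, and the support vectors stand in for quantities produced by SVM training; the sketch does not include a training procedure.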
Kernels (Sec. 15.2.3)

Evaluation: Classic Reuters-21578 Data Set (Sec. 15.2.4)

- Most (over)used data set
- 21578 documents
- 9603 training, 3299 test articles (ModApte/Lewis split)
- 118 categories
  - An article can be in more than one category
  - Learn 118 binary category distinctions
- Average document: about 90 types, 200 tokens
- Average number of classes assigned:
  - 1.24 for docs with at least one category
- Only about 10 out of 118 categories are large
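For readers who want to poke at the data, a sketch along these lines could reproduce some of the statistics above. It assumes NLTK is installed and its bundled Reuters corpus has been downloaded; that distribution is a ModApte-style subset, so the counts will not match the slide exactly.

```python
# Hedged sketch, assuming NLTK and its Reuters corpus are available:
#   pip install nltk
#   python -m nltk.downloader reuters
from nltk.corpus import reuters

docs = reuters.fileids()                      # ids look like 'training/9865', 'test/14826'
train = [d for d in docs if d.startswith("training/")]
test = [d for d in docs if d.startswith("test/")]
print(len(docs), len(train), len(test), len(reuters.categories()))

# Average number of categories per document (articles can carry several labels).
avg_labels = sum(len(reuters.categories(d)) for d in docs) / len(docs)
print(f"average categories per doc: {avg_labels:.2f}")
```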
Why use kernels?
- Make non-separable...