Lecture 18: Learning to Rank (handouts, 6 per page)

Introduction to Information Retrieval: The Ranking SVM

- SVM training: want g(r|d,q) ≤ −1 for nonrelevant documents and g(r|d,q) ≥ 1 for relevant documents
- SVM testing: decide relevant iff g(r|d,q) ≥ 0
- Features are not word presence features (how would you deal with query words not in your training data?) but scores like the summed (log) tf of all query terms
- Unbalanced data (which can result in trivial always-say-nonrelevant classifiers) is dealt with by undersampling nonrelevant documents during training (just take some at random); a minimal code sketch follows this list
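The setup above is a standard pointwise formulation: train a binary SVM on per-(query, document) feature vectors with ±1 relevance labels, undersample the nonrelevant pairs, and call a document relevant when the decision value is at least 0. Below is a minimal sketch assuming scikit-learn and toy synthetic features; the feature construction, sampling ratio, and hyperparameters are illustrative, not the paper's exact configuration.

```python
# Sketch of the pointwise SVM classifier for IR described above.
# Assumes scikit-learn. X holds one row of query-document scores
# f(d, q) per (query, document) pair; y holds +1 for relevant and
# -1 for nonrelevant pairs. Everything here is toy/illustrative.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def undersample_nonrelevant(X, y, ratio=1.0):
    """Keep all relevant pairs; keep a random sample of nonrelevant
    pairs so the classes are roughly balanced (ratio = nonrel/rel)."""
    rel = np.where(y == 1)[0]
    nonrel = np.where(y == -1)[0]
    k = min(len(nonrel), int(ratio * len(rel)))
    keep = np.concatenate([rel, rng.choice(nonrel, size=k, replace=False)])
    return X[keep], y[keep]

# Toy data: 2 features per (d, q) pair, e.g. summed log tf and summed idf.
X = rng.normal(size=(1000, 2))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 1.2, 1, -1)   # mostly nonrelevant

X_bal, y_bal = undersample_nonrelevant(X, y)

# Training: the linear SVM tries to put relevant pairs at margin >= +1
# and nonrelevant pairs at margin <= -1.
clf = LinearSVC(C=1.0)
clf.fit(X_bal, y_bal)

# Testing: decide "relevant" iff g(r|d,q) = w . f(d,q) + b >= 0.
g = clf.decision_function(X)
predicted_relevant = g >= 0
print(predicted_relevant[:10])
```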
An SVM classifier for information retrieval [Nallapati 2004]

Experiments:
- 4 TREC data sets
- Comparisons with Lemur, a state-of-the-art open source IR engine (LM)
- Linear kernel normally best or almost as good as quadratic kernel, and so used in reported results
- 6 features, all variants of tf, idf, and tf.idf scores (illustrated in the sketch below)
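The excerpt does not list the exact six features, so the sketch below only illustrates the general recipe: per-(query, document) scores obtained by summing tf, log tf, idf, and tf.idf style quantities over the query terms. The function name and the specific variants chosen are assumptions for illustration.

```python
# Illustrative query-document feature construction: sums of tf/idf-style
# scores over query terms for one (query, document) pair. The particular
# variants below are assumed, not the paper's exact six features.
import math
from collections import Counter

def query_doc_features(query_terms, doc_terms, df, N):
    """df: document frequency per term; N: number of documents in the collection."""
    tf = Counter(doc_terms)
    sum_tf = sum(tf[t] for t in query_terms)
    sum_log_tf = sum(math.log(1 + tf[t]) for t in query_terms)
    sum_idf = sum(math.log(N / df[t]) for t in query_terms if t in df)
    sum_tf_idf = sum(tf[t] * math.log(N / df[t]) for t in query_terms if t in df)
    return [sum_tf, sum_log_tf, sum_idf, sum_tf_idf]

print(query_doc_features(
    ["ranking", "svm"],
    "the ranking svm is a large margin ranking model".split(),
    df={"ranking": 10, "svm": 5, "the": 900},
    N=1000,
))
```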
[Results table (truncated by the preview): "Train \ Test" header over test collections including Disk 4-5 and WT10G (web); visible scores 0.1785, 0.2503, 0.2666 ...]