lecture18-learning-ranking-handouts-6-per


… industry
- Poor machine learning techniques
- Insufficient customization to IR problem
- Not enough features for ML to show value
- The Web provided impetus with constantly evolving spam

Introduction to Information Retrieval

Why wasn’t ML much needed?
- Traditional ranking functions in IR used a very small number of features, e.g.,
  - Term frequency
  - Inverse document frequency
  - Document length
- It was easy to tune weighting coefficients by hand
  - And people did
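
To make the "small number of features, tuned by hand" point concrete, here is a minimal sketch of such a ranking function. It is not taken from the slides: the function name `hand_tuned_score`, the log-scaled term frequency, the particular length normalization, and the two weights are all illustrative assumptions.

```python
import math
from collections import Counter

def hand_tuned_score(query_terms, doc_tokens, doc_freq, num_docs, avg_doc_len,
                     tf_weight=1.0, length_weight=0.5):
    """Score one document using only tf, idf, and document length.

    tf_weight and length_weight are the hand-tuned coefficients
    (hypothetical values; a person would adjust them by trial and error).
    """
    counts = Counter(doc_tokens)
    # 1.0 for an average-length document, larger for longer documents.
    length_norm = (1.0 - length_weight) + length_weight * len(doc_tokens) / avg_doc_len
    total = 0.0
    for term in query_terms:
        tf = counts[term]
        df = doc_freq.get(term, 0)
        if tf == 0 or df == 0:
            continue  # term absent from the document or the collection
        idf = math.log(num_docs / df)
        total += tf_weight * (1.0 + math.log(tf)) * idf / length_norm
    return total

# Toy collection statistics (made up for illustration).
doc_freq = {"ranking": 10, "learning": 100}
num_docs = 1000
avg_doc_len = 4

doc_a = ["learning", "ranking", "ranking", "functions"]
doc_b = ["learning", "to", "rank", "documents"]
doc_c = doc_a + ["filler"] * 4  # same term counts as doc_a, twice as long

score_a = hand_tuned_score(["ranking"], doc_a, doc_freq, num_docs, avg_doc_len)
score_b = hand_tuned_score(["ranking"], doc_b, doc_freq, num_docs, avg_doc_len)
score_c = hand_tuned_score(["ranking"], doc_c, doc_freq, num_docs, avg_doc_len)
```

With only two coefficients to adjust, trying a few values by hand and inspecting the resulting rankings is feasible, which is the situation this slide describes.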

5/30/11

Why is ML needed now
- Modern systems – especially on the Web – use a great number of features:
- Arbitrary useful features – not a single unified model
  - Log frequency of query word in anchor text?
  - Query word in color on page?
  - # of images on page?
  - # of (out) links on page?
  - PageRank of page?
  - URL length?...

Simple example (Sec. 6.1.2)
- Consider the presence of query terms in the Title (T) and the Body (B) of a document
- Boolean indicator (0/1) of whether the query term occurs in the Title (sT) or Body (sB)

This document was uploaded on 02/26/2014.
